OLAC Record
oai:www.ldc.upenn.edu:LDC2020T21

Metadata
Title:BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Palmer, Martha, et al. BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech LDC2020T21. Web Download. Philadelphia: Linguistic Data Consortium, 2020.
Contributor:Palmer, Martha
Hwang, Jena D.
Bonial, Claire
O'Gorman, Tim
Gung, James
Stowe, Kevin
Green, Meredith
Date (W3CDTF):2020
Date Issued (W3CDTF):2020-09-15
Description:*Introduction* BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of PropBank and verb sense disambiguation annotation on English discussion forum (DF), SMS/Chat, and conversational telephone speech (CTS) data. The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.
*Data* DF data was collected from the web using a combination of manual and automatic processes. SMS/Chat material was donated or collected via live platforms. CTS data was taken from LDC's Arabic and Chinese CALLHOME and CALLFRIEND telephone collections; the audio files were transcribed and translated into English. PropBank annotation and verb sense disambiguation were applied to the BOLT phrase structure treebank annotation, specifically to each predicate verb in a tree. PropBank annotation provided a layer of semantic annotation over the treebank and was performed on all three genres. DF and SMS/Chat data was also annotated for verb sense disambiguation using VerbNet 3.2 classes. Annotation files are UTF-8 encoded and are in either plain text or XML format.
*Sponsorship* This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
*Samples* Please view this PropBank sample (TXT) and Sense sample (TXT).
*Updates* None at this time.
Extent:Corpus size: 96135 KB
Identifier:LDC2020T21
https://catalog.ldc.upenn.edu/LDC2020T21
ISBN: 1-58563-943-5
ISLRN: 640-422-732-913-3
DOI: 10.35111/7TNG-WK28
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2020T21
Rights Holder:Portions © 1996, 1997, 2011-2020 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2020T21
DateStamp:  2021-11-08
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Palmer, Martha; Hwang, Jena D.; Bonial, Claire; O'Gorman, Tim; Gung, James; Stowe, Kevin; Green, Meredith. 2020. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2020T21
Up-to-date as of: Tue May 7 7:25:49 EDT 2024