OLAC Record oai:www.ldc.upenn.edu:LDC2021T18 |
| Metadata | ||
| Title: | BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech | |
| Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
| Bibliographic Citation: | Palmer, Martha, et al. BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech LDC2021T18. Web Download. Philadelphia: Linguistic Data Consortium, 2021 | |
| Contributor: | Palmer, Martha; Hwang, Jena D.; Mansouri, Aous; Bonial, Claire; O'Gorman, Tim; Gung, James | |
| Date (W3CDTF): | 2021 | |
| Date Issued (W3CDTF): | 2021-11-15 | |
| Description: | *Introduction* BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder (CLEAR: Computational Language and Education Research) and consists of PropBank annotation on Egyptian Arabic discussion forum (DF), SMS/Chat, and conversational telephone speech (CTS) data. The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources (discussion forums, text messaging, and chat) in Chinese, Egyptian Arabic, and English. The collected data was translated and annotated for various tasks, including word alignment, treebanking, PropBanking, and co-reference. *Data* DF data was collected from the web using a manual process. SMS/Chat material was donated or collected via live platforms. CTS data was taken from LDC's Egyptian Arabic CALLHOME and CALLFRIEND telephone collections. PropBank annotation adds a layer of semantic annotation on top of a treebank. In this release, it was applied to BOLT phrase structure treebank annotation and carried out in two phases: (1) a frame file was created for each predicate, and (2) the predicate-argument structure was annotated using the frame file as a reference. Annotation files are UTF-8 encoded, in either plain text or XML format. *Sponsorship* This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. *Samples* Please view the PropBank sample (TXT) and the frame sample (XML). *Updates* None at this time. | |
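The two-phase workflow described above (frame files first, then argument annotation checked against them) can be illustrated with a small Python sketch. It assumes the frame files follow the English PropBank frameset layout (`frameset`, `predicate`, `roleset`, `roles`, `role`); this record does not describe the schema of the XML frames in this release, so the real files may differ. `load_rolesets`, `unlicensed_args`, and the sample frame are hypothetical illustrations, not part of the corpus or any LDC tool.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL frame file. The element layout (frameset > predicate > roleset
# > roles > role) is ASSUMED to follow the English PropBank frame schema; the
# lemma, roleset id and descriptions are placeholders, not LDC2021T18 data.
SAMPLE_FRAME = """<frameset>
  <predicate lemma="example">
    <roleset id="example.01" name="illustrative sense">
      <roles>
        <role n="0" descr="agent"/>
        <role n="1" descr="theme"/>
      </roles>
    </roleset>
  </predicate>
</frameset>"""


def load_rolesets(xml_text):
    """Map each roleset id to {numbered argument label: role description}."""
    root = ET.fromstring(xml_text)
    rolesets = {}
    for roleset in root.iter("roleset"):
        # Role numbers n="0", n="1", ... become labels ARG0, ARG1, ...
        rolesets[roleset.get("id")] = {
            f"ARG{role.get('n')}": role.get("descr", "")
            for role in roleset.iter("role")
        }
    return rolesets


def unlicensed_args(rolesets, roleset_id, labels):
    """Return numbered arguments in `labels` that the roleset does not define.

    ARGM-* modifiers are skipped as a simplification, since they are general
    adjuncts rather than sense-specific roles.
    """
    allowed = rolesets.get(roleset_id, {})
    return [
        label
        for label in labels
        if label.startswith("ARG")
        and not label.startswith("ARGM")
        and label not in allowed
    ]


if __name__ == "__main__":
    rolesets = load_rolesets(SAMPLE_FRAME)
    print(rolesets)
    print(unlicensed_args(rolesets, "example.01", ["ARG0", "ARG2", "ARGM-TMP"]))
```

Here `ARG2` would be flagged because the hypothetical `example.01` sense defines only `ARG0` and `ARG1`; this mirrors, in simplified form, the idea of using the frame file as the reference during annotation.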
| Extent: | Corpus size: 104462 KB | |
| Identifier: | LDC2021T18; https://catalog.ldc.upenn.edu/LDC2021T18; ISBN: 1-58563-978-8; DOI: 10.35111/j2xe-rr02 | |
| Language: | Egyptian Arabic | |
| Language (ISO639): | arz | |
| License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
| Medium: | Distribution: Web Download | |
| Publisher: | Linguistic Data Consortium | |
| Publisher (URI): | https://www.ldc.upenn.edu | |
| Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2021T18 | |
| Rights Holder: | Portions © 1996, 1997, 2002, 2012-2017, 2019, 2021 Trustees of the University of Pennsylvania | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
OLAC Info |
| Archive: | The LDC Corpus Catalog | |
| Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
| OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2021T18 | |
| DateStamp: | 2025-09-30 | |
| GetRecord: | OAI-PMH request for simple DC format | |
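The `GetRecord` entries above refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL is formed follows, assuming a placeholder endpoint: `BASE_URL` and `get_record_url` are hypothetical, since this record does not list the harvesting endpoint. `oai_dc` is the simple Dublin Core prefix that OAI-PMH requires every repository to support; `olac` is assumed here to be the prefix for OLAC-format records.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: the record does not state which OAI-PMH base URL
# serves it, so replace this with the real repository or aggregator URL.
BASE_URL = "https://example.org/oai"
IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2021T18"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord URL; urlencode percent-encodes the ':' characters."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Simple Dublin Core (mandatory in OAI-PMH) and OLAC (prefix assumed).
    for prefix in ("oai_dc", "olac"):
        print(get_record_url(BASE_URL, IDENTIFIER, prefix))
```

Percent-encoding the identifier keeps the colons in `oai:www.ldc.upenn.edu:LDC2021T18` from being misread as URL syntax.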
Search Info | ||
| Citation: | Palmer, Martha; Hwang, Jena D.; Mansouri, Aous; Bonial, Claire; O'Gorman, Tim; Gung, James. 2021. Linguistic Data Consortium. | |
| Terms: | area_Africa country_EG dcmi_Text iso639_arz olac_primary_text | |