OLAC Record
oai:www.ldc.upenn.edu:LDC2005T13

Metadata
Title:CCGbank
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Hockenmaier, Julia, and Mark Steedman. CCGbank LDC2005T13. Web Download. Philadelphia: Linguistic Data Consortium, 2005
Contributor:Hockenmaier, Julia
Steedman, Mark
Date (W3CDTF):2005
Date Issued (W3CDTF):2005-05-15
Description:*Introduction* CCGbank was developed by the University of Edinburgh and contains approximately 49,000 sentences of English text represented as Combinatory Categorial Grammar (CCG) derivations. The sentences are drawn from Treebank-2 (LDC95T7) and cover 99.44% of that treebank; for the remaining 274 sentences, the translation algorithm failed to produce a CCG derivation. CCG is a grammatical theory that provides a completely transparent interface between surface syntax and underlying semantics. Each (complete or partial) syntactic derivation corresponds directly to an interpretable structure, which allows CCG to account for the incremental nature of human language processing. The syntactic rules of CCG are based on categorial calculus and combinatory logic. The main attraction of using CCG for parsing is that it facilitates the recovery of the non-local dependencies involved in constructions such as extraction, coordination, control, and raising. *Data* There are three sets of files, which mirror the directory and file structure of the Penn Treebank: the human-readable files in HTML format, the machine-readable corpus files (.auto), and the predicate-argument structure files (.parg). The corpus also includes a lexicon specifying the categories that the words of a language can take, and files detailing grammar rule instantiations. *Samples* For an example of the data in this corpus, please view this sample (HTML). *Update* The current version, 1.1, is a bug-fix release that supersedes the earlier package. It is available for download.
Identifier:LDC2005T13
https://catalog.ldc.upenn.edu/LDC2005T13
ISBN: 1-58563-340-2
ISLRN: 181-921-208-336-7
DOI: 10.35111/a589-6d76
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2005T13
Rights Holder:Portions © 2005 Julia Hockenmaier and Mark Steedman, © 2005 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2005T13
DateStamp:  2021-11-12
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Hockenmaier, Julia; Steedman, Mark. 2005. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005T13
Up-to-date as of: Tue May 7 7:24:23 EDT 2024