OLAC Record
oai:www.ldc.upenn.edu:LDC97T12

Metadata
Title:DSO Corpus of Sense-Tagged English
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Hwee Tou Ng, and Hian Beng Lee. DSO Corpus of Sense-Tagged English LDC97T12. Web Download. Philadelphia: Linguistic Data Consortium, 1997
Contributor:Hwee Tou Ng
Hian Beng Lee
Date (W3CDTF):1997
Description:*Introduction* This corpus contains sense-tagged word occurrences for 121 nouns and 70 verbs which are among the most frequently occurring and ambiguous words in English. These occurrences are provided in about 192,800 sentences taken from the Brown corpus and the Wall Street Journal and have been hand-tagged by students at the Linguistics Program of the National University of Singapore. WordNet 1.5 sense definitions of these nouns and verbs were used to identify a word sense for each occurrence of each word.
*Data* In addition to providing the word occurrences in their full sentential context, the corpus includes complete listings of the WordNet 1.5 sense definitions used in the tagging. The following example illustrates the format of a sentence with a sense tag for the word "action," followed by the corresponding WordNet 1.5 sense definition:
ca01.db #020 `` These >> actions 8 proceeding, legal proceeding, judicial proceeding, proceedings -- (the institution of a legal action)
=> due process, due process of law -- (the administration of justice according to established rules and principles)
=> group action -- (action taken by a group of people)
=> act, human action, human activity -- (something that people do or cause to happen)
(In the actual corpus, all tagged occurrences of a given noun or verb are stored together in one file, with each full sentence on one line; all noun and verb word sense definitions are stored together in two separate files.)
This sense-tagged corpus was provided by Hwee Tou Ng of the Defence Science Organisation (DSO) of Singapore. It was first reported in the following paper at ACL-96: "Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach," by Hwee Tou Ng and Hian Beng Lee, in Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 40-47, Santa Cruz, California, USA, June 1996 (http://xxx.lanl.gov/abs/cmp-lg/9606032).
*Updates* There are no updates at this time.
Identifier:LDC97T12
https://catalog.ldc.upenn.edu/LDC97T12
ISBN: 1-58563-119-1
ISLRN: 690-427-158-676-8
DOI: 10.35111/84c8-t325
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC97T12
Type (DCMI):Text
Type (OLAC):primary_text
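
Note on the sentence format shown in the Description field: the following is a minimal Python sketch of how one tagged line might be read. It assumes, based only on the single example in this record, that each line starts with a source file name and a "#"-prefixed sentence number, and that the tagged word is introduced by ">>" and followed by its sense number. The names TaggedOccurrence and parse_tagged_line are hypothetical, and the layout should be checked against the corpus documentation before use.

```python
import re
from typing import NamedTuple, Optional


class TaggedOccurrence(NamedTuple):
    """One sense-tagged word occurrence (hypothetical container)."""
    source_file: str
    sentence_id: str
    word: str
    sense: int


# Assumed layout, inferred only from the one example in the Description:
#   <file> #<sentence number> <tokens...> >> <word> <sense number> ...
LINE_PATTERN = re.compile(r"^(\S+)\s+#(\d+)\s+.*?>>\s+(\S+)\s+(\d+)")


def parse_tagged_line(line: str) -> Optional[TaggedOccurrence]:
    """Return the first tagged occurrence on a line, or None if none is found."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    source_file, sentence_id, word, sense = match.groups()
    return TaggedOccurrence(source_file, sentence_id, word, int(sense))


# The sentence fragment quoted in the Description field.
example = "ca01.db #020 `` These >> actions 8"
occurrence = parse_tagged_line(example)
print(occurrence)
```

The sentence number is kept as a string so that leading zeros such as "020" are preserved.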

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC97T12
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
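
Note on the GetRecord entries above: an OAI-PMH GetRecord request is an HTTP request carrying the arguments verb, identifier, and metadataPrefix. The Python sketch below builds such request URLs for this record's identifier, using "olac" for the OLAC format and "oai_dc" for simple Dublin Core. The base URL is a placeholder, since this record does not state the repository endpoint, and get_record_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the record does not give the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC97T12"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(OAI_IDENTIFIER, "olac")    # OLAC format
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")    # simple DC format
print(olac_url)
print(dc_url)
```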

Search Info

Citation: Hwee Tou Ng; Hian Beng Lee. 1997. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC97T12
Up-to-date as of: Tue May 7 7:24:38 EDT 2024