OLAC Record
oai:www.ldc.upenn.edu:LDC2001T61

Metadata
Title:CALLHOME Spanish Dialogue Act Annotation
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Waibel, Alex, et al. CALLHOME Spanish Dialogue Act Annotation LDC2001T61. Web Download. Philadelphia: Linguistic Data Consortium, 2001
Contributor:Waibel, Alex
Lavie, Alon
Levin, Lori
Ries, Klaus
Valle-Argueta, Liza
Date (W3CDTF):2001
Description:*Introduction* The CALLHOME Spanish Dialogue Act Annotation Corpus, Linguistic Data Consortium (LDC) catalog number LDC2001T61 and ISBN 1-58563-197-3, was developed under Project CLARITY. The goal of CLARITY was to glean discourse information from unrestricted conversational speech using shallow, corpus-based analysis. The annotation was carried out at Interactive Systems Labs at Carnegie Mellon University.
*Data* This publication used a three-level coding scheme to manually tag the LDC CALLHOME Spanish Transcripts. The three levels of the coding scheme are:
* a dialogue act level consisting of a tag set extended from DAMSL and Switchboard;
* a dialogue game level featuring short sequences of dialogue acts, such as question/answer pairs;
* a genre level similar to topical segments.
All 120 available CALLHOME Spanish dialogues have been annotated. The dialogue act annotation scheme is a further development of the Switchboard DAMSL tag set. Genres can be storytelling, discussion, planning, etc., and the segmentation takes topic into account as well. Genres, games, and dialogue acts are annotated by type. Genres are additionally annotated for activities and topics (on a 0-5 scale) and for the central object or person being discussed (who or what category), and each contains a short synopsis of the segment.
An example of the tagging from one conversation is presented below.
Sí, eso es para eso, de seguro. No importa. No importa.
Bueno aquí, la Zaida está estudiando también en la universidad con la Liana. Y qué estudia, mamá, qué están estudiando. [background speech] Están estudiando Sociales. Ciencias Sociales. Ah, para maes- para maestra de Sociales. Sí
*Updates* There are no updates at this time.
Extent:Corpus size: 13312 KB
Identifier:LDC2001T61
https://catalog.ldc.upenn.edu/LDC2001T61
ISBN: 1-58563-197-3
ISLRN: 431-710-911-315-6
DOI: 10.35111/wqck-a633
Language:Spanish
Language (ISO639):spa
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2001T61
Type (DCMI):Text
Type (OLAC):primary_text
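
The three-level coding scheme in the Description can be pictured as nested records. The following is a minimal sketch only, not the corpus file format: the class names, field names, speaker labels, tag strings, and the assumption that genre segments contain games which in turn contain dialogue acts are all hypothetical and chosen for illustration. Only the 0-5 scale for activities and topics, the central object or person, and the short synopsis come from the record itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogueAct:
    """One tagged utterance (lowest level of the scheme)."""
    speaker: str   # hypothetical speaker label
    text: str
    act_type: str  # hypothetical tag name, not an actual corpus tag


@dataclass
class DialogueGame:
    """A short sequence of dialogue acts, e.g. a question/answer pair."""
    game_type: str
    acts: List[DialogueAct] = field(default_factory=list)


@dataclass
class GenreSegment:
    """A genre-level segment such as storytelling, discussion, or planning."""
    genre_type: str
    activities: Dict[str, int]  # assumed shape: label -> score on the 0-5 scale
    topics: Dict[str, int]      # assumed shape: label -> score on the 0-5 scale
    central_focus: str          # central object or person being discussed
    synopsis: str               # short synopsis of the segment
    games: List[DialogueGame] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the 0-5 scale stated in the Description.
        for scores in (self.activities, self.topics):
            for label, score in scores.items():
                if not 0 <= score <= 5:
                    raise ValueError(f"score for {label!r} must be 0-5, got {score}")


# Illustrative values only; the labels and scores below are invented.
question = DialogueAct("A", "Y qué estudia, mamá, qué están estudiando.", "question")
answer = DialogueAct("B", "Están estudiando Sociales.", "answer")
game = DialogueGame("question/answer", [question, answer])
segment = GenreSegment(
    genre_type="discussion",
    activities={"informing": 3},
    topics={"education": 4},
    central_focus="person",
    synopsis="Family members talk about university studies.",
    games=[game],
)
print(len(segment.games[0].acts))
```

The validation in `__post_init__` is a design choice of this sketch; the record does not say how the corpus itself enforces or encodes the scale.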

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2001T61
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Waibel, Alex; Lavie, Alon; Levin, Lori; Ries, Klaus; Valle-Argueta, Liza. 2001. Linguistic Data Consortium.
Terms: area_Europe country_ES dcmi_Text iso639_spa olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2001T61
Up-to-date as of: Fri Dec 6 7:46:42 EST 2024