OLAC Record
oai:www.ldc.upenn.edu:LDC2024S06

Metadata
Title:Diaspora Tibetan Speech
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Geissler, Christopher, Sarah Babinski, and Jason Shaw. Diaspora Tibetan Speech LDC2024S06. Web Download. Philadelphia: Linguistic Data Consortium, 2024
Contributor:Geissler, Christopher
Babinski, Sarah
Shaw, Jason
Date (W3CDTF):2024
Date Issued (W3CDTF):2024-06-17
Description:*Introduction* Diaspora Tibetan Speech was developed at Yale University. It contains approximately 28 hours of elicited Tibetan speech from 73 speakers in the diaspora Tibetan community in Kathmandu, Nepal, along with transcripts, elicitation materials, and speaker demographic information.
*Data* Recordings were collected in 2016. All speakers were adults and varied in age as well as in age at diaspora. A substantial number of speakers were born in Nepal. Each speaker contributed one recording comprising a series of elicitation tasks: some demographic information; a word list and numbers; some sentences in isolation; a scripted story; and free speech based on "frog story" type illustrations. All elicitation materials are included with the corpus documentation in PDF format. The word-list and number-list sections of the recordings were time-aligned at the word level as Praat TextGrids. Five recordings were fully transcribed word-for-word by a native Tibetan speaker and are presented in both Microsoft Word and PDF format to preserve font encoding. These transcripts are not time-aligned but include general time stamps. Other transcripts are available as Excel spreadsheets with word-to-word correspondence of Tibetan script, phonetic transcription, and English translation. Demographic information includes age at recording, age at diaspora, and other information. The audio data is presented as single-channel, 16 kHz, 16-bit wav files.
*Sample* Please view the following samples:
* Audio (wav)
* Transcription (docx)
* Dictionary (xlsx)
* TextGrid
*Updates* None at this time.
Extent:Corpus size: 3100518 KB
Format:Sampling Rate: 16000
Sampling Format: pcm
Identifier:LDC2024S06
https://catalog.ldc.upenn.edu/LDC2024S06
ISLRN: 883-684-044-738-1
DOI: 10.35111/b8wr-w485
Language:Tibetan
Language (ISO639):bod
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2024S06
Rights Holder:Portions © 2024 Dr. Christopher Geissler, © 2024 Dr. Sarah Babinski, © 2024 Dr. Jason Shaw, © 2024 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text
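
The Description and Format fields above state that the audio is single-channel, 16 kHz, 16-bit PCM wav. The following is a minimal sketch, using only the Python standard library, of how a downloaded file could be checked against those stated parameters, and of how the stated duration relates to the Extent figure. The function names and the in-memory test file are illustrative and not part of the corpus; the size estimate assumes that 1 KB means 1024 bytes and ignores WAV headers, transcripts, and documentation.

```python
import io
import wave

# Audio parameters as stated in the record's Description and Format fields.
EXPECTED = {"channels": 1, "sample_rate": 16000, "sample_width_bytes": 2}


def wav_format(source):
    """Return channel count, sample rate and sample width of a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def matches_record(source):
    """True if the file has the format stated in the record."""
    return wav_format(source) == EXPECTED


def estimated_size_kb(hours, sample_rate=16000, sample_width_bytes=2, channels=1):
    """Uncompressed PCM size in KB (assuming 1 KB = 1024 bytes), headers ignored."""
    return hours * 3600 * sample_rate * sample_width_bytes * channels / 1024


def make_silent_wav(seconds=1):
    """Build a small in-memory WAV file for demonstration (not corpus data)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000 * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_record(make_silent_wav()))
    print(estimated_size_kb(28))
```

Under these assumptions, 28 hours of audio comes to 3150000 KB, which is in the same range as the 3100518 KB listed under Extent; the record itself describes the duration only as approximate.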

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2024S06
DateStamp:  2024-06-20
GetRecord:  OAI-PMH request for simple DC format
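
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The base URL below is a placeholder, since the record does not give the repository's OAI-PMH endpoint; "oai_dc" is the standard OAI-PMH prefix for simple Dublin Core, and "olac" is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the record does not state the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2024S06"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Simple Dublin Core, then the (assumed) OLAC metadata prefix.
    print(get_record_url(OAI_IDENTIFIER, "oai_dc"))
    print(get_record_url(OAI_IDENTIFIER, "olac"))
```

The colons in the identifier are percent-encoded by urlencode, which keeps the query string unambiguous.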

Search Info

Citation: Geissler, Christopher; Babinski, Sarah; Shaw, Jason. 2024. Diaspora Tibetan Speech. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Sound dcmi_Text iso639_bod olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2024S06
Up-to-date as of: Fri Dec 6 7:49:17 EST 2024