OLAC Record oai:www.ldc.upenn.edu:LDC2005S14 |
Metadata | ||
Title: | Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) | |
Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
Bibliographic Citation: | Maamouri, Mohamed, Tim Buckwalter, and Hubert Jin. Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) LDC2005S14. Web Download. Philadelphia: Linguistic Data Consortium, 2005 | |
Contributor: | Maamouri, Mohamed | |
Buckwalter, Tim | ||
Jin, Hubert | ||
Date (W3CDTF): | 2005 | |
Date Issued (W3CDTF): | 2005-06-15 | |
Description: | *Introduction* Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) was developed by the Linguistic Data Consortium (LDC) and contains approximately 138 hours of conversational telephone speech in Levantine Arabic and the associated transcripts.
*Data* This release contains 901 calls in total. The majority of speakers in this corpus are Lebanese. The data is similar to the training data in Set 3: Arabic CTS Levantine Fisher Training Data Set 3, Speech (LDC2005S07) and Arabic CTS Levantine Fisher Training Data Set 3, Transcripts. The dialect and gender distribution for all 901 calls is as follows:
Dialect       Number of Calls   Females   Males
Jordanian     171               71        100
Lebanese      1373              511       862
Palestinian   229               71        158
Syrian        29                12        17
Totals        1802              665       1137
All the calls are 2-channel ulaw SPHERE files with a sample rate of 8 kHz. All the transcripts are in UTF-8 format. The corpus also includes a word list with frequencies of occurrence. The list shows every word in its pronunciation spelling mapped to its corresponding canonical form, together with its raw frequency (the number of times it appears in the corpus) and its source document frequency (the number of documents in the corpus in which it appears).
*Samples* For an example of the data in this corpus, please view this audio sample (SPH) and transcript sample (TXT).
*Updates* None at this time. | |
Format: | Sampling Rate: 8000 | |
Identifier: | LDC2005S14 | |
https://catalog.ldc.upenn.edu/LDC2005S14 | ||
ISBN: 1-58563-342-9 | ||
ISLRN: 546-803-428-857-5 | ||
DOI: 10.35111/a75r-qp57 | ||
Language: | North Levantine Arabic | |
South Levantine Arabic | ||
Language (ISO639): | apc | |
ajp | ||
License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
Medium: | Distribution: Web Download | |
Publisher: | Linguistic Data Consortium | |
Publisher (URI): | https://www.ldc.upenn.edu | |
Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2005S14 | |
Rights Holder: | Portions © 2005 Trustees of the University of Pennsylvania | |
Type (DCMI): | Sound | |
Text | ||
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | The LDC Corpus Catalog | |
Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2005S14 | |
DateStamp: | 2022-01-20 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Maamouri, Mohamed; Buckwalter, Tim; Jin, Hubert. 2005. Linguistic Data Consortium. | |
Terms: | area_Asia country_JO country_SY dcmi_Sound dcmi_Text iso639_ajp iso639_apc olac_primary_text |