OLAC Record
oai:www.ldc.upenn.edu:LDC2015T23

Metadata
Title:KHATT: Handwritten Arabic Text
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Mahmoud, Sabri A., et al. KHATT: Handwritten Arabic Text LDC2015T23. Web Download. Philadelphia: Linguistic Data Consortium, 2015
Contributor:Mahmoud, Sabri A.
Ahmad, Irfan
Al-Khatib, Wasfi G.
Alshayeb, Mohammad
Parvez, Mohammad Tanvir
Märgner, Volker
Fink, Gernot A.
Date (W3CDTF):2015
Date Issued (W3CDTF):2015-11-16
Description:*Introduction* KHATT: Handwritten Arabic Text was developed by King Fahd University of Petroleum & Minerals, Technical University of Dortmund and Braunschweig University of Technology. It comprises scanned Arabic handwriting from 1,000 distinct male and female writers of diverse nationalities, age groups, handedness and education levels. Participants wrote on a topic of their choice in an unrestricted style. KHATT was designed to promote research in areas such as text recognition and writer identification. *Data* The majority of participants were natives of Saudi Arabia; the next largest group came from other countries in the region (Egypt, Jordan, Kuwait, Morocco, Palestine, Tunisia and Yemen). Most writers were between 16 and 25 years of age and held high school or university qualifications. The handwriting is provided as TIFF images scanned at 200, 300 and 600 DPI (dots per inch). The source images are four-page TIFF files containing writer metadata, fixed paragraphs and free writing. Image files of isolated paragraphs and lines are also included. Ground-truth files are provided as Unicode plain text. The data is divided into training, validation and test sets. *Samples* Please view this image sample and this text sample. *Updates* None at this time.
Extent:Corpus size: 28,956,648 KB
Identifier:LDC2015T23
https://catalog.ldc.upenn.edu/LDC2015T23
ISBN: 1-58563-736-X
ISLRN: 866-063-772-506-2
DOI: 10.35111/vc52-tm53
Language:Arabic
Language (ISO639):ara
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2015T23
Rights Holder:Portions © 2015 King Fahd University of Petroleum & Minerals, Trustees of the University of Pennsylvania
Type (DCMI):StillImage
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2015T23
DateStamp:  2021-03-09

Search Info

Citation: Mahmoud, Sabri A.; Ahmad, Irfan; Al-Khatib, Wasfi G.; Alshayeb, Mohammad; Parvez, Mohammad Tanvir; Märgner, Volker; Fink, Gernot A. 2015. Linguistic Data Consortium.
Terms: dcmi_StillImage dcmi_Text iso639_ara olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015T23
Up-to-date as of: Fri Dec 6 7:48:27 EST 2024