OLAC Record oai:www.ldc.upenn.edu:LDC2015T15 |
| Metadata | ||
| Title: | TS Wikipedia | |
| Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
| Bibliographic Citation: | Sezer, Taner, and Türker Sezer. TS Wikipedia LDC2015T15. Web Download. Philadelphia: Linguistic Data Consortium, 2015 | |
| Contributor: | Sezer, Taner | |
| Sezer, Türker | ||
| Date (W3CDTF): | 2015 | |
| Date Issued (W3CDTF): | 2015-07-15 | |
| Description: | *Introduction* TS Wikipedia is a collection of approximately 1.6 million processed Turkish Wikipedia pages. The data is tokenized and includes part-of-speech tags, morphological analysis, lemmas, bi-grams and tri-grams. *Data* The data is in a word-per-line format with five tab-separated columns: token, part-of-speech tag, morphological analysis, lemma and corrected token spelling if needed. All data is presented in UTF-8 XML files and was selected and filtered to reduce non-Turkish characters, mathematical formulas and non-Turkish entries. *Updates* None at this time. | |
| Extent: | Corpus size: 2279688 KB | |
| Identifier: | LDC2015T15 | |
| https://catalog.ldc.upenn.edu/LDC2015T15 | ||
| ISBN: 1-58563-723-8 | ||
| DOI: 10.35111/mem6-4951 | ||
| Language: | Turkish | |
| Language (ISO639): | tur | |
| License: | Creative Commons-Attribution-Share-Alike 3.0 (NFP, Non-Member): https://catalog.ldc.upenn.edu/license/creative-commons-attribution-share-alike-3-dot-0-nfp-non-member.pdf | |
| LDC For-Profit Membership Agreement: https://catalog.ldc.upenn.edu/license/ldc-for-profit-membership.pdf | ||
| Medium: | Distribution: Web Download | |
| Publisher: | Linguistic Data Consortium | |
| Publisher (URI): | https://www.ldc.upenn.edu | |
| Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2015T15 | |
| Rights Holder: | Portions © 2015 Taner Sezer, © 2015 Trustees of the University of Pennsylvania | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
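
The word-per-line layout named in the Description (token, part-of-speech tag, morphological analysis, lemma, and an optional corrected spelling, separated by tabs) could be read with a small parser along the following lines. This is a minimal sketch, not the corpus's own tooling: the names `TokenRow`, `parse_row` and `parse_rows` are hypothetical, the sample lines and tag values are invented placeholders rather than corpus data, and the sketch assumes that any line without four or five tab-separated fields is XML markup to be skipped.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class TokenRow:
    token: str
    pos: str
    morphology: str
    lemma: str
    corrected: Optional[str] = None  # set only when a corrected spelling is given


def parse_row(line: str) -> Optional[TokenRow]:
    """Parse one tab-separated line; return None if it is not a token row."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (4, 5):
        # Assumption: lines with another field count are XML markup or blanks.
        return None
    corrected = fields[4] if len(fields) == 5 and fields[4] else None
    return TokenRow(fields[0], fields[1], fields[2], fields[3], corrected)


def parse_rows(lines: Iterable[str]) -> Iterator[TokenRow]:
    """Yield a TokenRow for every line that looks like a token row."""
    for line in lines:
        row = parse_row(line)
        if row is not None:
            yield row


# Invented sample input; tags and analyses are placeholders, not corpus values.
sample = [
    "<doc>",
    "kitaplar\tNOUN_TAG\tMORPH_ANALYSIS\tkitap",
    "gelmiyo\tVERB_TAG\tMORPH_ANALYSIS\tgel\tgelmiyor",
    "</doc>",
]
rows = list(parse_rows(sample))
```

Treating the fifth column as optional follows the "if needed" wording in the Description; a real reader would also need to handle the XML wrapper, whose element names are not given in this record.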
OLAC Info |
||
| Archive: | The LDC Corpus Catalog | |
| Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2015T15 | |
| DateStamp: | 2020-11-30 | |
| GetRecord: | OAI-PMH request for simple DC format | |
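
The GetRecord entries above refer to OAI-PMH requests, which are ordinary HTTP GET URLs carrying `verb`, `identifier` and `metadataPrefix` parameters. The sketch below shows how such a URL might be assembled for this record. The base URL is a placeholder, since the record does not show the actual endpoint, and `get_record_url` is a hypothetical helper name; `oai_dc` is the standard prefix for simple Dublin Core, and `olac` is assumed here as the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not shown in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


record_id = "oai:www.ldc.upenn.edu:LDC2015T15"
olac_url = get_record_url(record_id, "olac")   # OLAC format (assumed prefix)
dc_url = get_record_url(record_id, "oai_dc")   # simple Dublin Core format
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string valid regardless of how the server parses it.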
Search Info | ||
| Citation: | Sezer, Taner; Sezer, Türker. 2015. TS Wikipedia. Linguistic Data Consortium. | |
| Terms: | area_Asia country_TR dcmi_Text iso639_tur olac_primary_text | |