OLAC Record oai:lindat.mff.cuni.cz:11234/1-1989 |
| Metadata | ||
| Title: | CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings | |
| Bibliographic Citation: | http://hdl.handle.net/11234/1-1989 | |
| Creator: | Ginter, Filip | |
| Hajič, Jan | ||
| Luotolahti, Juhani | ||
| Straka, Milan | ||
| Zeman, Daniel | ||
| Date (W3CDTF): | 2017-03-16T11:57:32Z | |
| Date Available: | 2017-03-16T11:57:32Z | |
| Description: | Automatic segmentation, tokenization, and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with word embeddings of dimension 100 computed from lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in one archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may have a different license and impose additional restrictions. Update 2018-09-03: Added data in the 4 “surprise languages” from the 2017 ST: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised before; during CoNLL-ST 2018 we gave the participants a link to this record, saying the data was here. It wasn't, sorry. But now it is. | |
| Identifier (URI): | http://hdl.handle.net/11234/1-1989 | |
| Language: | Multiple languages | |
| Language (ISO639): | mul | |
| Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
| Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
| http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
| Subject: | CoNLL 2017 | |
| word embeddings | ||
| automatic annotation | ||
| Multiple languages | ||
| Subject (ISO639): | mul | |
| Type: | languageDescription | |
| Type (DCMI): | Text | |
| Type (OLAC): | language_description | |
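
The description above says the per-language annotations are distributed in CoNLL-U format. As a rough illustration of what reading such a file might involve, here is a minimal Python sketch. It assumes the standard CoNLL-U layout (ten tab-separated columns per token line, `#` comment lines, a blank line between sentences). The function name `read_conllu` and the `SAMPLE` text are hypothetical and are not taken from the archive itself.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in order (lowercased names chosen for this sketch).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
            sentence.append(dict(zip(FIELDS, cols)))
    # Emit a trailing sentence if the input lacks a final blank line.
    if sentence:
        yield sentence


# Hypothetical two-token sentence, made up for illustration only.
SAMPLE = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
```

For a real file, an open text file handle could be passed in place of `SAMPLE.splitlines()`. All values stay as strings, so fields such as `head` would need converting before numeric use.
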
OLAC Info |
||
| Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
| Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-1989 | |
| DateStamp: | 2021-06-29 | |
| GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
| Citation: | Ginter, Filip; Hajič, Jan; Luotolahti, Juhani; Straka, Milan; Zeman, Daniel. 2017. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
| Terms: | dcmi_Text iso639_mul olac_language_description | |
Inferred Metadata | ||
| Country: | ||
| Area: | ||