OLAC Record oai:lindat.mff.cuni.cz:11234/1-1989 |
Metadata | ||
Title: | CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings | |
Bibliographic Citation: | http://hdl.handle.net/11234/1-1989 | |
Creator: | Ginter, Filip | |
Hajič, Jan | ||
Luotolahti, Juhani | ||
Straka, Milan | ||
Zeman, Daniel | ||
Date (W3CDTF): | 2017-03-16T11:57:32Z | |
Date Available: | 2017-03-16T11:57:32Z | |
Description: | Automatic segmentation, tokenization, and morphological and syntactic annotation of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together with 100-dimensional word embeddings computed from the lowercased texts by word2vec (https://code.google.com/archive/p/word2vec/). For each language, the automatic annotations in CoNLL-U format are provided in a separate archive. The word embeddings for all languages are distributed in a single archive. Note that the CC BY-NC-SA 4.0 license applies to the automatically generated annotations and word embeddings, not to the underlying data, which may carry a different license and impose additional restrictions. Update 2018-09-03: Added data for the 4 “surprise languages” from the 2017 shared task: Buryat, Kurmanji, North Sami and Upper Sorbian. This had been promised earlier, and during CoNLL-ST 2018 the participants were given a link to this record and told the data was here. It wasn't, sorry. But now it is. | |
Identifier (URI): | http://hdl.handle.net/11234/1-1989 | |
Language: | Multiple languages | |
Language (ISO639): | mul | |
Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
Rights: | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | ||
Subject: | CoNLL 2017 | |
word embeddings | ||
automatic annotation | ||
Multiple languages | ||
Subject (ISO639): | mul | |
Type: | languageDescription | |
Type (DCMI): | Text | |
Type (OLAC): | language_description | |
OLAC Info |
||
Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-1989 | |
DateStamp: | 2021-06-29 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Ginter, Filip; Hajič, Jan; Luotolahti, Juhani; Straka, Milan; Zeman, Daniel. 2017. CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
Terms: | dcmi_Text iso639_mul olac_language_description | |
Inferred Metadata | ||
Country: | ||
Area: |