OLAC Record
oai:lindat.mff.cuni.cz:11234/1-4698

Metadata
Title:Coreference in Universal Dependencies 1.0 (CorefUD 1.0)
Bibliographic Citation:http://hdl.handle.net/11234/1-4698
Creator:Nedoluzhko, Anna
Novák, Michal
Popel, Martin
Žabokrtský, Zdeněk
Zeldes, Amir
Zeman, Daniel
Bourgonje, Peter
Cinková, Silvie
Hajič, Jan
Hardmeier, Christian
Krielke, Pauline
Landragin, Frédéric
Lapshinova-Koltunski, Ekaterina
Martí, M. Antònia
Mikulová, Marie
Ogrodniczuk, Maciej
Recasens, Marta
Stede, Manfred
Straka, Milan
Toldova, Svetlana
Vincze, Veronika
Žitkus, Voldemaras
Date (W3CDTF):2022-04-06T12:53:57Z
Date Available:2022-04-06T12:53:57Z
Description:CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.0 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch, and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please get acquainted with its license (placed in the same directory as the data) and cite the original data resource too. Version 1.0 consists of the same corpora and languages as the previous version 0.2; however, the English GUM dataset has been updated to a newer and larger version, and in the Czech/English PCEDT dataset, the train-dev-test split has been changed to be compatible with OntoNotes. Nevertheless, the main change is in the file format (the MISC attributes have a new form and interpretation).
Identifier (URI):http://hdl.handle.net/11234/1-4698
Is Replaced By (URI):http://hdl.handle.net/11234/1-5053
Language:Catalan
Czech
Dutch
English
French
German
Hungarian
Lithuanian
Polish
Russian
Spanish
Language (ISO639):cat
ces
nld
eng
fra
deu
hun
lit
pol
rus
spa
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Replaces (URI):http://hdl.handle.net/11234/1-4598
Rights:Licence CorefUD v0.2
https://lindat.mff.cuni.cz/repository/xmlui/page/license-corefud-0.2
Subject:dependency
treebank
coreference
bridging relations
harmonized annotation
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
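
The Description above states that the datasets are stored in CoNLL-U, with coreference and bridging information held as attribute-value pairs in the MISC column. The following is a minimal sketch of how such pairs could be read from a single token line. It relies only on the general CoNLL-U layout (ten tab-separated columns, MISC last, pairs separated by "|", "_" for an empty field). The function names, the attribute names ExampleAttr and OtherAttr, and the sample line are hypothetical and are not taken from the CorefUD data; the actual attribute names and the structure of their values are defined by the CorefUD format itself.

```python
def parse_misc(misc: str) -> dict[str, str]:
    """Split a CoNLL-U MISC field into a dict of attribute-value pairs."""
    if misc == "_":  # "_" marks an empty field in CoNLL-U
        return {}
    attrs = {}
    for item in misc.split("|"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        attrs[key] = value if sep else ""
    return attrs


def misc_of_token_line(line: str) -> dict[str, str]:
    """Return the parsed MISC attributes of one CoNLL-U token line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(columns)}")
    return parse_misc(columns[9])  # MISC is the tenth column


# Hypothetical token line; the attribute names and values are placeholders.
example_line = "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\tExampleAttr=value1|OtherAttr=value2"
example_attrs = misc_of_token_line(example_line)
```

Comment lines (starting with "#") and blank sentence separators are not handled here; a reader processing whole files would need to skip them before calling misc_of_token_line.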

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-4698
DateStamp:  2023-02-25
GetRecord:  OAI-PMH request for simple DC format
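
As an illustration of the GetRecord entry above, an OAI-PMH GetRecord request is an HTTP request carrying the arguments verb, identifier and metadataPrefix, where oai_dc is the protocol's standard prefix for simple Dublin Core. The sketch below only builds such a URL and does not send it. The base URL is an assumption, since this record does not state the repository's OAI-PMH endpoint, and the helper name get_record_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the OAI-PMH endpoint of the repository; not given in this record.
BASE_URL = "https://lindat.mff.cuni.cz/repository/oai/request"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (the request is not sent)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# The identifier is the OaiIdentifier listed in this record.
url = get_record_url(BASE_URL, "oai:lindat.mff.cuni.cz:11234/1-4698", "oai_dc")
```

urlencode percent-encodes the ":" and "/" characters in the identifier, which keeps the query string unambiguous.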

Search Info

Citation: Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeldes, Amir; Zeman, Daniel; Bourgonje, Peter; Cinková, Silvie; Hajič, Jan; Hardmeier, Christian; Krielke, Pauline; Landragin, Frédéric; Lapshinova-Koltunski, Ekaterina; Martí, M. Antònia; Mikulová, Marie; Ogrodniczuk, Maciej; Recasens, Marta; Stede, Manfred; Straka, Milan; Toldova, Svetlana; Vincze, Veronika; Žitkus, Voldemaras. 2022. Coreference in Universal Dependencies 1.0 (CorefUD 1.0). Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ country_DE country_ES country_FR country_GB country_HU country_LT country_NL country_PL country_RU dcmi_Text iso639_cat iso639_ces iso639_deu iso639_eng iso639_fra iso639_hun iso639_lit iso639_nld iso639_pol iso639_rus iso639_spa olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-4698
Up-to-date as of: Thu Oct 5 0:43:10 EDT 2023