OLAC Record oai:dspace-clarin-it.ilc.cnr.it:000-c0-111/1052 |
Metadata | ||
Title: | StarwarsNER French Italian Corpus - sample | |
Bibliographic Citation: | http://hdl.handle.net/20.500.11752/ILC-1052 | |
Creator: | Frontini, Francesca | |
Chahinian, Nanée | ||
Aelami, Mitra | ||
Cardillo, Franco Alberto | ||
Conrad, Serge | ||
Debole, Franca | ||
Date (W3CDTF): | 2025-10-17T18:39:28Z | |
Date Available: | 2025-10-17T18:39:28Z | |
Description: | The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain. It supports research in:

- Information extraction
- Relation extraction
- Entity linking

The corpus consists of manually annotated parallel French and Italian documents, aligned at the sentence level. Annotations follow a domain-specific schema based on the Sewer Network Ontology <http://hdl.handle.net/20.500.11752/ILC-1037>.

For copyright reasons, this release contains only a sample of the original corpus, namely 8 French documents from public administrations and their Italian translations.

---

## Resource Creation

1. **French corpus**
   - Collected from reports, regulations, and local media texts.
   - Manually annotated according to the STARWARS schema.
2. **Italian corpus**
   - Produced via machine translation of the French texts.
   - Reviewed and corrected by bilingual translation students and expert hydrologists.
3. **Annotation process**
   - Conducted with the **INCEpTION** annotation platform.
   - Ensured consistent alignment between French and Italian.

For details, please refer to the publication: F.A. Cardillo, F. Debole, F. Frontini, M. Aelami, N. Chahinian, S. Conrad (2025) “Novel Benchmark for NER in the Wastewater and Stormwater Domain”, Proceedings of the 6th IEEE MNLP Conf. (CiST-MNLP’2025) 4-10 October 2025, Marrakech, Morocco. <https://arxiv.org/abs/2506.01938>

---

## Contents of this Package

- **Texts**: Provided in plain text.
- **Annotations**: Provided in **CONLL 2003 format, as exported from INCEpTION**.
- **Annotation guidelines**: Included in both **French** and **Italian**, as used by annotators. | |
Identifier (URI): | http://hdl.handle.net/20.500.11752/ILC-1052 | |
Language: | Italian | |
French | ||
Language (ISO639): | ita | |
fra | ||
Publisher: | Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR) | |
Institute of Information Science and Technologies "Alessandro Faedo" - National Research Council of Italy (ISTI CNR) | ||
Institut de Recherche pour le Développement | ||
Université de Montpellier | ||
Rights: | Creative Commons - Attribution 4.0 International (CC BY 4.0) | |
https://creativecommons.org/licenses/by/4.0 | ||
Subject: | Named Entity Recognition | |
Sewer Network | ||
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info |
||
Archive: | ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa | |
Description: | http://www.language-archives.org/archive/dspace-clarin-it.ilc.cnr.it | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info |
||
OaiIdentifier: | oai:dspace-clarin-it.ilc.cnr.it:000-c0-111/1052 | |
DateStamp: | 2025-10-17 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Frontini, Francesca; Chahinian, Nanée; Aelami, Mitra; Cardillo, Franco Alberto; Conrad, Serge; Debole, Franca. 2025. Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR). | |
Terms: | area_Europe country_FR country_IT dcmi_Text iso639_fra iso639_ita olac_primary_text |
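
The description above states that annotations are distributed in CoNLL 2003 format and that the French and Italian documents are aligned at the sentence level. The following is a minimal sketch of how such files might be read and paired. It rests on assumptions the record does not confirm: one token per line with whitespace-separated columns, the token in the first column, the entity tag in the last column, and blank lines between sentences. The function names `read_conll` and `pair_aligned` and the tag `B-X` are hypothetical and used only for illustration.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, entity_tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-style lines into sentences.

    Assumption: the first column is the token and the last column is the
    entity tag; blank lines and -DOCSTART- lines separate sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def pair_aligned(
    french: List[Sentence], italian: List[Sentence]
) -> List[Tuple[Sentence, Sentence]]:
    """Pair sentences by position, assuming a one-to-one sentence alignment."""
    if len(french) != len(italian):
        raise ValueError("sentence counts differ; files are not aligned one-to-one")
    return list(zip(french, italian))
```

With real files, `read_conll(open(path, encoding="utf-8"))` would be called once per language and the results passed to `pair_aligned`. The actual column layout should be checked against the files in the package before relying on this sketch.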