OLAC Record oai:clarin.eurac.edu:20.500.12124/9 |
Metadata | ||
Title: | KrdWrd CANOLA Corpus 1.1 | |
Bibliographic Citation: | http://hdl.handle.net/20.500.12124/9 | |
Creator: | Stemle, Egon W.; Steger, Johannes M. | |
Date (W3CDTF): | 2019-08-14T16:05:22Z | |
Date Available: | 2019-08-14T16:05:22Z | |
Description: | The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated, and evaluated with the tools and infrastructure of the KrdWrd Project. | |
Identifier (URI): | http://hdl.handle.net/20.500.12124/9 | |
Language: | English | |
Language (ISO639): | eng | |
Publisher: | Institute for Applied Linguistics, Eurac Research | |
Replaces (URI): | http://hdl.handle.net/20.500.12124/8 | |
Rights: | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/ | |
Subject: | boiler plate removal; web page cleaning; WaC; Web as Corpus; training data; manual annotation | |
Type: | corpus | |
Type (DCMI): | Text | |
Type (OLAC): | primary_text | |
OLAC Info | ||
Archive: | Eurac Research CLARIN Centre | |
Description: | http://www.language-archives.org/archive/clarin.eurac.edu | |
GetRecord: | OAI-PMH request for OLAC format | |
GetRecord: | Pre-generated XML file | |
OAI Info | ||
OaiIdentifier: | oai:clarin.eurac.edu:20.500.12124/9 | |
DateStamp: | 2023-03-17 | |
GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
Citation: | Stemle, Egon W.; Steger, Johannes M. 2019. KrdWrd CANOLA Corpus 1.1. Institute for Applied Linguistics, Eurac Research. | |
Terms: | area_Europe country_GB dcmi_Text iso639_eng olac_primary_text |
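
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC format. As a minimal sketch, such a request URL could be assembled as shown below. The base URL is a hypothetical placeholder, since the record does not state the repository's OAI-PMH endpoint, and the sketch assumes the repository exposes the conventional `olac` and `oai_dc` metadata prefixes.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the repository's
# OAI-PMH base URL, so substitute the real endpoint before use.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:clarin.eurac.edu:20.500.12124/9"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for a single record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")
```

The function only constructs the URL; fetching and parsing the XML response is left out because the endpoint is not known from this record.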