OLAC Record
oai:clarin.eurac.edu:20.500.12124/8

Metadata
Title: KrdWrd CANOLA Corpus 1.0
Bibliographic Citation: http://hdl.handle.net/20.500.12124/8
Creator: Stemle, Egon W.; Steger, Johannes M.
Date (W3CDTF): 2019-08-14T15:55:13Z
Date Available: 2019-08-14T15:55:13Z
Description: The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated, and evaluated with the tools and infrastructure of the KrdWrd Project.
Identifier (URI): http://hdl.handle.net/20.500.12124/8
Is Replaced By (URI): http://hdl.handle.net/20.500.12124/9
Language: English
Language (ISO639): eng
Publisher: Institute for Applied Linguistics, Eurac Research
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
Subject: boiler plate removal; web page cleaning; WaC; Web as Corpus; training data; manual annotation
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  Eurac Research CLARIN Centre
Description:  http://www.language-archives.org/archive/clarin.eurac.edu

OAI Info

OaiIdentifier:  oai:clarin.eurac.edu:20.500.12124/8
DateStamp:  2023-03-17

Search Info

Citation: Stemle, Egon W.; Steger, Johannes M. 2019. Institute for Applied Linguistics, Eurac Research.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:clarin.eurac.edu:20.500.12124/8
Up-to-date as of: Mon Sep 18 0:46:44 EDT 2023