OLAC Record
oai:lindat.mff.cuni.cz:11234/1-5682

Metadata
Title: Corpus from the Aozora Bunko Library
Bibliographic Citation: http://hdl.handle.net/11234/1-5682
Creator: Rohacek, Jakub
Date (W3CDTF): 2025-02-03T10:50:20Z
Date Available: 2025-02-03T10:50:20Z
Description: This corpus contains a subset of the texts available from the Aozora Bunko public library project, a collection of mostly older literature in Japanese. A custom Python script compiled the corpus from the project's official GitHub repository to fit specific requirements: it excluded any text not currently freely available in the public domain and organized the output into text files of approximately equal size. The files also carry an XML structure whose tags delimit individual documents (books) and provide basic bibliographic information about each one: author, year, and title.
Identifier (URI): http://hdl.handle.net/11234/1-5682
Language: Japanese
Language (ISO639): jpn
Publisher: Masaryk University, NLP Centre
Rights: Creative Commons - Attribution 4.0 International (CC BY 4.0); http://creativecommons.org/licenses/by/4.0/
Subject: Aozora; Bunko; Corpus; Japanese; Literature
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-5682
DateStamp:  2025-02-03
GetRecord:  OAI-PMH request for simple DC format
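
The GetRecord entries above refer to OAI-PMH requests whose link targets are not preserved in this record. As a rough sketch of how such a request is usually formed, the Python snippet below builds a GetRecord URL from the three arguments the OAI-PMH protocol defines for that verb (verb, identifier, metadataPrefix). The endpoint in BASE_URL is a hypothetical placeholder, since the record does not state the archive's actual OAI-PMH base URL; the prefixes "olac" and "oai_dc" are the conventional names for the OLAC and simple DC formats and are assumed here rather than taken from the record.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: this record does not state the archive's
# real OAI-PMH endpoint, so substitute the actual base URL before use.
BASE_URL = "https://example.org/oai/request"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:lindat.mff.cuni.cz:11234/1-5682"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL.

    urlencode percent-encodes reserved characters such as ':' and '/'
    in the identifier, as required for query-string values.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(BASE_URL, OAI_ID, "olac")
dc_url = get_record_url(BASE_URL, OAI_ID, "oai_dc")

print(olac_url)
print(dc_url)
```

Only the metadataPrefix differs between the two requests; the identifier is the same in both.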

Search Info

Citation: Rohacek, Jakub. 2025. Masaryk University, NLP Centre.
Terms: area_Asia country_JP dcmi_Text iso639_jpn olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-5682
Up-to-date as of: Sat Mar 22 1:07:32 EDT 2025