OLAC Record
oai:clarin.eurac.edu:20.500.12124/81

Metadata
Title:German Summary Corpus (GerSumCo) v1.0.0
Bibliographic Citation:http://hdl.handle.net/20.500.12124/81
Creator:Wedig, Helena
Strobl, Carola
Date (W3CDTF):2024-11-14T17:14:49Z
Date Available:2024-11-14T17:14:49Z
Description:The GerSumCo (German Summary Corpus) is a learner corpus comprising syntheses written by L2 German writers (CEFR B2/C1) and by L1 German writers. The corpus was created to support a comparative analysis of the academic writing of L1 German and L2 German students. The two subcorpora (L1 and L2) contain a total of 286 texts (178 L1 and 108 L2), written by 286 students at 14 universities and language schools in Germany (Bamberg, Bochum, Dresden, Hamburg, Hildesheim, Kiel, Leipzig, Magdeburg, Osnabrück, Potsdam, Trier, Wuppertal), Poland (Gdansk) and China (Hangzhou). The texts were collected between 2022 and 2024 as part of a PhD research project on a contrastive interlanguage analysis that uses GerSumCo and Beldeko to identify L1-dependent features of cohesion in L2/L1 German.
The metadata files (Meta_GerSumCo_L1 & Meta_GerSumCo_L2) contain the following information:
- Up to three L1s of the writers
- Up to three L2s of the writers
- Collection date
- Topic
- Whether the text was written as homework or in class
- Group of students the text belonged to
The file names contain the following information:
- Whether the text is part of the L1 or L2 subcorpus
- Topic
The summaries consist of 230 words on average. The texts were produced either in class on computers or as homework, within a 60-minute time frame. Students were permitted to use online dictionaries, but no AI-based tools. They were required to summarise two texts on one of four topics related to language variation in German: Kiezdeutsch, Mundartdebatte in der Schweiz, Viadrinisch and Varianten-Wörterbuch des Deutschen.
This version contains the TXT files of the texts and the CSV files containing the manual annotations of the texts with token ID, sentence ID, source text form, target form, automatically annotated lemma, POS (STTS) and a simple UPOS part-of-speech tag.
Identifier (URI):http://hdl.handle.net/20.500.12124/81
Language:German
Language (ISO639):deu
Publisher:University of Antwerp
Rights:Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
https://creativecommons.org/licenses/by-nc/4.0/
Subject:cohesion
L2 German
summaries
learner language
synthesis writing
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
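
The Description above lists the per-token fields stored in the annotation CSV files. The following is a minimal Python sketch of how such a file might be read and grouped by sentence. The column names (token_id, sentence_id, source_form, target_form, lemma, pos_stts, upos), the comma delimiter and the sample rows are assumptions made for illustration and are not taken from the corpus; the actual header labels and delimiter should be checked against the distributed files. The sketch also assumes that the target form is a normalised version of the source form, so that a mismatch marks a changed token.

```python
import csv
import io
from collections import defaultdict

# Invented sample data; the header names are hypothetical and may not
# match the labels used in the real GerSumCo CSV files.
SAMPLE = (
    "token_id,sentence_id,source_form,target_form,lemma,pos_stts,upos\n"
    "1,1,Die,Die,die,ART,DET\n"
    "2,1,sprache,Sprache,Sprache,NN,NOUN\n"
    "3,2,Kiezdeutsch,Kiezdeutsch,Kiezdeutsch,NN,NOUN\n"
)


def group_by_sentence(csv_text, delimiter=","):
    """Return {sentence_id: [row dicts]}, keeping tokens in file order."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    sentences = defaultdict(list)
    for row in reader:
        sentences[row["sentence_id"]].append(row)
    return dict(sentences)


def changed_tokens(rows):
    """Return the rows whose source form differs from the target form."""
    return [r for r in rows if r["source_form"] != r["target_form"]]


if __name__ == "__main__":
    for sent_id, rows in group_by_sentence(SAMPLE).items():
        forms = " ".join(r["source_form"] for r in rows)
        print(sent_id, forms, len(changed_tokens(rows)))
```

To use this with a real file, read its contents with open(path, encoding="utf-8", newline="") and pass the text to group_by_sentence, adjusting the column names and delimiter to whatever the file actually uses.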

OLAC Info

Archive:  Eurac Research CLARIN Centre
Description:  http://www.language-archives.org/archive/clarin.eurac.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:clarin.eurac.edu:20.500.12124/81
DateStamp:  2024-11-14
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Wedig, Helena; Strobl, Carola. 2024. University of Antwerp.
Terms: area_Europe country_DE dcmi_Text iso639_deu olac_primary_text


http://www.language-archives.org/item.php/oai:clarin.eurac.edu:20.500.12124/81
Up-to-date as of: Wed Mar 26 0:59:34 EDT 2025