OLAC Record
oai:clarin.eurac.edu:20.500.12124/68

Metadata
Title:Beldeko Summary Corpus v1.1.0
Bibliographic Citation:http://hdl.handle.net/20.500.12124/68
Creator:Strobl, Carola
Wedig, Helena
Date (W3CDTF):2023-05-05T21:39:53Z
Date Available:2023-05-05T21:39:53Z
Description:Beldeko Summary Corpus v1.1.0
The Beldeko (Belgisches Deutschkorpus) Summary Corpus is a learner corpus of summaries written by advanced L2 German learners (CEF level B2-C1) with L1 Dutch. It was created to investigate the academic writing skills in L2 German of third-year students in two bachelor programmes, Applied Linguistics and Linguistics and Literature.
The corpus consists of 301 summaries (70774 tokens) written by 115 students from three intact classes (convenience sampling). The texts were collected at Ghent University (in 2013 and 2014) and University College of Ghent (in 2013) as pre- and posttests of an intervention study on collaborative writing carried out by Carola Strobl as part of her PhD research (Strobl, C. (2015). Affordances of online technologies for academic writing instruction in a foreign language. Ghent University, unpublished doctoral dissertation).
82 students produced three summaries each (pretest, posttest immediately after the three-week intervention, and delayed posttest six weeks after the intervention) and 33 students produced two summaries each (pretest and posttest). In both groups, missing data are indicated as n.a. in the metadata file.
The metadata file (Beldeko_Summary_1.1.0_metadata.xlsx) provides information about:
• Institution of data collection (HG = University College of Ghent, UG = Ghent University)
• Year of data collection (2013, 2014)
• Participants' gender (f, m)
• Number of texts written and number of tokens in each text (T1, T2, T3)
The individual file names of the corpus encode, in this order: institution, year, unique ID of the participant (per institution per year), and text number.
The summaries contain between 37 and 330 words each, with a mean of 230 words (the targeted word count was between 220 and 250 words). Outliers in text length are unfinished texts produced by students who struggled with the time limit of 60 minutes. The texts were written in class, on computers. Students were allowed to use online aids such as dictionaries.
The task consisted of summarizing two texts (fragments of newspaper articles, interviews, or websites) on a topic related to language variation in German each time (Kiezdeutsch, Mundartdebatte in der Schweiz, Viadrinisch, Varianten-Wörterbuch des Deutschen; see also the Word files provided in the metadata). The topics were distributed as follows:
• Kiezdeutsch: HG_2013_T1, UG_2013_T1, UG_2014_T1
• Mundartdebatte in der Schweiz: HG_2013_T2, UG_2013_T2, UG_2014_T2
• Viadrinisch: HG_2013_T3
• Varianten-Wörterbuch des Deutschen: UG_2014_T3
The new version of the corpus (Beldeko 1.1.0) contains manual annotations of the texts with token id, sentence id, source text form, target form, POS (STTS) tag, and simple UPOS part-of-speech tag.
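The file-name convention and topic distribution given in the Description can be sketched in Python as follows. This is only an illustration under stated assumptions: the record names the fields and their order (institution, year, participant ID, text number) but not the separator or the exact spelling of each field, so the underscore-separated pattern and the sample name `HG_2013_07_T3` are hypothetical. The topic table is copied from the Description. Combinations it does not list (such as UG_2013_T3) return no topic.

```python
from typing import NamedTuple, Optional

# Institution codes as given in the metadata description.
INSTITUTIONS = {
    "HG": "University College of Ghent",
    "UG": "Ghent University",
}

# Topic per (institution, year, text number), copied from the description.
TOPICS = {
    ("HG", 2013, 1): "Kiezdeutsch",
    ("UG", 2013, 1): "Kiezdeutsch",
    ("UG", 2014, 1): "Kiezdeutsch",
    ("HG", 2013, 2): "Mundartdebatte in der Schweiz",
    ("UG", 2013, 2): "Mundartdebatte in der Schweiz",
    ("UG", 2014, 2): "Mundartdebatte in der Schweiz",
    ("HG", 2013, 3): "Viadrinisch",
    ("UG", 2014, 3): "Varianten-Wörterbuch des Deutschen",
}


class SummaryFile(NamedTuple):
    institution: str
    year: int
    participant_id: str
    text_number: int
    topic: Optional[str]


def parse_name(stem: str) -> SummaryFile:
    """Split a file stem into its four fields.

    ASSUMPTION: fields are underscore-separated and the text number is
    written as 'T1', 'T2' or 'T3'. The record does not confirm this layout.
    """
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {stem!r}")
    institution, year_str, participant_id, text_str = parts
    if institution not in INSTITUTIONS:
        raise ValueError(f"unknown institution code: {institution!r}")
    if not (text_str.startswith("T") and text_str[1:].isdigit()):
        raise ValueError(f"unexpected text number field: {text_str!r}")
    year = int(year_str)
    text_number = int(text_str[1:])
    topic = TOPICS.get((institution, year, text_number))
    return SummaryFile(institution, year, participant_id, text_number, topic)


if __name__ == "__main__":
    # Hypothetical file stem, used for illustration only.
    print(parse_name("HG_2013_07_T3"))
```

Keeping the topic table keyed on the (institution, year, text number) triple mirrors the labels used in the Description, so the lookup can be checked against the record line by line.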
Identifier (URI):http://hdl.handle.net/20.500.12124/68
Language:German
Language (ISO639):deu
Publisher:Ghent University
University of Antwerp
Replaces (URI):http://hdl.handle.net/20.500.12124/15
Rights:Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
https://creativecommons.org/licenses/by-nc/4.0/
Subject:academic writing
L1 Dutch
L2 German
learner corpus
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Eurac Research CLARIN Centre
Description:  http://www.language-archives.org/archive/clarin.eurac.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:clarin.eurac.edu:20.500.12124/68
DateStamp:  2023-10-27
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Strobl, Carola; Wedig, Helena. 2023. Ghent University.
Terms: area_Europe country_DE dcmi_Text iso639_deu olac_primary_text


http://www.language-archives.org/item.php/oai:clarin.eurac.edu:20.500.12124/68
Up-to-date as of: Fri Oct 17 1:18:50 EDT 2025