Error-tagging of CroLTeC (computer learner corpus of Croatian as a foreign language)

Conference presentation abstract

Nives Mikelić Preradović

Type: conference contribution (in proceedings)

Category: conference presentation abstract

Year: 2019

Parent publication: E-dictionaries and e-lexicography

Pages: 103

Status: published

Abstract

We describe the error-tagging scheme developed for the CroLTeC
learner corpus (http://teitok.iltec.pt/croltec/index.php?action=home),
the first computer learner corpus of Croatian as a foreign language.
CroLTeC contains essays collected from 755 students with 36 different
mother tongues, among which the most prominent were Spanish, English,
German, Polish, Chinese, French and Arabic. It consists of 6,213 essays,
of which 1,217 were born digital, while 4,996 essays were scanned,
transcribed in RTF format and converted into XML format. CroLTeC
has a total of 1,054,287 tokens, and essays were collected at all six
CEFR levels of language learning at Croaticum – Center for Croatian as
Second and Foreign Language at the Faculty of Humanities and Social
Sciences in Zagreb. All CroLTeC essays contain metadata about the title,
number and type of essay (homework, part of exam or field class, etc.).
Data were lemmatized and annotated with morphosyntactic tags using the
RELDI tagger (Ljubešić et al., 2016). The corpus is also searchable by
age, sex, language proficiency level and mother tongue of the learner.
The error-tagging scheme is partially based on Šolar (the scheme of
the developmental corpus of Slovene) and the error coding of the Cambridge
Learner Corpus, and is further tailored to the Croatian language. The goal
of developing the error-annotation scheme is to build a subcorpus that
will serve as a repository of authentic data about learners'
interlanguage. It should enable researchers and teachers of Croatian as a
foreign language to explore the interlanguage, to discover the aspects of
the grammar that are the most difficult to master and to tailor teaching
materials to different groups of learners (not only according to their
Croatian language proficiency level, but also to their first language).
Finally, the error-tagged subcorpus should also serve as a starting point
for designing computer-aided tools to correct lexical errors and the misuse
of verb tenses, phrasal verbs and collocations.

Keywords

Error-tagging; learner corpus