"Communication in Slovene"

Written Corpus of Slovene

the goal of the activity is a new written corpus of Slovene containing 1 billion words
its design will follow the example of the FIDA and FidaPLUS corpora
XML TEI P5 format
lemmatised
fully morphosyntactically annotated
partly syntactically annotated
with named entity recognition
the materials are being collected and updated from June 2008 to December 2013
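To show what "lemmatised" and "morphosyntactically annotated" mean for a TEI P5 corpus, here is a minimal sketch that reads word-level annotation from a TEI-encoded sentence. It assumes each word is a TEI `<w>` element carrying a `lemma` and an `msd` attribute (both from the TEI P5 `att.linguistic` class). The sample sentence and its MULTEXT-East-style tags are hypothetical and are not taken from the corpus. The corpus's actual encoding may differ.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this XML namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: one sentence in which every word carries a lemma and a
# morphosyntactic description (MSD). The attribute choice and the tag values
# are illustrative assumptions, not the corpus's actual encoding.
SAMPLE = f"""<s xmlns="{TEI_NS}">
  <w lemma="korpus" msd="Ncmsn">Korpus</w>
  <w lemma="biti" msd="Va-r3s-n">je</w>
  <w lemma="velik" msd="Agpmsnn">velik</w>
  <pc>.</pc>
</s>"""


def tokens(xml_text):
    """Return (word form, lemma, MSD) triples for every TEI <w> element.

    Punctuation (<pc>) is skipped because it has no lemma or MSD here.
    """
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("lemma"), w.get("msd"))
        for w in root.iter(f"{{{TEI_NS}}}w")
    ]


if __name__ == "__main__":
    for form, lemma, msd in tokens(SAMPLE):
        print(f"{form}\t{lemma}\t{msd}")
```

Lemmas let a search for `biti` find every inflected form of the verb. MSD tags let a search target grammatical categories such as masculine nominative singular nouns.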

Leader: Nataša Logar Berginc, Faculty of Social Sciences, University of Ljubljana

MŠŠ EU