CORPORA
NAME | TYPE | SIZE | AVAILABILITY |
WEB PORTALS | | | |
GIGAFIDA | written corpus | 1.2 billion words | special contract |
KRES | written balanced corpus | 100 million words | special contract |
GOS | spoken corpus | 1 million words | CC BY-NC-SA, download |
TEXT COLLECTIONS | | | |
ŠOLAR | learners’ corpus | 1 million words | CC BY-NC-SA, download |
GOS | spoken corpus | 1 million words | CC BY-NC-SA, download |
DATA SETS (FOR LT) | | | |
TRAINING CORPUS | manually tagged corpus | 500,000 words | CC BY-NC-SA, download |
ccGIGAFIDA | tagged corpus | 100 million words | CC BY-NC-SA, download |
ccKRES | tagged corpus | 10 million words | CC BY-NC-SA, download |