
Web concordancers are computer programs that enable searching in large collections of texts, called corpora, on the web. The interface developed in the Communication in Slovene project is simple and was designed to provide a user-friendly experience (including in schools). Through the interface, users can analyse and monitor how real modern Slovene is used, primarily in the two written corpora, Gigafida and Kres, and the spoken corpus Gos.


The concordancer was designed by following good practice in other web concordancers, the findings of corpus linguists, and the opinions and comments of users. Our analyses and questionnaires showed that many users do not use the features available in the previous version of the concordancer, or are not aware of them at all. We were surprised to learn that more than a quarter of users did not know that word lemmas could be searched, and more than a third knew nothing about morpho-syntactic tagging. The questionnaire results showed that a large share of regular users do not have the knowledge needed to use this language resource properly or effectively. The analysis led to the decision that search routines had to be simplified to the extreme, and that browsing and further processing of corpus data had to be intuitive and very easy to understand.


The concordancer does not require registration or authentication, and it is not necessary to consult manuals, help pages or other information about the corpus before the first search. The first activity expected from the user is entering a query into the query window, and this window is the only item on the initial screen of the concordancer. In its intuitiveness, corpus search should resemble searching the web with popular search engines.

We can search the corpus by entering a search string in the query window. The string can be a word (e.g. medved (bear)), a combination of words (e.g. polarni medved (polar bear)) or a string of words that also contains punctuation marks (e.g. kljub temu, da (in spite of)). Advanced search enables users to employ additional filters, such as morpho-syntactic features of the search word or words appearing in the context. One does not have to study a special query language: filters can be activated by selecting the desired features in predefined tables in the interface.


A significant difference in comparison with previous Slovene web concordancers is the introduction of automatic lemmatization in basic search. In the FidaPLUS concordancer the user had to specify that he or she was interested in all forms of the search word, while in the new concordancer one has to specify the converse condition: putting the query in quotes specifies that only the exact search string, and therefore one particular form in the morphological paradigm, should be shown in the results. As Slovene is a highly inflected language, this kind of corpus searching is more intuitive.
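The difference between the default (lemmatized) search and the quoted (exact form) search can be illustrated with a minimal sketch. It is not the concordancer's actual implementation: the Token structure, the function names and the toy corpus are assumptions made for illustration, and the sketch handles single-word queries only.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    form: str   # word form as it appears in the text
    lemma: str  # base form assigned by a lemmatizer (assumed to be precomputed)


def parse_query(query: str) -> Tuple[str, bool]:
    """Return (term, exact); exact is True when the query is wrapped in quotes."""
    query = query.strip()
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return query[1:-1], True
    return query, False


def search(tokens: List[Token], query: str) -> List[int]:
    """Return the positions of tokens matching a single-word query."""
    term, exact = parse_query(query)
    term = term.lower()
    hits = []
    for position, token in enumerate(tokens):
        # A quoted query is compared with the word form, an unquoted one with the lemma.
        candidate = token.form if exact else token.lemma
        if candidate.lower() == term:
            hits.append(position)
    return hits


# Hypothetical toy corpus: two forms of the lemma "medved" (bear).
TOY_CORPUS = [
    Token("Medved", "medved"),
    Token("je", "biti"),
    Token("videl", "videti"),
    Token("drugega", "drug"),
    Token("medveda", "medved"),
    Token(".", "."),
]

print(search(TOY_CORPUS, "medved"))     # all forms of the lemma
print(search(TOY_CORPUS, '"medveda"'))  # one exact form only
```

In this sketch the unquoted query finds both forms of medved, while the quoted query finds only the form that was typed.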

An important new feature is the set of filters on the left side of the screen. Filters appear automatically with each query and are based on the meta-data contained in each corpus document. In this way users can learn about the frequency of the search string according to different criteria, such as the year of publication, text type etc. Filters also enable users to narrow down their results with one click: for example, they can choose only those concordances with text from the web, or only those collocates which are tagged as verbs in the corpus.
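The idea of meta-data filters can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (year, text_type), their values and the sample hits are hypothetical and do not reflect the actual corpus schema.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical search hits; each one carries the meta-data of its source document.
HITS: List[Dict[str, Any]] = [
    {"text": "... polarni medved ...", "year": 2005, "text_type": "newspaper"},
    {"text": "... medved je ...", "year": 2005, "text_type": "web"},
    {"text": "... medveda so ...", "year": 2008, "text_type": "web"},
]


def facet_counts(hits: List[Dict[str, Any]], field: str) -> Counter:
    """Count the hits for each value of one meta-data field."""
    return Counter(hit[field] for hit in hits)


def narrow(hits: List[Dict[str, Any]], field: str, value: Any) -> List[Dict[str, Any]]:
    """Keep only the hits whose meta-data field equals the chosen value."""
    return [hit for hit in hits if hit[field] == value]


print(facet_counts(HITS, "year"))        # frequency of the search string per year
print(narrow(HITS, "text_type", "web"))  # the "one click" narrowing to web texts
```

The counts correspond to the frequencies shown next to each filter value, and narrowing corresponds to selecting one of those values.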

Functional clarity of the interface was achieved by offering the user only those functions and links that are really needed at each particular step of the work. This is reflected in the menu with three tabs, each tab enabling a different kind of corpus search and analysis: concordances, collocations and word lists. All tabs share a set of features (the possibility to export and print data, data filters, the query window, search history etc.), which can be found in the same part of the interface. At the same time, features which differ between tabs follow the most suitable presentation of the specific data they contain (concordances, lists etc.).


The web concordancers are the results of two projects: the concordancers for the written corpora Gigafida and Kres were made in the “Communication in Slovene” project, and the concordancer for the spoken corpus Gos was made in the “Web concordancer for the national corpus of spoken Slovene” project. The owner of the written corpus concordancer is the Ministry of Education, Science and Sports. For further information, or if you wish to use the concordancer for your corpus, please write to info@slovenscina.eu.


Gigafida and Kres

Concept: Simon Rigač, Špela Arhar Holdt, Iztok Kosem, Simon Krek, Polona Gantar, Nataša Logar Berginc
Design and programming: Rok Rejc, Simon Rigač


Project leader (“Web concordancer for the national corpus of spoken Slovene”): Darinka Verdonik
Project partners:

Concept (prototype, coordination): Simon Rigač
Design and programming: Rok Rejc, Simon Rigač


