Slovene corpus summarizer


The Korpusnik tool was developed as part of the SLOKIT project (full name: Upgrading CLARIN.SI: Corpus Summarizer and Text Analyzer), which ran in 2022-2023 and was financed by the Ministry of Culture of the Republic of Slovenia. The project leader was Dr. Iztok Kosem. The Jožef Stefan Institute (leading partner) and the Slovenian Association of Disabled Students participated in the project. Infrastructural support outside of project funding (hosting and maintenance of tools) is provided by the Centre for Language Resources and Technologies of the University of Ljubljana, also a member of the CLARIN.SI consortium.

The main purpose of the project was to upgrade the CLARIN.SI research infrastructure portal with services that bring the contents of the portal, especially corpora, closer to a wider range of users. The CLARIN.SI repository holds a rich collection of data sources on the Slovenian language, but these are mainly intended for researchers and developers. The goal was therefore to give a broader audience simple, user-friendly access to relevant and interesting data about the Slovenian language. One of the key activities toward this goal was improving the information in the reference corpora of the Slovenian language, namely the Gigafida corpus and the Gos corpus, which also benefits linguists and language resource developers. The project also produced the SENTA tool for text simplification and analysis.

Citation: Korpusnik: Slovene corpus summarizer. https://korpusnik.cjvt.si/, accessed dd. mm. yyyy.