no code implementations • ParlaCLARIN (LREC) 2022 • Maciej Ogrodniczuk, Petya Osenova, Tomaž Erjavec, Darja Fišer, Nikola Ljubešić, Çağrı Çöltekin, Matyáš Kopp, Katja Meden
In ParlaMint I, a CLARIN ERIC-supported project in pandemic times, a set of comparable and uniformly annotated multilingual corpora for 17 national parliaments was developed and released in 2021.
1 code implementation • 4 Nov 2022 • Angel Daza, Antske Fokkens, Tomaž Erjavec
We also propose and present the results of a method for expanding the identified abbreviations in context.
no code implementations • 31 Mar 2020 • Tomaž Erjavec
The MULTEXT-East language resources are a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description.
no code implementations • 5 Jun 2019 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
In this paper we present datasets of Facebook comment threads on mainstream media posts in Slovene and English, developed within the Slovene national project FRENK. The datasets cover two topics, migrants and LGBT, and are manually annotated for different types of socially unacceptable discourse (SUD).
no code implementations • 5 Jun 2019 • Nikola Ljubešić, Darja Fišer, Tomaž Erjavec
This paper presents a dataset and supervised learning experiments for term extraction from Slovene academic texts.
no code implementations • 18 Feb 2016 • Mark A. Finlayson, Tomaž Erjavec
This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform.