no code implementations • LREC (BUCC) 2022 • Diego Alves, Marko Tadić, Božo Bekavac
This article presents a comparative analysis of dependency parsing results for a set of 16 languages, coming from a large variety of linguistic families and genera, whose parallel corpora were used to train a deep-learning tool.
no code implementations • 28 Feb 2023 • Simon Gottschalk, Endri Kacupaj, Sara Abdollahi, Diego Alves, Gabriel Amaral, Elisavet Koutsiana, Tin Kuculo, Daniela Major, Caio Mello, Gullal S. Cheema, Abdul Sittar, Swati, Golsa Tahmasebzadeh, Gaurish Thakkar
Accessing and understanding contemporary and historical events of global impact such as the US elections and the Olympic Games is a major prerequisite for cross-lingual event analytics that investigate event causes, perception and consequences across country borders.
no code implementations • 14 Dec 2022 • Diego Alves, Gaurish Thakkar, Gabriel Amaral, Tin Kuculo, Marko Tadić
With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit.
no code implementations • 14 Dec 2022 • Jelena Sarajlić, Gaurish Thakkar, Diego Alves, Nives Mikelic Preradović
This paper presents a corpus annotated for the task of direct-speech extraction in Croatian.
no code implementations • 14 Dec 2022 • Diego Alves, Gaurish Thakkar, Marko Tadić
This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora.
no code implementations • LREC 2020 • Diego Alves, Gaurish Thakkar, Marko Tadić
Because the availability of language resources differs from language to language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones.
no code implementations • LREC 2020 • Diego Alves, Gaurish Thakkar, Marko Tadić
We consider a difference of less than one percentage point between the reported results and our tested results to be within the limits of acceptable tolerance, and thus consider this result reproducible.
no code implementations • 23 Oct 2020 • Diego Alves, Tin Kuculo, Gabriel Amaral, Gaurish Thakkar, Marko Tadic
We introduce the Universal Named-Entity Recognition (UNER) framework, a 4-level classification hierarchy, and the methodology that is being adopted to create the first multilingual UNER corpus: the SETimes parallel corpus annotated for named-entities.