no code implementations • CLIB 2022 • Verginica Barbu Mititelu, Mihaela Cristescu, Maria Mitrofan, Bianca-Mădălina Zgreabăn, Elena-Andreea Bărbulescu
In this paper we present a new version of the Romanian journalistic treebank annotated with verbal multiword expressions of four types: idioms, light verb constructions, reflexive verbs and inherently adpositional verbs, the last type being recently added to the corpus.
no code implementations • COLING (MWE) 2020 • Carlos Ramisch, Agata Savary, Bruno Guillaume, Jakub Waszczuk, Marie Candito, Ashwini Vaidya, Verginica Barbu Mititelu, Archna Bhatia, Uxoa Iñurrieta, Voula Giouli, Tunga Güngör, Menghan Jiang, Timm Lichte, Chaya Liebeskind, Johanna Monti, Renata Ramisch, Sara Stymne, Abigail Walsh, Hongzhi Xu
We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs).
no code implementations • LREC 2022 • Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadić, Vanja Štefanec, Maciej Ogrodniczuk, Bartłomiej Nitoń, Piotr Pęzik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufiș, Radovan Garabík, Simon Krek, Andraž Repar
This article presents the current outcomes of the CURLICAT CEF Telecom project, which aims to collect and deeply annotate a set of large corpora from selected domains.
no code implementations • SMM4H (COLING) 2022 • Vasile Pais, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Carol Luca Gasan, Roxana Micu
This paper introduces a manually annotated dataset for named entity recognition (NER) in micro-blogging text for the Romanian language.
no code implementations • LDL (ACL) 2022 • Verginica Barbu Mititelu, Elena Irimia, Vasile Pais, Andrei-Marius Avram, Maria Mitrofan
In this paper, we report on (i) the conversion of Romanian language resources to the Linked Open Data specifications and requirements, on (ii) their publication and (iii) interlinking with other language resources (for Romanian or for other languages).
no code implementations • CMLC (LREC) 2022 • Vasile Pais, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Roxana Micu, Carol Luca Gasan
Following the successful creation of a national representative corpus of contemporary Romanian language, we turned our attention to the social media text, as present in micro-blogging platforms.
no code implementations • CLIB 2020 • Svetlozara Leseva, Verginica Barbu Mititelu, Ivelina Stoyanova
Mature wordnets offer the opportunity of digging out interesting linguistic information otherwise not explicitly marked in the network.
no code implementations • LREC 2022 • Ana-Maria Barbu, Verginica Barbu Mititelu, Cătălin Mititelu
We present here the efforts of aligning two language resources for Romanian: the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs: for each occurrence of those verbs in the treebank that were included as entries in the lexicon, a set of valence frames is automatically assigned, then manually validated by two linguists and, when necessary, corrected.
no code implementations • CLIB 2020 • Andrei-Marius Avram, Verginica Barbu Mititelu
This paper presents an open-source wordnet editor that has been developed to ensure further expansion of the Romanian wordnet.
no code implementations • 17 Jun 2023 • Andrei-Marius Avram, Verginica Barbu Mititelu, Vasile Păiş, Dumitru-Clementin Cercel, Ştefan Trăuşan-Matu
Correctly identifying multiword expressions (MWEs) is an important task for most natural language processing systems since their misidentification can result in ambiguity and misunderstanding of the underlying text.
no code implementations • 22 Apr 2023 • Andrei-Marius Avram, Verginica Barbu Mititelu, Dumitru-Clementin Cercel
Multiword expressions are a key ingredient for developing large-scale and linguistically sound natural language processing technology.
no code implementations • CLIB 2022 • Radu Ion, Andrei-Marius Avram, Vasile Păiş, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Valentin Badea
The paper will present the QA system and its integration with the Romanian language technologies portal RELATE, the COVID-19 data set and different evaluations of the QA performance.
no code implementations • 22 Nov 2021 • Vasile Păiş, Radu Ion, Andrei-Marius Avram, Elena Irimia, Verginica Barbu Mititelu, Maria Mitrofan
The paper contains a detailed description of the acquisition process, corpus statistics as well as an evaluation of the corpus influence on a low-latency ASR system as well as a dialogue component.
no code implementations • LREC 2020 • Tamás Váradi, Svetla Koeva, Martin Yamalov, Marko Tadić, Bálint Sass, Bartłomiej Nitoń, Maciej Ogrodniczuk, Piotr Pęzik, Verginica Barbu Mititelu, Radu Ion, Elena Irimia, Maria Mitrofan, Vasile Păiș, Dan Tufiș, Radovan Garabík, Simon Krek, Andraz Repar, Matjaž Rihtar, Janez Brank
This article presents the current outcomes of the MARCELL CEF Telecom project aiming to collect and deeply annotate a large comparable corpus of legal documents.
no code implementations • WS 2019 • Verginica Barbu Mititelu, Ivelina Stoyanova, Svetlozara Leseva, Maria Mitrofan, Tsvetana Dimitrova, Maria Todorova
The contribution of this work is in outlining essential features of the description and classification of VMWEs and the cross-language comparison at the lexical level, which is essential for the understanding of the need for uniform annotation guidelines and a viable procedure for validation of the annotation.
no code implementations • WS 2019 • Maria Mitrofan, Verginica Barbu Mititelu, Grigorina Mitrofan
In an era when large amounts of data are generated daily in various fields, the biomedical field among others, linguistic resources can be exploited for various tasks of Natural Language Processing.
no code implementations • WS 2019 • Verginica Barbu Mititelu, Mihaela Cristescu, Mihaela Onofrei
This paper reports on the Romanian journalistic corpus annotated with verbal multiword expressions following the PARSEME guidelines.
no code implementations • WS 2018 • Eduard Barbu, Verginica Barbu Mititelu
A hybrid pipeline comprising rules and machine learning is used to filter a noisy web English-German parallel corpus for the Parallel Corpus Filtering task.
no code implementations • COLING 2018 • Carlos Ramisch, Silvio Ricardo Cordeiro, Agata Savary, Veronika Vincze, Verginica Barbu Mititelu, Archna Bhatia, Maja Buljan, Marie Candito, Polona Gantar, Voula Giouli, Tunga Güngör, Abdelati Hawwari, Uxoa Iñurrieta, Jolanta Kovalevskaitė, Simon Krek, Timm Lichte, Chaya Liebeskind, Johanna Monti, Carla Parra Escartín, Behrang Qasemizadeh, Renata Ramisch, Nathan Schneider, Ivelina Stoyanova, Ashwini Vaidya, Abigail Walsh
Corpora were created for 20 languages, which are also briefly discussed.
no code implementations • WS 2017 • Tiberiu Boros, Sonia Pipa, Verginica Barbu Mititelu, Dan Tufis
"Multiword expressions" are groups of words acting as a morphologic, syntactic and semantic unit in linguistic analysis.
no code implementations • LREC 2016 • Dan Tufiș, Verginica Barbu Mititelu, Elena Irimia, Ștefan Daniel Dumitrescu, Tiberiu Boroș
The article describes the current status of a large national project, CoRoLa, aiming at building a reference corpus for the contemporary Romanian language.
no code implementations • LREC 2014 • Verginica Barbu Mititelu, Elena Irimia, Dan Tufiș
Our project is a joint effort of two institutes of the Romanian Academy.
no code implementations • LREC 2012 • Verginica Barbu Mititelu
Keeping pace with other wordnets development, we present the challenges raised by the Romanian derivational system and our methodology for identifying derived words and their stems in the Romanian Wordnet.