no code implementations • CLIB 2022 • Verginica Barbu Mititelu, Mihaela Cristescu, Maria Mitrofan, Bianca-Mădălina Zgreabăn, Elena-Andreea Bărbulescu
In this paper we present a new version of the Romanian journalistic treebank annotated with verbal multiword expressions of four types: idioms, light verb constructions, reflexive verbs and inherently adpositional verbs, the last type being recently added to the corpus.
no code implementations • BioNLP (ACL) 2022 • Maria Mitrofan, Vasile Pais
Recognition of named entities present in text is an important step towards information extraction and natural language understanding.
no code implementations • LREC 2022 • Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadić, Vanja Štefanec, Maciej Ogrodniczuk, Bartłomiej Nitoń, Piotr Pęzik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufiș, Radovan Garabík, Simon Krek, Andraž Repar
This article presents the current outcomes of the CURLICAT CEF Telecom project, which aims to collect and deeply annotate a set of large corpora from selected domains.
no code implementations • loresmt (COLING) 2022 • Vasile Pais, Maria Mitrofan, Andrei-Marius Avram
This paper presents the usage of the RELATE platform for translation tasks involving the Romanian language.
1 code implementation • EMNLP (NLLP) 2021 • Vasile Pais, Maria Mitrofan, Carol Luca Gasan, Vlad Coneschi, Alexandru Ianov
Furthermore, the system combines multiple distributional representations of words, including word embeddings trained on a large legal domain corpus.
Ranked #1 on Named Entity Recognition (NER) on LegalNERo
no code implementations • NAACL (SMM4H) 2021 • Vasile Pais, Maria Mitrofan
This paper presents our contribution to the ProfNER shared task.
no code implementations • GWC 2019 • Verginica Mititelu, Maria Mitrofan
Given the alignment of the Romanian wordnet to the Princeton WordNet, this type of annotation can be further used for drawing comparisons between equivalent verbal literals in various languages, provided that such information is annotated in the wordnets of the respective languages and their wordnets are aligned to Princeton WordNet, and thus to the Romanian wordnet.
no code implementations • GWC 2019 • Elena Irimia, Maria Mitrofan, Verginica Mititelu
The evaluation is made for two situations: one in which the words are not semantically disambiguated before expanding the lexicon, and another one in which they are disambiguated with senses from the Romanian wordnet.
no code implementations • SMM4H (COLING) 2022 • Vasile Pais, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Carol Luca Gasan, Roxana Micu
This paper introduces a manually annotated dataset for named entity recognition (NER) in micro-blogging text for the Romanian language.
no code implementations • SMM4H (COLING) 2022 • Andrei-Marius Avram, Vasile Pais, Maria Mitrofan
This paper presents our system employed for the Social Media Mining for Health (SMM4H) 2022 competition Task 10 - SocialDisNER.
no code implementations • CMLC (LREC) 2022 • Vasile Pais, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Roxana Micu, Carol Luca Gasan
Following the successful creation of a national representative corpus of contemporary Romanian language, we turned our attention to the social media text, as present in micro-blogging platforms.
no code implementations • LDL (ACL) 2022 • Verginica Barbu Mititelu, Elena Irimia, Vasile Pais, Andrei-Marius Avram, Maria Mitrofan
In this paper, we report on (i) the conversion of Romanian language resources to the Linked Open Data specifications and requirements, on (ii) their publication and (iii) interlinking with other language resources (for Romanian or for other languages).
1 code implementation • 29 Oct 2024 • Vasile Păiş, Radu Ion, Andrei-Marius Avram, Maria Mitrofan, Dan Tufiş
This paper presents the design and evolution of the RELATE platform.
no code implementations • CLIB 2022 • Radu Ion, Andrei-Marius Avram, Vasile Păiş, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Valentin Badea
The paper will present the QA system and its integration with the Romanian language technologies portal RELATE, the COVID-19 data set and different evaluations of the QA performance.
no code implementations • 22 Nov 2021 • Vasile Păiş, Radu Ion, Andrei-Marius Avram, Elena Irimia, Verginica Barbu Mititelu, Maria Mitrofan
The paper contains a detailed description of the acquisition process, corpus statistics, and an evaluation of the corpus's influence on a low-latency ASR system and on a dialogue component.
no code implementations • LREC 2020 • Dan Tufiș, Maria Mitrofan, Vasile Păiș, Radu Ion, Andrei Coman
We present the Romanian legislative corpus, which is a valuable linguistic asset for the development of machine translation systems, especially for under-resourced languages.
no code implementations • LREC 2020 • Tamás Váradi, Svetla Koeva, Martin Yamalov, Marko Tadić, Bálint Sass, Bartłomiej Nitoń, Maciej Ogrodniczuk, Piotr Pęzik, Verginica Barbu Mititelu, Radu Ion, Elena Irimia, Maria Mitrofan, Vasile Păiș, Dan Tufiș, Radovan Garabík, Simon Krek, Andraz Repar, Matjaž Rihtar, Janez Brank
This article presents the current outcomes of the MARCELL CEF Telecom project aiming to collect and deeply annotate a large comparable corpus of legal documents.
no code implementations • WS 2019 • Radu Ion, Vasile Florian Păiș, Maria Mitrofan
This paper describes the Named Entity Recognition system of the Institute for Artificial Intelligence "Mihai Drăgănescu" of the Romanian Academy (RACAI for short).
no code implementations • WS 2019 • Maria Mitrofan, Verginica Barbu Mititelu, Grigorina Mitrofan
In an era when large amounts of data are generated daily in various fields, the biomedical field among others, linguistic resources can be exploited for various tasks of Natural Language Processing.
no code implementations • WS 2019 • Verginica Barbu Mititelu, Ivelina Stoyanova, Svetlozara Leseva, Maria Mitrofan, Tsvetana Dimitrova, Maria Todorova
The contribution of this work is in outlining essential features of the description and classification of VMWEs and the cross-language comparison at the lexical level, which is essential for the understanding of the need for uniform annotation guidelines and a viable procedure for validation of the annotation.
no code implementations • RANLP 2017 • Maria Mitrofan, Radu Ion
This paper presents the adaptation of the Hidden Markov Models-based TTL part-of-speech tagger to the biomedical domain.
no code implementations • RANLP 2017 • Maria Mitrofan
Named Entity Recognition (NER) is an important component of natural language processing (NLP), with applicability in the biomedical domain, enabling knowledge discovery from medical texts.