no code implementations • NAACL (SMM4H) 2021 • Vasile Pais, Maria Mitrofan
This paper presents our contribution to the ProfNER shared task.
no code implementations • CMLC (LREC) 2022 • Vasile Pais, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Roxana Micu, Carol Luca Gasan
Following the successful creation of a nationally representative corpus of contemporary Romanian, we turned our attention to social media text, as found on micro-blogging platforms.
no code implementations • LDL (ACL) 2022 • Verginica Barbu Mititelu, Elena Irimia, Vasile Pais, Andrei-Marius Avram, Maria Mitrofan
In this paper, we report on (i) the conversion of Romanian language resources to the Linked Open Data specifications and requirements, (ii) their publication, and (iii) their interlinking with other language resources (for Romanian or for other languages).
no code implementations • SemEval (NAACL) 2022 • Vasile Pais
This paper presents RACAI’s system used for the shared task “Multilingual Complex Named Entity Recognition (MultiCoNER)”, organized as part of “The 16th International Workshop on Semantic Evaluation (SemEval 2022)”.
no code implementations • LoResMT (COLING) 2022 • Vasile Pais, Maria Mitrofan, Andrei-Marius Avram
This paper presents the usage of the RELATE platform for translation tasks involving the Romanian language.
no code implementations • SMM4H (COLING) 2022 • Andrei-Marius Avram, Vasile Pais, Maria Mitrofan
This paper presents our system employed for the Social Media Mining for Health (SMM4H) 2022 competition Task 10 - SocialDisNER.
no code implementations • SMM4H (COLING) 2022 • Vasile Pais, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Carol Luca Gasan, Roxana Micu
This paper introduces a manually annotated dataset for named entity recognition (NER) in micro-blogging text for the Romanian language.
1 code implementation • EMNLP (NLLP) 2021 • Vasile Pais, Maria Mitrofan, Carol Luca Gasan, Vlad Coneschi, Alexandru Ianov
Furthermore, the system combines multiple distributional representations of words, including word embeddings trained on a large legal domain corpus.
Ranked #1 on Named Entity Recognition (NER) on LegalNERo
no code implementations • BioNLP (ACL) 2022 • Maria Mitrofan, Vasile Pais
Recognition of named entities present in text is an important step towards information extraction and natural language understanding.
2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.
2 code implementations • RANLP 2021 • Andrei-Marius Avram, Vasile Pais, Dan Tufis
EuroVoc is a multilingual thesaurus that was built for organizing the legislative documentation of the European Union institutions.
no code implementations • LREC 2020 • Vasile Pais, Radu Ion
This paper describes RACAI's automatic term extraction system, which participated in the TermEval 2020 shared task on English monolingual term extraction.
no code implementations • LREC 2020 • Vasile Pais, Dan Tufiș, Radu Ion
This paper describes RACAI's word sense alignment system, which participated in the Monolingual Word Sense Alignment shared task organized at the GlobaLex 2020 workshop.
4 code implementations • 15 Feb 2018 • Tiberiu Boros, Stefan Daniel Dumitrescu, Vasile Pais
In this paper we introduce a set of resources and tools aimed at providing support for natural language processing, text-to-speech synthesis and speech recognition for Romanian.