no code implementations • EACL (VarDial) 2021 • Bharathi Raja Chakravarthi, Gaman Mihaela, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri
This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021.
no code implementations • LREC 2022 • Farhad Akhbardeh, Marcos Zampieri, Cecilia Ovesdotter Alm, Travis Desell
Event identification in technical logbooks poses challenges given the limited logbook data available in specific technical domains, the large set of possible classes, and logbook entries typically being in short form and non-standard technical language.
no code implementations • NAACL (BEA) 2022 • Kai North, Marcos Zampieri, Matthew Shardlow
Identifying complex words in texts is an important first step in text simplification (TS) systems.
no code implementations • Findings (EMNLP) 2021 • Liviu P. Dinu, Ioan-Bogdan Iordache, Ana Sabina Uban, Marcos Zampieri
In this paper we study pejorative language, an under-explored topic in computational linguistics.
no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • WMT (EMNLP) 2020 • Santanu Pal, Marcos Zampieri
In this paper we present the WIPRO-RIT systems submitted to the Similar Language Translation shared task at WMT 2020.
no code implementations • VarDial (COLING) 2020 • Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubešić, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri
This paper presents the results of the VarDial Evaluation Campaign 2020 organized as part of the seventh workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with COLING 2020.
no code implementations • 11 Dec 2024 • Nishat Raihan, Christian Newman, Marcos Zampieri
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks and have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
no code implementations • 24 Oct 2024 • Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, Yu Kong, Marcos Zampieri
In this paper, we present the first comprehensive survey on multimodal sarcasm detection - henceforth MSD - to date.
no code implementations • 23 Oct 2024 • Nishat Raihan, Joanna C. S. Santos, Marcos Zampieri
The Mojo programming language (PL), recently introduced by Modular, has received significant attention in the scientific community due to its claimed significant speed boost over Python.
1 code implementation • 21 Oct 2024 • Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, Marcos Zampieri
Large language models (LLMs) are becoming increasingly better at a wide range of Natural Language Processing (NLP) tasks, such as text generation and understanding.
no code implementations • 20 Oct 2024 • Mamadou K. Keita, Christopher Homan, Sofiane Abdoulaye Hamani, Adwoa Bremang, Marcos Zampieri, Habibatou Abdoulaye Alfari, Elysabhete Amadou Ibrahim, Dennis Owusu
Our experiments show that the MT-based approach using the M2M100 model outperforms others, achieving a detection rate of 95.82% and a suggestion accuracy of 78.90% in automatic evaluations, and scoring 3.0 out of 5.0 in logical/grammar error correction during MEs by native speakers.
no code implementations • 19 Oct 2024 • Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri
The HumanEval Benchmark, developed by OpenAI, remains the most widely used code generation benchmark.
no code implementations • 11 Oct 2024 • Ana-Maria Bucur, Andreea-Codrina Moldovan, Krutika Parvatikar, Marcos Zampieri, Ashiqur R. KhudaBukhsh, Liviu P. Dinu
In this context, we present a survey on natural language processing (NLP) approaches to modeling depression in social media, providing the reader with a post-COVID-19 outlook.
no code implementations • 18 Sep 2024 • Sujan Dutta, Deepak Pandita, Tharindu Cyril Weerasooriya, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh
Ensuring annotator quality in training and evaluation data is a key piece of machine learning in NLP.
no code implementations • 26 Aug 2024 • Alphaeus Dmonte, Roland Oruche, Marcos Zampieri, Prasad Calyam, Isabelle Augenstein
The large and ever-increasing amount of data available on the Internet coupled with the laborious task of manual claim and fact verification has sparked the interest in the development of automated claim verification systems.
no code implementations • 15 Aug 2024 • Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan
Human feedback is essential for building human-centered AI systems across domains where disagreement is prevalent, such as AI safety, content moderation, or sentiment analysis.
no code implementations • 26 Jul 2024 • Alphaeus Dmonte, Tejas Arya, Tharindu Ranasinghe, Marcos Zampieri
The prevalence of offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide.
1 code implementation • 11 May 2024 • Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.
no code implementations • 24 Apr 2024 • Alphaeus Dmonte, Marcos Zampieri, Kevin Lybarger, Massimiliano Albanese, Genya Coulter
In this paper, we present a novel taxonomy for characterizing election-related claims.
no code implementations • 17 Apr 2024 • Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe
Since most social media data originates from end users, we propose a privacy preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification.
1 code implementation • 3 Apr 2024 • Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters.
no code implementations • 22 Mar 2024 • Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Al Nahian Bin Emran, Amrita Ganguly, Marcos Zampieri
This paper presents the MasonTigers entry to the SemEval-2024 Task 1 - Semantic Textual Relatedness.
no code implementations • 22 Mar 2024 • Md Nishat Raihan, Dhiman Goswami, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Amrita Ganguly, Marcos Zampieri
Our paper presents team MasonTigers' submission to SemEval-2024 Task 9, which provides a dataset of puzzles for testing natural language understanding.
no code implementations • 22 Feb 2024 • Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri
We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset.
no code implementations • 3 Feb 2024 • Amrita Ganguly, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, Marcos Zampieri
The automatic identification of offensive language such as hate speech is important to keep discussions civil in online communities.
1 code implementation • 26 Jan 2024 • Md Mushfiqur Rahman, Mohammad Sabik Irbaz, Kai North, Michelle S. Williams, Marcos Zampieri, Kevin Lybarger
Our innovative RLHF reward function surpassed existing RL text simplification reward functions in effectiveness.
no code implementations • 6 Dec 2023 • Tharindu Ranasinghe, Marcos Zampieri
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).
no code implementations • 25 Nov 2023 • Dhiman Goswami, Md Nishat Raihan, Sadiya Sayara Chowdhury Puspo, Marcos Zampieri
In this paper, we discuss the nlpBDpatriots entry to the shared task on Sentiment Analysis of Bangla Social Media Posts organized at the first workshop on Bangla Language Processing (BLP) co-located with EMNLP.
no code implementations • 25 Nov 2023 • Md Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Marcos Zampieri
In this paper, we discuss the nlpBDpatriots entry to the shared task on Violence Inciting Text Detection (VITD) organized as part of the first workshop on Bangla Language Processing (BLP) co-located with EMNLP.
no code implementations • 25 Nov 2023 • Md Nishat Raihan, Umma Hani Tanmoy, Anika Binte Islam, Kai North, Tharindu Ranasinghe, Antonios Anastasopoulos, Marcos Zampieri
Identifying offensive content in social media is vital for creating safe online communities.
1 code implementation • 27 Oct 2023 • Md Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.
1 code implementation • 27 Oct 2023 • Dhiman Goswami, Md Nishat Raihan, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri
Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech.
no code implementations • 31 May 2023 • Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023.
no code implementations • 19 May 2023 • Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri
To reflect these recent advances, we present a comprehensive survey of papers published between 2017 and 2023 on LS and its sub-tasks with a special focus on deep learning.
no code implementations • 8 Mar 2023 • Kai North, Marcos Zampieri, Matthew Shardlow
Finally, we include brief sections on applications of lexical complexity prediction, such as readability and text simplification, together with related studies on languages other than English.
1 code implementation • 2 Mar 2023 • Marcos Zampieri, Kai North, Tommi Jauhiainen, Mariano Felice, Neha Kumari, Nishant Nair, Yash Bangera
Research has shown that this is a problematic assumption, particularly in the case of very similar languages (e.g., Croatian and Serbian) and national language varieties (e.g., Brazilian and European Portuguese), where texts may contain no distinctive marker of the particular language or variety.
no code implementations • 6 Feb 2023 • Horacio Saggion, Sanja Štajner, Daniel Ferrés, Kim Cheng SHEANG, Matthew Shardlow, Kai North, Marcos Zampieri
We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022.
2 code implementations • 29 Jan 2023 • Tharindu Cyril Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh
For (2), we introduce a first-of-its-kind dataset of vicarious offense.
1 code implementation • 1 Dec 2022 • Tharindu Ranasinghe, Isuri Anuradha, Damith Premasiri, Kanishka Silva, Hansi Hettiarachchi, Lasitha Uyangodage, Marcos Zampieri
SOLD is a manually annotated dataset containing 10,000 posts from Twitter annotated as offensive and not offensive at both sentence-level and token-level, improving the explainability of the ML models.
1 code implementation • 22 Nov 2022 • Marcos Zampieri, Tharindu Ranasinghe, Mrinal Chaudhari, Saurabh Gaikwad, Prajwal Krishna, Mayuresh Nene, Shrunali Paygude
We introduce the Marathi Offensive Language Dataset v.2.0 or MOLD 2.0 and present multiple experiments on this dataset.
no code implementations • 18 Nov 2022 • Tharindu Ranasinghe, Kai North, Damith Premasiri, Marcos Zampieri
The widespread presence of offensive content online has become a reason for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically.
no code implementations • COLING 2022 • Kai North, Marcos Zampieri, Tharindu Ranasinghe
To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words.
2 code implementations • 12 Sep 2022 • Sanja Stajner, Daniel Ferres, Matthew Shardlow, Kai North, Marcos Zampieri, Horacio Saggion
To showcase the usability of the dataset, we adapt two state-of-the-art lexical simplification systems with differing architectures (neural vs. non-neural) to all three languages (English, Spanish, and Brazilian Portuguese) and evaluate their performances on our new dataset.
no code implementations • 17 Dec 2021 • Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Prasenjit Majumder, Johannes Schaefer, Tharindu Ranasinghe, Marcos Zampieri, Durgesh Nandini, Amit Kumar Jaiswal
This paper presents the HASOC subtrack for English, Hindi, and Marathi.
no code implementations • Findings (EMNLP) 2021 • Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, Alexander Ororbia
Transformer-based models such as BERT, XLNET, and XLM-R have achieved state-of-the-art performance across various NLP tasks including the identification of offensive language and hate speech, an important problem in social media.
1 code implementation • RANLP 2021 • Saurabh Gaikwad, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan
The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically.
1 code implementation • 1 Sep 2021 • Christian D. Newman, Michael J. Decker, Reem S. AlSuhaibani, Anthony Peruma, Satyajit Mohapatra, Tejal Vishnoi, Marcos Zampieri, Mohamed W. Mkaouer, Timothy J. Sheldon, Emily Hill
We study the quality of the ensemble's annotations on five different types of identifier names: function, class, attribute, parameter, and declaration statement at the level of both individual words and full identifier names.
no code implementations • ACL 2021 • Farhad Akhbardeh, Cecilia Ovesdotter Alm, Marcos Zampieri, Travis Desell
In this paper we focus on the problem of technical issue classification by considering logbook datasets from the automotive, aviation, and facilities maintenance domains.
no code implementations • GermEval 2021 • Skye Morgan, Tharindu Ranasinghe, Marcos Zampieri
This paper addresses the identification of toxic, engaging, and fact-claiming comments on social media.
no code implementations • SEMEVAL 2021 • Matthew Shardlow, Richard Evans, Gustavo Henrique Paetzold, Marcos Zampieri
This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction.
no code implementations • Findings (ACL) 2021 • Ana-Maria Bucur, Marcos Zampieri, Liviu P. Dinu
In this paper, we analyze the interplay between the use of offensive language and mental health.
no code implementations • SEMEVAL 2021 • Abhinandan Desai, Kai North, Marcos Zampieri, Christopher M. Homan
This paper describes team LCP-RIT's submission to the SemEval-2021 Task 1: Lexical Complexity Prediction (LCP).
no code implementations • 12 May 2021 • Tharindu Ranasinghe, Marcos Zampieri
We report results of 0.8415 F1 macro for Bengali in TRAC-2 shared task, 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020, 0.8568 F1 macro for Hindi in HASOC 2019 shared task and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) showing that our approach compares favourably to the best systems submitted to recent shared tasks on these three languages.
1 code implementation • SEMEVAL 2021 • Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri, Alexander Ororbia
In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.
no code implementations • 31 Mar 2021 • Allahsera Auguste Tapo, Michael Leventhal, Sarah Luger, Christopher M. Homan, Marcos Zampieri
Translating to and from low-resource languages is a challenge for machine translation (MT) systems due to a lack of parallel data.
no code implementations • EACL (VarDial) 2021 • Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri
This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at VarDial 2021 workshop.
1 code implementation • NAACL 2021 • Tharindu Ranasinghe, Marcos Zampieri
The interest in offensive content identification in social media has grown substantially in recent years.
no code implementations • 17 Feb 2021 • Matthew Shardlow, Richard Evans, Marcos Zampieri
We develop a protocol for the annotation of lexical complexity and use this to annotate a new dataset, CompLex 2.0.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Farhad Akhbardeh, Travis Desell, Marcos Zampieri
Processing maintenance logbook records is an important step in the development of predictive maintenance systems.
no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri
In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • loresmt (AACL) 2020 • Allahsera Auguste Tapo, Bakary Coulibaly, Sébastien Diarra, Christopher Homan, Julia Kreutzer, Sarah Luger, Arthur Nagashima, Marcos Zampieri, Michael Leventhal
Low-resource languages present unique challenges to (neural) machine translation.
no code implementations • 1 Nov 2020 • Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu
This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020.
1 code implementation • EMNLP 2020 • Tharindu Ranasinghe, Marcos Zampieri
In this paper, we take advantage of English data available by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources.
no code implementations • SEMEVAL 2020 • Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin
We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020).
no code implementations • COLING 2020 • Farhad Akhbardeh, Travis Desell, Marcos Zampieri
Furthermore, it provides a way to encourage discussion on and sharing of new datasets and tools for logbook data analysis.
no code implementations • LREC 2020 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri
The task consisted of two sub-tasks - aggression identification (sub-task A) and gendered identification (sub-task B) - in three languages - Bangla, Hindi and English.
no code implementations • LREC 2020 • Matthew Shardlow, Michael Cooper, Marcos Zampieri
Predicting which words are considered hard to understand for a given target population is a vital step in many NLP applications such as text simplification.
no code implementations • Findings (ACL) 2021 • Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov
The widespread use of offensive content in social media has led to an abundance of research in detecting language such as hate speech, cyberbullying, and cyber-aggression.
no code implementations • 31 Mar 2020 • Michael Leventhal, Allahsera Tapo, Sarah Luger, Marcos Zampieri, Christopher M. Homan
We present novel methods for assessing the quality of human-translated aligned texts for learning machine translation models of under-resourced languages.
1 code implementation • 16 Mar 2020 • Matthew Shardlow, Michael Cooper, Marcos Zampieri
With a few exceptions, previous studies have approached the task as a binary classification task in which systems predict a complexity value (complex vs. non-complex) for a set of target words in a text.
1 code implementation • LREC 2020 • Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe
As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc.
no code implementations • 16 Aug 2019 • Santanu Pal, Marcos Zampieri, Josef van Genabith
The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.
no code implementations • WS 2019 • Mihaela Vela, Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Josef van Genabith
User feedback revealed that the users preferred using CATaLog Online over existing CAT tools in some respects, especially by selecting the output of the MT system and taking advantage of the color scheme for TM suggestions.
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
no code implementations • WS 2019 • Santanu Pal, Marcos Zampieri, Josef van Genabith
The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.
no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.
no code implementations • WS 2019 • Gustavo Henrique Paetzold, Marcos Zampieri
This paper presents methods to discriminate between languages and dialects written in Cuneiform script, one of the first writing systems in the world.
no code implementations • SEMEVAL 2019 • Gustavo Henrique Paetzold, Shervin Malmasi, Marcos Zampieri
We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset.
2 code implementations • SEMEVAL 2019 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval).
1 code implementation • NAACL 2019 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media.
no code implementations • ALTA 2018 • Fernando Benites, Shervin Malmasi, Marcos Zampieri
We present methods for the automatic classification of patent applications using an annotated dataset provided by the organizers of the ALTA 2018 shared task - Classifying Patent Applications.
no code implementations • 14 Aug 2018 • Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi
In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018.
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
no code implementations • COLING 2018 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri
For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook Posts and Comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.
no code implementations • COLING 2018 • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu
In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi.
no code implementations • COLING 2018 • Marta R. Costa-jussà, Marcos Zampieri, Santanu Pal
In this paper we present the first neural-based machine translation system trained to translate between standard national varieties of the same language.
no code implementations • WS 2018 • Iria del Río, Marcos Zampieri, Shervin Malmasi
In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language based on their second language writing.
no code implementations • WS 2018 • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018.
1 code implementation • 22 Apr 2018 • Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in.
no code implementations • 14 Mar 2018 • Shervin Malmasi, Marcos Zampieri
In this study we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered.
1 code implementation • LREC 2018 • Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, Axel-Cyrille Ngonga Ngomo
In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms currently containing five languages: English, German, Italian, Portuguese, and Russian.
1 code implementation • LREC 2018 • Diego Moussallem, Thiago Castro Ferreira, Marcos Zampieri, Maria Claudia Cavalcanti, Geraldo Xexéo, Mariana Neves, Axel-Cyrille Ngonga Ngomo
The generation of natural language from Resource Description Framework (RDF) data has recently gained significant attention due to the continuous growth of Linked Data.
1 code implementation • RANLP 2017 • Shervin Malmasi, Marcos Zampieri
In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity.
no code implementations • 25 Oct 2017 • Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, Josef van Genabith
In this paper, we investigate the application of text classification methods to support law professionals.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia
This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task.
no code implementations • 2 Oct 2017 • Marcos Zampieri
This technical report describes the framework used for processing three large Portuguese corpora.
no code implementations • 13 Sep 2017 • Ekaterina Lapshinova-Koltunski, Marcos Zampieri
In this paper we describe the use of text classification methods to investigate genre and method variation in an English-German translation corpus.
no code implementations • RANLP 2017 • Octavia-Maria Sulea, Marcos Zampieri, Mihaela Vela, Josef van Genabith
In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court.
no code implementations • WS 2017 • Marcos Zampieri, Alina Maria Ciobanu, Liviu P. Dinu
This paper presents an ensemble system combining the output of multiple SVM classifiers to native language identification (NLI).
no code implementations • 3 Jul 2017 • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu
This paper presents a computational approach to author profiling taking gender and language variety into account.
no code implementations • WS 2017 • Shervin Malmasi, Marcos Zampieri
This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.
no code implementations • WS 2017 • Shervin Malmasi, Marcos Zampieri
This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017.
no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri
In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus.
no code implementations • COLING 2016 • Santanu Pal, Sudip Kumar Naskar, Marcos Zampieri, Tapas Nayak, Josef van Genabith
We present a free web-based CAT tool called CATaLog Online which provides a novel and user-friendly online CAT environment for post-editors/translators.
no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann
We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.
no code implementations • LREC 2016 • Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri
We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties.
no code implementations • LREC 2016 • Marcos Zampieri, Shervin Malmasi, Mark Dras
This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification.
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
no code implementations • LREC 2016 • Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Tapas Nayak, Mihaela Vela, Josef van Genabith
The tool features a number of editing and log functions similar to the desktop version of CATaLog enhanced with several new features that we describe in detail in this paper.
no code implementations • LREC 2014 • Marcos Zampieri, Binyam Gebre
This paper presents VarClass, an open-source tool for language identification available both to be downloaded as well as through a graphical user-friendly interface.