no code implementations • SemEval (NAACL) 2022 • Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko
Divided into 13 tracks, the task focused on methods to identify complex named entities (like names of movies, products and groups) in 11 languages in both monolingual and multi-lingual scenarios.
no code implementations • NAACL 2022 • Besnik Fetahu, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi
Named entity recognition (NER) in a real-world setting remains challenging and is impacted by factors like text genre, corpus quality, and data availability.
no code implementations • COLING 2022 • Jason Ingyu Choi, Saar Kuzi, Nikhita Vedula, Jie Zhao, Giuseppe Castellucci, Marcus Collins, Shervin Malmasi, Oleg Rokhlenko, Eugene Agichtein
Conversational Task Assistants (CTAs) are conversational agents whose goal is to help humans perform real-world tasks.
no code implementations • 17 Oct 2024 • Zhiyu Chen, Jason Choi, Besnik Fetahu, Shervin Malmasi
We consider the task of identifying High Consideration (HC) queries.
no code implementations • 12 Jul 2024 • Saar Kuzi, Shervin Malmasi
Consumers on a shopping mission often leverage both product search and information seeking systems, such as web search engines and Question Answering (QA) systems, in an iterative process to improve their understanding of available products and reach a purchase decision.
no code implementations • 7 Jun 2024 • Lütfi Kerem Senel, Besnik Fetahu, Davis Yoshida, Zhiyu Chen, Giuseppe Castellucci, Nikhita Vedula, Jason Choi, Shervin Malmasi
Recommender systems are widely used to suggest engaging content, and Large Language Models (LLMs) have given rise to generative recommenders.
no code implementations • 2 May 2024 • Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi
Digital assistants have become ubiquitous in e-commerce applications, following the recent advancements in Information Retrieval (IR), Natural Language Processing (NLP) and Generative Artificial Intelligence (AI).
no code implementations • 9 Apr 2024 • Besnik Fetahu, Nachshon Cohen, Elad Haramaty, Liane Lewin-Eytan, Oleg Rokhlenko, Shervin Malmasi
We focus on the domain of e-commerce, namely in identifying Shopping Product Questions (SPQs), where the user asking a product-related question may have an underlying shopping need.
1 code implementation • 9 Apr 2024 • Nikhita Vedula, Giuseppe Castellucci, Eugene Agichtein, Oleg Rokhlenko, Shervin Malmasi
Conversational Task Assistants (CTAs) guide users in performing a multitude of activities, such as making recipes.
no code implementations • 3 Apr 2024 • Parth Patwa, Simone Filice, Zhiyu Chen, Giuseppe Castellucci, Oleg Rokhlenko, Shervin Malmasi
Large Language Models (LLMs) operating in 0-shot or few-shot settings achieve competitive results in Text Classification tasks.
no code implementations • 18 Jan 2024 • Lingbo Mo, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi
Yes/No or polar questions represent one of the main linguistic question categories.
no code implementations • 18 Jan 2024 • Besnik Fetahu, Tejas Mehta, Qun Song, Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi
E-commerce customers frequently seek detailed product information for purchase decisions, commonly contacting sellers directly with extended queries.
no code implementations • 25 Oct 2023 • Besnik Fetahu, Zhiyu Chen, Oleg Rokhlenko, Shervin Malmasi
E-commerce product catalogs contain billions of items.
no code implementations • 25 Oct 2023 • Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi
Using a new dataset of 6681 input questions and human written hints, we evaluated the models with automatic metrics and human evaluation.
no code implementations • 20 Oct 2023 • Besnik Fetahu, Zhiyu Chen, Sudipta Kar, Oleg Rokhlenko, Shervin Malmasi
We present MULTICONER V2, a dataset for fine-grained Named Entity Recognition covering 33 entity classes across 12 languages, in both monolingual and multilingual settings.
no code implementations • 6 Jun 2023 • Zhiyu Chen, Jason Choi, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi
We propose an intent-aware FAQ retrieval system consisting of (1) an intent classifier that predicts when a user's information need can be answered by an FAQ; (2) a reformulation model that rewrites a query into a natural question.
no code implementations • 27 May 2023 • Pedro Faustini, Zhiyu Chen, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi
Spoken Question Answering (QA) is a key feature of voice assistants, usually backed by multiple QA systems.
1 code implementation • 24 May 2023 • Zhuoer Wang, Marcus Collins, Nikhita Vedula, Simone Filice, Shervin Malmasi, Oleg Rokhlenko
Cycle training uses two models which are inverses of each other: one that generates text from structured data, and one which generates the structured data from natural language text.
no code implementations • 11 May 2023 • Besnik Fetahu, Sudipta Kar, Zhiyu Chen, Oleg Rokhlenko, Shervin Malmasi
The task highlights the need for future research on improving NER robustness on noisy data containing complex entities.
no code implementations • 22 Feb 2023 • Sudipta Kar, Giuseppe Castellucci, Simone Filice, Shervin Malmasi, Oleg Rokhlenko
In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of an already trained model on n tasks into a new one for solving n+1 tasks.
no code implementations • 27 Oct 2022 • Zhiyu Chen, Jie Zhao, Anjie Fang, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi
Furthermore, human evaluation shows that our method can generate more accurate and detailed rewrites when compared to human annotations.
no code implementations • COLING 2022 • Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko
We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixing subsets.
no code implementations • NAACL 2021 • Tao Meng, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi
We propose GEMNET, a novel approach for gazetteer knowledge integration, including (1) a flexible Contextual Gazetteer Representation (CGR) encoder that can be fused with any word-level model; and (2) a Mixture-of-Experts gating network that overcomes the feature overuse issue by learning to conditionally combine the context and gazetteer features, instead of assigning them fixed weights.
no code implementations • LREC 2020 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri
The task consisted of two sub-tasks, aggression identification (sub-task A) and gendered identification (sub-task B), in three languages: Bangla, Hindi and English.
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.
no code implementations • SEMEVAL 2019 • Gustavo Henrique Paetzold, Shervin Malmasi, Marcos Zampieri
We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset.
2 code implementations • SEMEVAL 2019 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval).
1 code implementation • NAACL 2019 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media.
no code implementations • ALTA 2018 • Fernando Benites, Shervin Malmasi, Marcos Zampieri
We present methods for the automatic classification of patent applications using an annotated dataset provided by the organizers of the ALTA 2018 shared task - Classifying Patent Applications.
no code implementations • CL 2018 • Shervin Malmasi, Mark Dras
Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art.
no code implementations • 14 Aug 2018 • Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi
In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018.
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
no code implementations • COLING 2018 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri
For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook Posts and Comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.
no code implementations • COLING 2018 • Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu
In this paper we present the GDI_classification entry to the second German Dialect Identification (GDI) shared task organized within the scope of the VarDial Evaluation Campaign 2018.
no code implementations • COLING 2018 • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu
In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi.
no code implementations • WS 2018 • Iria del Río, Marcos Zampieri, Shervin Malmasi
In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language based on their second language writing.
no code implementations • WS 2018 • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018.
no code implementations • 14 Mar 2018 • Shervin Malmasi, Marcos Zampieri
In this study we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered.
1 code implementation • RANLP 2017 • Shervin Malmasi, Marcos Zampieri
In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity.
no code implementations • 25 Oct 2017 • Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, Josef van Genabith
In this paper, we investigate the application of text classification methods to support law professionals.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia
This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task.
no code implementations • WS 2017 • Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, Yao Qian
We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks.
no code implementations • 16 Jul 2017 • Shervin Malmasi
We present the first open-set language identification experiments using one-class classification.
no code implementations • 3 Jul 2017 • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu
This paper presents a computational approach to author profiling taking gender and language variety into account.
no code implementations • ACL 2017 • Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du, Magdalena Wolska
Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language.
no code implementations • ACL 2017 • Shervin Malmasi, Mark Dras
We evaluate feature hashing for language identification (LID), a method not previously used for this task.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.
no code implementations • WS 2017 • Shervin Malmasi, Marcos Zampieri
This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017.
no code implementations • WS 2017 • Shervin Malmasi, Marcos Zampieri
This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017.
no code implementations • 19 Mar 2017 • Shervin Malmasi, Mark Dras
Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art.
no code implementations • WS 2016 • Shervin Malmasi
In this study we apply classification methods for detecting subdialectal differences in Sorani Kurdish texts produced in different regions, namely Iran and Iraq.
no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri
In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus.
no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann
We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.
no code implementations • LREC 2016 • Marcos Zampieri, Shervin Malmasi, Mark Dras
This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification.
no code implementations • LREC 2016 • Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri
We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties.