Search Results for author: Shervin Malmasi

Found 70 papers, 5 papers with code

SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition (MultiCoNER)

no code implementations • SemEval (NAACL) 2022 • Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko

Divided into 13 tracks, the task focused on methods to identify complex named entities (like names of movies, products and groups) in 11 languages in both monolingual and multi-lingual scenarios.

named-entity-recognition Named Entity Recognition +1

Dynamic Gazetteer Integration in Multilingual Models for Cross-Lingual and Cross-Domain Named Entity Recognition

no code implementations • NAACL 2022 • Besnik Fetahu, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

Named entity recognition (NER) in a real-world setting remains challenging and is impacted by factors like text genre, corpus quality, and data availability.

Cross-Domain Named Entity Recognition Cross-Lingual Transfer +4

Wizard of Tasks: A Novel Conversational Dataset for Solving Real-World Tasks in Conversational Settings

no code implementations • COLING 2022 • Jason Ingyu Choi, Saar Kuzi, Nikhita Vedula, Jie Zhao, Giuseppe Castellucci, Marcus Collins, Shervin Malmasi, Oleg Rokhlenko, Eugene Agichtein

Conversational Task Assistants (CTAs) are conversational agents whose goal is to help humans perform real-world tasks.

abstractive question answering intent-classification +2

Leveraging Interesting Facts to Enhance User Engagement with Conversational Interfaces

1 code implementation • 9 Apr 2024 • Nikhita Vedula, Giuseppe Castellucci, Eugene Agichtein, Oleg Rokhlenko, Shervin Malmasi

Conversational Task Assistants (CTAs) guide users in performing a multitude of activities, such as making recipes.

Identifying Shopping Intent in Product QA for Proactive Recommendations

no code implementations • 9 Apr 2024 • Besnik Fetahu, Nachshon Cohen, Elad Haramaty, Liane Lewin-Eytan, Oleg Rokhlenko, Shervin Malmasi

We focus on the domain of e-commerce, namely in identifying Shopping Product Questions (SPQs), where the user asking a product-related question may have an underlying shopping need.

Friction Question Answering

Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data

no code implementations • 3 Apr 2024 • Parth Patwa, Simone Filice, Zhiyu Chen, Giuseppe Castellucci, Oleg Rokhlenko, Shervin Malmasi

Large Language Models (LLMs) operating in 0-shot or few-shot settings achieve competitive results in Text Classification tasks.

In-Context Learning text-classification +1

Controllable Decontextualization of Yes/No Question and Answers into Factual Statements

no code implementations • 18 Jan 2024 • Lingbo Mo, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

Yes/No or polar questions represent one of the main linguistic question categories.

Negation

Instant Answering in E-Commerce Buyer-Seller Messaging using Message-to-Question Reformulation

no code implementations • 18 Jan 2024 • Besnik Fetahu, Tejas Mehta, Qun Song, Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi

E-commerce customers frequently seek detailed product information for purchase decisions, commonly contacting sellers directly with extended queries.

Question Answering

Follow-on Question Suggestion via Voice Hints for Voice Assistants

no code implementations • 25 Oct 2023 • Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

Using a new dataset of 6681 input questions and human written hints, we evaluated the models with automatic metrics and human evaluation.

InstructPTS: Instruction-Tuning LLMs for Product Title Summarization

no code implementations • 25 Oct 2023 • Besnik Fetahu, Zhiyu Chen, Oleg Rokhlenko, Shervin Malmasi

E-commerce product catalogs contain billions of items.

Retrieval

MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition

no code implementations • 20 Oct 2023 • Besnik Fetahu, Zhiyu Chen, Sudipta Kar, Oleg Rokhlenko, Shervin Malmasi

We present MULTICONER V2, a dataset for fine-grained Named Entity Recognition covering 33 entity classes across 12 languages, in both monolingual and multilingual settings.

named-entity-recognition Named Entity Recognition +2

Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search

no code implementations • 6 Jun 2023 • Zhiyu Chen, Jason Choi, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

We propose an intent-aware FAQ retrieval system consisting of (1) an intent classifier that predicts when a user's information need can be answered by an FAQ; (2) a reformulation model that rewrites a query into a natural question.

Retrieval

Answering Unanswered Questions through Semantic Reformulations in Spoken QA

no code implementations • 27 May 2023 • Pedro Faustini, Zhiyu Chen, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

Spoken Question Answering (QA) is a key feature of voice assistants, usually backed by multiple QA systems.

Question Answering Specificity

Faithful Low-Resource Data-to-Text Generation through Cycle Training

1 code implementation • 24 May 2023 • Zhuoer Wang, Marcus Collins, Nikhita Vedula, Simone Filice, Shervin Malmasi, Oleg Rokhlenko

Cycle training uses two models which are inverses of each other: one that generates text from structured data, and one which generates the structured data from natural language text.

Data-to-Text Generation

SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)

no code implementations • 11 May 2023 • Besnik Fetahu, Sudipta Kar, Zhiyu Chen, Oleg Rokhlenko, Shervin Malmasi

The task highlights the need for future research on improving NER robustness on noisy data containing complex entities.

Multilingual Named Entity Recognition named-entity-recognition +3

Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

no code implementations • 22 Feb 2023 • Sudipta Kar, Giuseppe Castellucci, Simone Filice, Shervin Malmasi, Oleg Rokhlenko

In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of an already trained model on n tasks into a new one for solving n+1 tasks.

Continual Learning Multi-Task Learning

Reinforced Question Rewriting for Conversational Question Answering

no code implementations • 27 Oct 2022 • Zhiyu Chen, Jie Zhao, Anjie Fang, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

Furthermore, human evaluation shows that our method can generate more accurate and detailed rewrites when compared to human annotations.

Question Rewriting Retrieval

MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity Recognition

no code implementations • COLING 2022 • Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko

We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixing subsets.

Machine Translation named-entity-recognition +3

GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input

no code implementations • NAACL 2021 • Tao Meng, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

We propose GEMNET, a novel approach for gazetteer knowledge integration, including (1) a flexible Contextual Gazetteer Representation (CGR) encoder that can be fused with any word-level model; and (2) a Mixture-of-Experts gating network that overcomes the feature overuse issue by learning to conditionally combine the context and gazetteer features, instead of assigning them fixed weights.

named-entity-recognition Named Entity Recognition +1

Evaluating Aggression Identification in Social Media

no code implementations • LREC 2020 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

The task consisted of two sub-tasks - aggression identification (sub-task A) and gendered identification (sub-task B) - in three languages - Bangla, Hindi and English.

Aggression Identification

Findings of the 2019 Conference on Machine Translation (WMT19)

no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri

This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.

Machine Translation Translation

A Report on the Third VarDial Evaluation Campaign

no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen

In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.

Dialect Identification Morphological Analysis

UTFPR at SemEval-2019 Task 5: Hate Speech Identification with Recurrent Neural Networks

no code implementations • SEMEVAL 2019 • Gustavo Henrique Paetzold, Shervin Malmasi, Marcos Zampieri

We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset.

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

2 code implementations • SEMEVAL 2019 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval).

Language Identification

Predicting the Type and Target of Offensive Posts in Social Media

2 code implementations • NAACL 2019 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media.

Language Identification Vocal Bursts Type Prediction

Classifying Patent Applications with Ensemble Methods

no code implementations • ALTA 2018 • Fernando Benites, Shervin Malmasi, Marcos Zampieri

We present methods for the automatic classification of patent applications using an annotated dataset provided by the organizers of the ALTA 2018 shared task - Classifying Patent Applications.

Classification General Classification

Native Language Identification With Classifier Stacking and Ensembles

no code implementations • CL 2018 • Shervin Malmasi, Mark Dras

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art.

Cross-corpus General Classification +3

Classifier Ensembles for Dialect and Language Variety Identification

no code implementations • 14 Aug 2018 • Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi

In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018.

Dialect Identification

Benchmarking Aggression Identification in Social Media

no code implementations • COLING 2018 • Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook Posts and Comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.

Aggression Identification Benchmarking

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.

Dependency Parsing Dialect Identification

German Dialect Identification Using Classifier Ensembles

no code implementations • COLING 2018 • Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu

In this paper we present the GDI_classification entry to the second German Dialect Identification (GDI) shared task organized within the scope of the VarDial Evaluation Campaign 2018.

Dialect Identification

Discriminating between Indo-Aryan Languages Using SVM Ensembles

no code implementations • COLING 2018 • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu

In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi.

Language Identification

A Portuguese Native Language Identification Dataset

no code implementations • WS 2018 • Iria del Río, Marcos Zampieri, Shervin Malmasi

In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language based on their second language writing.

Language Acquisition Native Language Identification +1

A Report on the Complex Word Identification Shared Task 2018

no code implementations • WS 2018 • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri

We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018.

Binary Classification Classification +2

Challenges in Discriminating Profanity from Hate Speech

no code implementations • 14 Mar 2018 • Shervin Malmasi, Marcos Zampieri

In this study we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered.

Clustering General Classification

Detecting Hate Speech in Social Media

1 code implementation • RANLP 2017 • Shervin Malmasi, Marcos Zampieri

In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity.

General Classification

Exploring the Use of Text Classification in the Legal Domain

no code implementations • 25 Oct 2017 • Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, Josef van Genabith

In this paper, we investigate the application of text classification methods to support law professionals.

General Classification text-classification +1

Complex Word Identification: Challenges in Data Annotation and System Performance

no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia

This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task.

Complex Word Identification General Classification

A Report on the 2017 Native Language Identification Shared Task

no code implementations • WS 2017 • Shervin Malmasi, Keelan Evanini, Aoife Cahill, Joel Tetreault, Robert Pugh, Christopher Hamill, Diane Napolitano, Yao Qian

We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks.

Grammatical Error Correction Language Acquisition +1

Open-Set Language Identification

no code implementations • 16 Jul 2017 • Shervin Malmasi

We present the first open-set language identification experiments using one-class classification.

General Classification Language Identification +1

Including Dialects and Language Varieties in Author Profiling

no code implementations • 3 Jul 2017 • Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu

This paper presents a computational approach to author profiling taking gender and language variety into account.

Unsupervised Text Segmentation Based on Native Language Characteristics

no code implementations • ACL 2017 • Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du, Magdalena Wolska

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language.

Segmentation Text Segmentation

Feature Hashing for Language and Dialect Identification

no code implementations • ACL 2017 • Shervin Malmasi, Mark Dras

We evaluate feature hashing for language identification (LID), a method not previously used for this task.

Dialect Identification Dimensionality Reduction +3

Arabic Dialect Identification Using iVectors and ASR Transcripts

no code implementations • WS 2017 • Shervin Malmasi, Marcos Zampieri

This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017.

Dialect Identification Machine Translation

German Dialect Identification in Interview Transcriptions

no code implementations • WS 2017 • Shervin Malmasi, Marcos Zampieri

This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017.

Dialect Identification Machine Translation

Findings of the VarDial Evaluation Campaign 2017

no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.

Dependency Parsing Dialect Identification

Native Language Identification using Stacked Generalization

no code implementations • 19 Mar 2017 • Shervin Malmasi, Mark Dras

Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art.

Native Language Identification

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann

We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.

Dialect Identification General Classification +1

Arabic Dialect Identification in Speech Transcripts

no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri

In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus.

Dialect Identification Machine Translation

Subdialectal Differences in Sorani Kurdish

no code implementations • WS 2016 • Shervin Malmasi

In this study we apply classification methods for detecting subdialectal differences in Sorani Kurdish texts produced in different regions, namely Iran and Iraq.

General Classification Information Retrieval +2

Discriminating Similar Languages: Evaluations and Explorations

no code implementations • LREC 2016 • Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri

We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties.

BIG-bench Machine Learning

Modeling Language Change in Historical Corpora: The Case of Portuguese

no code implementations • LREC 2016 • Marcos Zampieri, Shervin Malmasi, Mark Dras

This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification.

General Classification POS +2