Search Results for author: Shervin Malmasi

Found 67 papers, 4 papers with code

SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition (MultiCoNER)

no code implementations SemEval (NAACL) 2022 Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko

Divided into 13 tracks, the task focused on methods to identify complex named entities (such as names of movies, products, and groups) in 11 languages in both monolingual and multilingual scenarios.

Named Entity Recognition

Instant Answering in E-Commerce Buyer-Seller Messaging using Message-to-Question Reformulation

no code implementations 18 Jan 2024 Besnik Fetahu, Tejas Mehta, Qun Song, Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi

E-commerce customers frequently seek detailed product information for purchase decisions, commonly contacting sellers directly with extended queries.

Question Answering

Follow-on Question Suggestion via Voice Hints for Voice Assistants

no code implementations 25 Oct 2023 Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

Using a new dataset of 6,681 input questions and human-written hints, we evaluated the models with automatic metrics and human evaluation.

MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition

no code implementations 20 Oct 2023 Besnik Fetahu, Zhiyu Chen, Sudipta Kar, Oleg Rokhlenko, Shervin Malmasi

We present MultiCoNER v2, a dataset for fine-grained Named Entity Recognition covering 33 entity classes across 12 languages, in both monolingual and multilingual settings.

Named Entity Recognition

Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search

no code implementations 6 Jun 2023 Zhiyu Chen, Jason Choi, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

We propose an intent-aware FAQ retrieval system consisting of (1) an intent classifier that predicts when a user's information need can be answered by an FAQ; (2) a reformulation model that rewrites a query into a natural question.

Retrieval

Faithful Low-Resource Data-to-Text Generation through Cycle Training

1 code implementation 24 May 2023 Zhuoer Wang, Marcus Collins, Nikhita Vedula, Simone Filice, Shervin Malmasi, Oleg Rokhlenko

Cycle training uses two models that are inverses of each other: one that generates text from structured data, and one that generates structured data from natural language text.

Data-to-Text Generation

Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

no code implementations 22 Feb 2023 Sudipta Kar, Giuseppe Castellucci, Simone Filice, Shervin Malmasi, Oleg Rokhlenko

In this paper, we approach the problem of incrementally expanding the capability of MTL models to solve new tasks over time by distilling the knowledge of a model already trained on n tasks into a new model that solves n+1 tasks.

Continual Learning, Multi-Task Learning

Reinforced Question Rewriting for Conversational Question Answering

no code implementations 27 Oct 2022 Zhiyu Chen, Jie Zhao, Anjie Fang, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

Furthermore, human evaluation shows that our method can generate more accurate and detailed rewrites than human annotations.

Question Rewriting, Retrieval

MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity Recognition

no code implementations COLING 2022 Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko

We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixing subsets.

Machine Translation, Named Entity Recognition

GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input

no code implementations NAACL 2021 Tao Meng, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

We propose GEMNET, a novel approach for gazetteer knowledge integration, including (1) a flexible Contextual Gazetteer Representation (CGR) encoder that can be fused with any word-level model; and (2) a Mixture-of-Experts gating network that overcomes the feature overuse issue by learning to conditionally combine the context and gazetteer features, instead of assigning them fixed weights.

Named Entity Recognition

Evaluating Aggression Identification in Social Media

no code implementations LREC 2020 Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

The task consisted of two sub-tasks, aggression identification (sub-task A) and gendered identification (sub-task B), in three languages: Bangla, Hindi, and English.

Aggression Identification

A Report on the Third VarDial Evaluation Campaign

no code implementations WS 2019 Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen

In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.

Dialect Identification, Morphological Analysis

UTFPR at SemEval-2019 Task 5: Hate Speech Identification with Recurrent Neural Networks

no code implementations SEMEVAL 2019 Gustavo Henrique Paetzold, Shervin Malmasi, Marcos Zampieri

We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset.

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

2 code implementations SEMEVAL 2019 Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval).

Language Identification

Classifying Patent Applications with Ensemble Methods

no code implementations ALTA 2018 Fernando Benites, Shervin Malmasi, Marcos Zampieri

We present methods for the automatic classification of patent applications using an annotated dataset provided by the organizers of the ALTA 2018 shared task, Classifying Patent Applications.

Classification, General Classification

Native Language Identification With Classifier Stacking and Ensembles

no code implementations CL 2018 Shervin Malmasi, Mark Dras

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art.

Cross-corpus, General Classification

Classifier Ensembles for Dialect and Language Variety Identification

no code implementations 14 Aug 2018 Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi

In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018.

Dialect Identification

Benchmarking Aggression Identification in Social Media

no code implementations COLING 2018 Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook posts and comments each in Hindi (in both Roman and Devanagari scripts) and English for training and validation.

Aggression Identification, Benchmarking

German Dialect Identification Using Classifier Ensembles

no code implementations COLING 2018 Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu

In this paper we present the GDI_classification entry to the second German Dialect Identification (GDI) shared task organized within the scope of the VarDial Evaluation Campaign 2018.

Dialect Identification

Discriminating between Indo-Aryan Languages Using SVM Ensembles

no code implementations COLING 2018 Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu

In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi.

Language Identification

A Portuguese Native Language Identification Dataset

no code implementations WS 2018 Iria del Río, Marcos Zampieri, Shervin Malmasi

In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language based on their second language writing.

Language Acquisition, Native Language Identification

Challenges in Discriminating Profanity from Hate Speech

no code implementations 14 Mar 2018 Shervin Malmasi, Marcos Zampieri

In this study we approach the problem of distinguishing general profanity from hate speech in social media, a problem that has received little attention.

Clustering, General Classification

Detecting Hate Speech in Social Media

1 code implementation RANLP 2017 Shervin Malmasi, Marcos Zampieri

In this paper we examine methods to detect hate speech in social media while distinguishing it from general profanity.

General Classification

Open-Set Language Identification

no code implementations 16 Jul 2017 Shervin Malmasi

We present the first open-set language identification experiments using one-class classification.

General Classification, Language Identification

Including Dialects and Language Varieties in Author Profiling

no code implementations 3 Jul 2017 Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu

This paper presents a computational approach to author profiling that takes gender and language variety into account.

Unsupervised Text Segmentation Based on Native Language Characteristics

no code implementations ACL 2017 Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du, Magdalena Wolska

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language.

Segmentation, Text Segmentation

German Dialect Identification in Interview Transcriptions

no code implementations WS 2017 Shervin Malmasi, Marcos Zampieri

This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017.

Dialect Identification, Machine Translation

Arabic Dialect Identification Using iVectors and ASR Transcripts

no code implementations WS 2017 Shervin Malmasi, Marcos Zampieri

This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017.

Dialect Identification, Machine Translation

Findings of the VarDial Evaluation Campaign 2017

no code implementations WS 2017 Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL 2017.

Dependency Parsing, Dialect Identification

Native Language Identification using Stacked Generalization

no code implementations 19 Mar 2017 Shervin Malmasi, Mark Dras

Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art.

Native Language Identification

Subdialectal Differences in Sorani Kurdish

no code implementations WS 2016 Shervin Malmasi

In this study we apply classification methods for detecting subdialectal differences in Sorani Kurdish texts produced in different regions, namely Iran and Iraq.

General Classification, Information Retrieval

Arabic Dialect Identification in Speech Transcripts

no code implementations WS 2016 Shervin Malmasi, Marcos Zampieri

In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus.

Dialect Identification, Machine Translation

Modeling Language Change in Historical Corpora: The Case of Portuguese

no code implementations LREC 2016 Marcos Zampieri, Shervin Malmasi, Mark Dras

This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification.

General Classification, POS

Discriminating Similar Languages: Evaluations and Explorations

no code implementations LREC 2016 Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri

We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties.

BIG-bench, Machine Learning
