Search Results for author: Shervin Malmasi

Found 52 papers, 3 papers with code

GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input

no code implementations NAACL 2021 Tao Meng, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

We propose GEMNET, a novel approach for gazetteer knowledge integration, including (1) a flexible Contextual Gazetteer Representation (CGR) encoder that can be fused with any word-level model; and (2) a Mixture-of-Experts gating network that overcomes the feature overuse issue by learning to conditionally combine the context and gazetteer features, instead of assigning them fixed weights.

Named Entity Recognition NER
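The abstract's central idea is to weight context and gazetteer features per input instead of with fixed weights. A toy two-expert gate can illustrate this. It is a minimal sketch under our own assumptions, not GEMNET's architecture. The function `gated_fusion`, the softmax gate over concatenated features, and the hand-set parameters are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_fusion(
    context: Sequence[float],
    gazetteer: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Mix two equal-length feature vectors with input-dependent weights.

    gate_weights holds one row per expert (row 0: context, row 1: gazetteer),
    each of length len(context) + len(gazetteer). In a trained model these
    parameters would be learned; in this sketch the caller supplies them.
    """
    joint = list(context) + list(gazetteer)  # the gate sees both feature sets
    scores = [dot(w, joint) + b for w, b in zip(gate_weights, gate_bias)]
    alpha = softmax(scores)  # mixing weights, summing to 1
    fused = [alpha[0] * c + alpha[1] * g for c, g in zip(context, gazetteer)]
    return fused, alpha


# With an all-zero gate, both experts get equal weight.
fused, alpha = gated_fusion([1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
# alpha == [0.5, 0.5], fused == [0.5, 0.5]
```

Because the weights `alpha` are computed from the inputs themselves, a token with strong contextual evidence can shift weight away from the gazetteer features, and a token with weak context can shift weight toward them. A fixed weighting cannot do this.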

Evaluating Aggression Identification in Social Media

no code implementations LREC 2020 Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

The task consisted of two sub-tasks, aggression identification (sub-task A) and gendered identification (sub-task B), in three languages: Bangla, Hindi and English.

Aggression Identification

A Report on the Third VarDial Evaluation Campaign

no code implementations WS 2019 Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen

In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.

Dialect Identification Morphological Analysis

UTFPR at SemEval-2019 Task 5: Hate Speech Identification with Recurrent Neural Networks

no code implementations SEMEVAL 2019 Gustavo Henrique Paetzold, Shervin Malmasi, Marcos Zampieri

We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset.

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

1 code implementation SEMEVAL 2019 Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval).

Language Identification

Classifying Patent Applications with Ensemble Methods

no code implementations ALTA 2018 Fernando Benites, Shervin Malmasi, Marcos Zampieri

We present methods for the automatic classification of patent applications using an annotated dataset provided by the organizers of the ALTA 2018 shared task, Classifying Patent Applications.

Classification General Classification

Native Language Identification With Classifier Stacking and Ensembles

no code implementations CL 2018 Shervin Malmasi, Mark Dras

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art.

General Classification Language Acquisition +2
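Classifier ensembles need a rule for fusing the base classifiers' outputs. Two widely used rules are plurality voting over predicted labels and averaging predicted probabilities. The sketch below illustrates both. The function names and the L1 labels "ARA" and "CHI" are hypothetical, and this is not a claim about which rule the paper found best.

```python
from collections import Counter
from typing import Dict, Sequence


def plurality_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


def mean_probability(prob_dists: Sequence[Dict[str, float]]) -> str:
    """Average each label's probability across classifiers and return the argmax."""
    if not prob_dists:
        raise ValueError("need at least one distribution")
    labels = sorted(set().union(*prob_dists))
    avg = {lab: sum(d.get(lab, 0.0) for d in prob_dists) / len(prob_dists) for lab in labels}
    return max(labels, key=avg.get)  # earliest label in sorted order wins ties
```

The two rules can disagree. Consider three classifiers giving P(ARA) of 0.6, 0.2 and 0.5. Voting over their argmax labels depends on how the 0.5/0.5 tie is broken. Averaging gives P(CHI) of about 0.57 and picks CHI, because the confident second classifier outweighs the mild preference of the first.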

Classifier Ensembles for Dialect and Language Variety Identification

no code implementations 14 Aug 2018 Liviu P. Dinu, Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi

In this paper we present ensemble-based systems for dialect and language variety identification using the datasets made available by the organizers of the VarDial Evaluation Campaign 2018.

Dialect Identification

Benchmarking Aggression Identification in Social Media

no code implementations COLING 2018 Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Marcos Zampieri

For this task, the participants were provided with a dataset of 15,000 aggression-annotated Facebook posts and comments each in Hindi (in both Roman and Devanagari script) and English for training and validation.

Aggression Identification

German Dialect Identification Using Classifier Ensembles

no code implementations COLING 2018 Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu

In this paper we present the GDI_classification entry to the second German Dialect Identification (GDI) shared task organized within the scope of the VarDial Evaluation Campaign 2018.

Dialect Identification

Discriminating between Indo-Aryan Languages Using SVM Ensembles

no code implementations COLING 2018 Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, Liviu P. Dinu

In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi.

Language Identification
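Ensembles trained on characters and words usually start by extracting several separate feature views from each sentence. The sketch below shows only that step. The n-gram orders in `feature_views` are our own illustrative assumptions, not the paper's settings. Each view would feed its own base classifier (for example a linear SVM), and those classifiers' outputs would then be combined.

```python
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams, padding with one space on each side."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Count word n-grams over whitespace-separated tokens."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def feature_views(text: str) -> Dict[str, Counter]:
    """One sparse feature view per base classifier (orders chosen for illustration)."""
    views = {f"char{n}": char_ngrams(text, n) for n in (2, 3, 4)}
    views.update({f"word{n}": word_ngrams(text, n) for n in (1, 2)})
    return views
```

Keeping the views separate is the design choice this sketch illustrates. Closely related languages such as Hindi and Braj Bhasha may differ mainly in short character sequences or in particular function words, and a separate classifier per view lets each kind of evidence vote on its own.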

A Portuguese Native Language Identification Dataset

no code implementations WS 2018 Iria del Río, Marcos Zampieri, Shervin Malmasi

In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language based on their second language writing.

Language Acquisition Native Language Identification +1

Challenges in Discriminating Profanity from Hate Speech

no code implementations 14 Mar 2018 Shervin Malmasi, Marcos Zampieri

In this study we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered.

General Classification

Detecting Hate Speech in Social Media

1 code implementation RANLP 2017 Shervin Malmasi, Marcos Zampieri

In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity.

General Classification

Open-Set Language Identification

no code implementations 16 Jul 2017 Shervin Malmasi

We present the first open-set language identification experiments using one-class classification.

General Classification Language Identification
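One-class classification trains a model on examples of a single class and decides whether a new input belongs to that class. In open-set language identification, a text that no known-language model accepts can therefore be reported as unknown, instead of being forced into the closest known language. The sketch uses trigram-profile cosine similarity with a fixed threshold as a simple stand-in one-class learner. This is our assumption; the paper's classifier and features may differ. `OneClassLanguageModel`, `identify` and the threshold value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def trigram_profile(text: str) -> Counter:
    """Character trigram counts of a lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[k] * b[k] for k in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


class OneClassLanguageModel:
    """Trained on one language only; accepts texts similar enough to its profile."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical acceptance cut-off
        self.profile: Counter = Counter()

    def fit(self, texts: Iterable[str]) -> "OneClassLanguageModel":
        for text in texts:
            self.profile.update(trigram_profile(text))
        return self

    def score(self, text: str) -> float:
        return cosine(self.profile, trigram_profile(text))

    def accepts(self, text: str) -> bool:
        return self.score(text) >= self.threshold


def identify(models: Dict[str, OneClassLanguageModel], text: str) -> Optional[str]:
    """Best-scoring language whose model accepts the text, or None if all reject it."""
    accepted = {lang: m.score(text) for lang, m in models.items() if m.accepts(text)}
    if not accepted:
        return None  # open-set outcome: the text matches no known language
    return max(sorted(accepted), key=accepted.get)
```

For example, a model fitted on a couple of English sentences with a threshold of 0.3 accepts "the cat ate the mat". It rejects "zzzz qqqq xxxx", which shares no trigrams with the profile, so `identify` returns `None` for that string.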

Including Dialects and Language Varieties in Author Profiling

no code implementations 3 Jul 2017 Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu

This paper presents a computational approach to author profiling taking gender and language variety into account.

Unsupervised Text Segmentation Based on Native Language Characteristics

no code implementations ACL 2017 Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du, Magdalena Wolska

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language.

Text Segmentation

Findings of the VarDial Evaluation Campaign 2017

no code implementations WS 2017 Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL 2017.

Dependency Parsing Dialect Identification

German Dialect Identification in Interview Transcriptions

no code implementations WS 2017 Shervin Malmasi, Marcos Zampieri

This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017.

Dialect Identification Machine Translation

Arabic Dialect Identification Using iVectors and ASR Transcripts

no code implementations WS 2017 Shervin Malmasi, Marcos Zampieri

This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017.

Dialect Identification Machine Translation

Native Language Identification using Stacked Generalization

no code implementations 19 Mar 2017 Shervin Malmasi, Mark Dras

Ensemble methods using multiple classifiers have proven to be the most successful approach for the task of Native Language Identification (NLI), achieving the current state of the art.

Native Language Identification
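Stacked generalization treats the base classifiers' outputs as input features for a second-level meta-classifier. The meta-classifier learns how to combine those outputs, instead of relying on a fixed rule such as voting. In the sketch below, a nearest-centroid meta-classifier stands in for whatever learner the paper used. That substitution, the function names and the labels "ARA" and "CHI" are our own hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def stack_features(base_outputs: Sequence[Vector]) -> Vector:
    """Concatenate each base classifier's probability vector for one document."""
    return [p for out in base_outputs for p in out]


def fit_centroids(features: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Meta-classifier training: mean stacked vector per gold label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_centroid(centroids: Dict[str, Vector], x: Vector) -> str:
    """Label whose centroid is closest to x in squared Euclidean distance."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(sorted(centroids), key=lambda y: sq_dist(centroids[y]))
```

In practice, the base-classifier outputs used to train the meta-classifier should come from held-out data, such as cross-validation folds. If they come from the base classifiers' own training data, those outputs are overconfident, and the meta-classifier learns to trust them too much.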

Arabic Dialect Identification in Speech Transcripts

no code implementations WS 2016 Shervin Malmasi, Marcos Zampieri

In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus.

Dialect Identification Machine Translation

Subdialectal Differences in Sorani Kurdish

no code implementations WS 2016 Shervin Malmasi

In this study we apply classification methods for detecting subdialectal differences in Sorani Kurdish texts produced in different regions, namely Iran and Iraq.

General Classification Information Retrieval +2

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

no code implementations WS 2016 Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann

We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial 2016 workshop at COLING 2016.

Dialect Identification General Classification

Discriminating Similar Languages: Evaluations and Explorations

no code implementations LREC 2016 Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri

We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties.

Modeling Language Change in Historical Corpora: The Case of Portuguese

no code implementations LREC 2016 Marcos Zampieri, Shervin Malmasi, Mark Dras

This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification.

General Classification POS +1
