Search Results for author: Mona Diab

Found 141 papers, 20 papers with code

BeSt: The Belief and Sentiment Corpus

no code implementations • LREC 2022 • Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski

We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what.

Active Learning for Rumor Identification on Social Media

no code implementations • Findings (EMNLP) 2021 • Parsa Farinneya, Mohammad Mahdi Abdollah Pour, Sardar Hamidian, Mona Diab

We discuss the impact of multiple classifiers on a limited amount of annotated data followed by an interactive approach to gradually update the models by adding the least certain samples (LCS) from the pool of unlabeled data.

Active Learning Transfer Learning

Investigating the Impact of Various Partial Diacritization Schemes on Arabic-English Statistical Machine Translation

no code implementations • AMTA 2016 • Sawsan Alqahtani, Mahmoud Ghoneim, Mona Diab

The absence of these diacritics naturally leads to significant word ambiguity on top of the inherent ambiguity present in fully diacritized words.

Machine Translation Translation

Towards Responsible Natural Language Annotation for the Varieties of Arabic

no code implementations • Findings (ACL) 2022 • A. Bergman, Mona Diab

When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance.

Position

Multitask Learning for Cross-Lingual Transfer of Broad-coverage Semantic Dependencies

no code implementations • EMNLP 2020 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available.

Cross-Lingual Transfer

Emotion Classification in Low and Moderate Resource Languages

no code implementations • 28 Feb 2024 • Shabnam Tafreshi, Shubham Vatsal, Mona Diab

There are 7100+ active languages spoken around the world and building emotion classification for each language is labor intensive.

Classification Cross-Lingual Transfer +2

Investigating Cultural Alignment of Large Language Models

1 code implementation • 20 Feb 2024 • Badr AlKhamissi, Muhammad ElNokrashy, Mai AlKhamissi, Mona Diab

The intricate relationship between language and culture has long been a subject of exploration within the realm of linguistic anthropology.

Cross-Lingual Transfer

A Note on Bias to Complete

no code implementations • 18 Feb 2024 • Jia Xu, Mona Diab

Minimizing social bias strengthens societal bonds, promoting shared understanding and better decision-making.

Decision Making

Can Large Language Models Infer Causation from Correlation?

1 code implementation • 9 Jun 2023 • Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf

In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs).

Causal Inference

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

no code implementations • 19 May 2023 • Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab

Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations.

ALERT: Adapting Language Models to Reasoning Tasks

no code implementations • 16 Dec 2022 • Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissi, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning.

Few-Shot Learning Language Modelling +1

Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

no code implementations • 14 Oct 2022 • Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung

Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.

Classification Few-Shot Learning +1

Text Characterization Toolkit

no code implementations • 4 Oct 2022 • Daniel Simig, Tianlu Wang, Verna Dankers, Peter Henderson, Khuyagbaatar Batsuren, Dieuwke Hupkes, Mona Diab

In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis.

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification

no code implementations • 30 Sep 2022 • Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab

To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers.

NER

ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

no code implementations • 25 May 2022 • Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.

Cultural Vocal Bursts Intensity Prediction Few-Shot Learning +1

GisPy: A Tool for Measuring Gist Inference Score in Text

1 code implementation • NAACL (WNU) 2022 • Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski

Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions.

Coherence Evaluation Decision Making +1

Consistent Human Evaluation of Machine Translation across Language Pairs

no code implementations • AMTA 2022 • Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn

Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.

Machine Translation Translation

Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification

no code implementations • OSACT (LREC) 2022 • Badr AlKhamissi, Mona Diab

The tasks are to predict if a tweet contains (1) Offensive language; and whether it is considered (2) Hate Speech or not and if so, then predict the (3) Fine-Grained Hate Speech label from one of six categories.

Hate Speech Detection

OPT: Open Pre-trained Transformer Language Models

7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs

Hate Speech Detection Language Modelling +1

6,376

A Review on Language Models as Knowledge Bases

no code implementations • 12 Apr 2022 • Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad

Recently, there has been a surge of interest in the NLP community on the use of pretrained Language Models (LMs) as Knowledge Bases (KBs).

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

no code implementations • 19 Feb 2022 • Shuguang Chen, Gustavo Aguilar, Anirudh Srinivasan, Mona Diab, Thamar Solorio

For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions.

Language Identification Machine Translation +3

A Quantitative and Qualitative Analysis of Schizophrenia Language

no code implementations • 25 Jan 2022 • Amal Alqahtani, Efsun Sarioglu Kay, Sardar Hamidian, Michael Compton, Mona Diab

They score lower in most of the linguistic features of cohesion with significant p-values.

Specificity

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Few-shot Learning with Multilingual Language Models

2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

Large-scale generative language models such as GPT-3 are competitive few-shot learners.

Cross-Lingual Transfer Few-Shot Learning +5

29,187

Knowledge-Augmented Language Models for Cause-Effect Relation Classification

1 code implementation • CSRR (ACL) 2022 • Pedram Hosseini, David A. Broniatowski, Mona Diab

Previous studies have shown the efficacy of knowledge augmentation methods in pretrained language models.

Cause-Effect Relation Classification Classification +3

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation • 26 Nov 2021 • Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.

AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization

1 code implementation • NAACL 2022 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab

One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.

Community Question Answering Data Augmentation +1

Discrete Cosine Transform as Universal Sentence Encoder

no code implementations • ACL 2021 • Nada Almarwani, Mona Diab

Modern sentence encoders are used to generate dense vector representations that capture the underlying linguistic characteristics for a sequence of words, including phrases, sentences, or paragraphs.

Question Answering Sentence +3

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

no code implementations • ACL 2021 • Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab

Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU?

Machine Translation NMT +2

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.

Denoising Machine Translation +2

Multi-Perspective Abstractive Answer Summarization

no code implementations • 17 Apr 2021 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Mona Diab

A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries.

Community Question Answering Sentence

Predicting Directionality in Causal Relations in Text

2 code implementations • 25 Mar 2021 • Pedram Hosseini, David A. Broniatowski, Mona Diab

In this work, we test the performance of two bidirectional transformer-based language models, BERT and SpanBERT, on predicting directionality in causal pairs in the textual content.

Sentence

White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content

no code implementations • 25 Jan 2021 • Thamar Solorio, Mahsa Shafaei, Christos Smailis, Mona Diab, Theodore Giannakopoulos, Heng Ji, Yang Liu, Rada Mihalcea, Smaranda Muresan, Ioannis Kakadiaris

This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content.

Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages

1 code implementation • COLING 2020 • Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown

We adopt cross-lingual embeddings constructed using different methods to extract features of the tweets, including a few state-of-the-art contextual embeddings such as BERT, RoBERTa and XLM-R. We train classifiers of different architectures on the extracted features.

Transfer Learning XLM-R

Detecting Hallucinated Content in Conditional Neural Sequence Generation

2 code implementations • Findings (ACL) 2021 • Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad

Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinate additional content not supported by the input.

Abstractive Text Summarization Hallucination +1

A Multitask Learning Approach for Diacritic Restoration

no code implementations • ACL 2020 • Sawsan Alqahtani, Ajay Mishra, Mona Diab

Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word.

Multi-Task Learning Part-Of-Speech Tagging

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

2 code implementations • ACL 2020 • Esin Durmus, He He, Mona Diab

We tackle the problem of evaluating faithfulness of a generated summary given its source document.

Abstractive Text Summarization Question Answering +1

104

Mutlitask Learning for Cross-Lingual Transfer of Semantic Dependencies

no code implementations • 30 Apr 2020 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

We make use of supervised syntactic parsing as an auxiliary task in a multitask learning framework, and show that with different multitask learning settings, we consistently improve over the single-task baseline.

Cross-Lingual Transfer

DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking

1 code implementation • ACL 2020 • Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan

The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence.

Fact Checking Misinformation +1

Learning to Classify Intents and Slot Labels Given a Handful of Examples

no code implementations • WS 2020 • Jason Krone, Yi Zhang, Mona Diab

Prototypical networks achieve significant gains in IC performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to SF on all three datasets.

Few-Shot Learning Goal-Oriented Dialogue Systems +4

Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections

no code implementations • LREC 2020 • Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab

Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point.

Descriptive text-classification +1

Efficient Convolutional Neural Networks for Diacritic Restoration

no code implementations • IJCNLP 2019 • Sawsan Alqahtani, Ajay Mishra, Mona Diab

Diacritic restoration has gained importance with the growing need for machines to understand written texts.

Homograph Disambiguation Through Selective Diacritic Restoration

no code implementations • WS 2019 • Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab

Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications.

Machine Translation Part-Of-Speech Tagging +2

Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data

no code implementations • IJCNLP 2019 • Denis Peskov, Nancy Clarke, Jason Krone, Brigi Fodor, Yi Zhang, Adel Youssef, Mona Diab

With a total of over 81K dialogues harvested across six domains, MultiDoGO is over 8 times the size of MultiWOZ, the other largest comparable dialogue dataset currently available to the public.

Sentence

Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues

1 code implementation • WS 2019 • Or Levi, Pedram Hosseini, Mona Diab, David A. Broniatowski

As avenues for future work, we consider studying additional linguistic features related to the humor aspect, and enriching the data with current news events, to help identify a political or social message.

Language Modelling Misinformation

Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data

no code implementations • LREC 2016 • Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Nada Almarwani, Mohamed Al-Badrashiny

We present our effort to create a large Multi-Layered representational repository of Linguistic Code-Switched Arabic data.

Management

WASA: A Web Application for Sequence Annotation

no code implementations • LREC 2018 • Fahad AlGhamdi, Mona Diab

Data annotation is an important and necessary task for all NLP applications.

Management TAG

Overview for the Second Shared Task on Language Identification in Code-Switched Data

no code implementations • WS 2016 • Giovanni Molina, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Nicolas Rey-Villamizar, Mona Diab, Thamar Solorio

We present an overview of the second shared task on language identification in code-switched data.

Language Identification Single Particle Analysis

Part of speech tagging for code switched data

no code implementations • WS 2016 • Fahad AlGhamdi, Giovanni Molina, Mona Diab, Thamar Solorio, Abdelati Hawwari, Victor Soto, Julia Hirschberg

We address the problem of Part of Speech tagging (POS) in the context of linguistic code switching (CS).

Part-Of-Speech Tagging POS

CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots

no code implementations • IJCNLP 2019 • Arshit Gupta, Peng Zhang, Garima Lalwani, Mona Diab

In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts and utterances over a variable context window, in addition to the current user utterance.

Dialogue Management intent-classification +3

Efficient Sentence Embedding using Discrete Cosine Transform

1 code implementation • IJCNLP 2019 • Nada Almarwani, Hanan Aldarmaki, Mona Diab

Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure.

Classification General Classification +3

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task

no code implementations • WS 2018 • Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.

named-entity-recognition Named Entity Recognition +2

Does Causal Coherence Predict Online Spread of Social Media?

1 code implementation • International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation 2019 • Pedram Hosseini, Mona Diab, David A. Broniatowski

In this paper, we test the hypothesis that causal and semantic coherence are associated with online sharing of misinformative social media content using Coh-Metrix – a widely-used set of psycholinguistic measures.

Decision Making Misinformation

GWU NLP Lab at SemEval-2019 Task 3: EmoContext: Effectiveness of Contextual Information in Models for Emotion Detection in Sentence-level at Multi-genre Corpus

no code implementations • SEMEVAL 2019 • Shabnam Tafreshi, Mona Diab

Our aim is to build a robust emotion classifier that can generalize emotion detection, which is to learn emotion cues in a noisy training environment.

Word Embeddings

GWU NLP at SemEval-2019 Task 7: Hybrid Pipeline for Rumour Veracity and Stance Classification on Social Media

no code implementations • SEMEVAL 2019 • Sardar Hamidian, Mona Diab

Social media plays a crucial role as the main news resource for information seekers online.

General Classification Stance Classification

Leveraging Pretrained Word Embeddings for Part-of-Speech Tagging of Code Switching Data

no code implementations • WS 2019 • Fahad AlGhamdi, Mona Diab

In this paper, we address the problem of Part-of-Speech tagging (POS) in the context of linguistic code switching (CS).

Part-Of-Speech Tagging POS +3

GWU NLP Lab at SemEval-2019 Task 3: EmoContext: Effective Contextual Information in Models for Emotion Detection in Sentence-level in a Multigenre Corpus

no code implementations • 23 May 2019 • Shabnam Tafreshi, Mona Diab

In this paper we present an emotion classifier model submitted to the SemEval-2019 Task 3: EmoContext.

General Classification Sentence +1

Scalable Cross-Lingual Transfer of Neural Sentence Embeddings

no code implementations • SEMEVAL 2019 • Hanan Aldarmaki, Mona Diab

We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models.

Cross-Lingual Transfer Sentence +3

Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles

no code implementations • WS 2019 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available.

Cross-Lingual Transfer Semantic Role Labeling

Context-Aware Cross-Lingual Mapping

1 code implementation • NAACL 2019 • Hanan Aldarmaki, Mona Diab

Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space.

Retrieval Sentence +4

The ARIEL-CMU Systems for LoReHLT18

no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation Translation

Team SWEEPer: Joint Sentence Extraction and Fact Checking with Pointer Networks

no code implementations • WS 2018 • Christopher Hidey, Mona Diab

We present experiments on the FEVER (Fact Extraction and VERification) task, a shared task that involves selecting sentences from Wikipedia and predicting whether a claim is supported by those sentences, refuted, or there is not enough information.

Fact Checking Information Retrieval +5

Predictive Linguistic Features of Schizophrenia

no code implementations • SEMEVAL 2017 • Efsun Sarioglu Kayi, Mona Diab, Luca Pauselli, Michael Compton, Glen Coppersmith

As such, we examine the writings of schizophrenia patients analyzing their syntax, semantics and pragmatics.

Emotion Detection and Classification in a Multigenre Corpus with Joint Multi-Task Deep Learning

no code implementations • COLING 2018 • Shabnam Tafreshi, Mona Diab

Detection and classification of emotion categories expressed by a sentence is a challenging task due to subjectivity of emotion.

General Classification Multi-Task Learning +1

Evaluation of Unsupervised Compositional Representations

1 code implementation • COLING 2018 • Hanan Aldarmaki, Mona Diab

We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks.

General Classification

Sentence and Clause Level Emotion Annotation, Detection, and Classification in a Multi-Genre Corpus

no code implementations • LREC 2018 • Shabnam Tafreshi, Mona Diab

Emotion Classification Emotion Recognition +2

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

no code implementations • TACL 2018 • Hanan Aldarmaki, Mahesh Mohan, Mona Diab

We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.

Word Embeddings

Transferring Semantic Roles Using Translation and Syntactic Information

no code implementations • IJCNLP 2017 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data.

Semantic Role Labeling Translation +1

SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

no code implementations • SEMEVAL 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia

Semantic Textual Similarity (STS) measures the meaning similarity of sentences.

Machine Translation Natural Language Inference +4

GW_QA at SemEval-2017 Task 3: Question Answer Re-ranking on Arabic Fora

no code implementations • SEMEVAL 2017 • Nada Almarwani, Mona Diab

This paper describes our submission to SemEval-2017 Task 3 Subtask D, "Question Answer Ranking in Arabic Community Question Answering".

Answer Selection BIG-bench Machine Learning +2

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

3 code implementations • 31 Jul 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia

Semantic Textual Similarity (STS) measures the meaning similarity of sentences.

Machine Translation Question Answering +3

Arabic Textual Entailment with Word Embeddings

no code implementations • WS 2017 • Nada Almarwani, Mona Diab

Determining the textual entailment between texts is important in many NLP tasks, such as summarization, question answering, and information extraction and retrieval.

Machine Translation Natural Language Inference +3

A Layered Language Model based Hybrid Approach to Automatic Full Diacritization of Arabic

no code implementations • WS 2017 • Mohamed Al-Badrashiny, Abdelati Hawwari, Mona Diab

In this paper we present a system for automatic Arabic text diacritization using three levels of analysis granularity in a layered back off manner.

Arabic Text Diacritization Language Modelling +3

Processing Dialectal Arabic: Exploiting Variability and Similarity to Overcome Challenges and Discover Opportunities

no code implementations • WS 2016 • Mona Diab

We recently witnessed an exponential growth in dialectal Arabic usage in both textual data and speech recordings especially in social media.

Machine Translation

The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection

no code implementations • WS 2016 • Ayah Zirikly, Bart Desmet, Mona Diab

This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2).

Feature Engineering regression +1

LILI: A Simple Language Independent Approach for Language Identification

no code implementations • COLING 2016 • Mohamed Al-Badrashiny, Mona Diab

We introduce a generic Language Independent Framework for Linguistic Code Switch Point Detection.

Language Identification

The Power of Language Music: Arabic Lemmatization through Patterns

no code implementations • WS 2016 • Mohammed Attia, Ayah Zirikly, Mona Diab

The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries.

Information Retrieval LEMMA +1

Automatic Verification and Augmentation of Multilingual Lexicons

no code implementations • WS 2016 • Maryam Aminian, Mohamed Al-Badrashiny, Mona Diab

We present an approach for automatic verification and augmentation of multilingual lexica.

SAMER: A Semi-Automatically Created Lexical Resource for Arabic Verbal Multiword Expressions Tokens Paradigm and their Morphosyntactic Features

no code implementations • WS 2016 • Mohamed Al-Badrashiny, Abdelati Hawwari, Mahmoud Ghoneim, Mona Diab

We propose an automated method that identifies the morphological and syntactic flexibility of Arabic Verbal Multiword Expressions (AVMWE).

Machine Translation POS

Using Ambiguity Detection to Streamline Linguistic Annotation

no code implementations • WS 2016 • Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer

Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics.

Automatic Speech Recognition (ASR) Machine Translation

Paper
Add Code

The George Washington University System for the Code-Switching Workshop Shared Task 2016

no code implementations • WS 2016 • Mohamed Al-Badrashiny, Mona Diab

Paper
Add Code

Addressing Annotation Complexity: The Case of Annotating Ideological Perspective in Egyptian Social Media

no code implementations • WS 2016 • Heba Elfardy, Mona Diab

Recommendation Systems

Paper
Add Code

CU-GWU Perspective at SemEval-2016 Task 6: Ideological Stance Detection in Informal Text

no code implementations • SEMEVAL 2016 • Heba Elfardy, Mona Diab

Sentiment Analysis Stance Detection

Paper
Add Code

GWU NLP at SemEval-2016 Shared Task 1: Matrix Factorization for Crosslingual STS

no code implementations • SEMEVAL 2016 • Hanan Aldarmaki, Mona Diab

Semantic Textual Similarity STS +1

Paper
Add Code

SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation

no code implementations • SEMEVAL 2016 • Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, Janyce Wiebe

Machine Translation Natural Language Inference +2

Paper
Add Code

Learning Cross-lingual Representations with Matrix Factorization

no code implementations • WS 2016 • Hanan Aldarmaki, Mona Diab

Cross-Lingual Document Classification Cross-Lingual Semantic Textual Similarity +5

Paper
Add Code

Rumor Identification and Belief Investigation on Twitter

no code implementations • WS 2016 • Sardar Hamidian, Mona Diab

Paper
Add Code

Explicit Fine grained Syntactic and Semantic Annotation of the Idafa Construction in Arabic

no code implementations • LREC 2016 • Abdelati Hawwari, Mohammed Attia, Mahmoud Ghoneim, Mona Diab

Identifying the various types of the Idafa construction (IC) is of importance to Natural Language Processing (NLP) applications.

General Classification

Paper
Add Code

SPLIT: Smart Preprocessing (Quasi) Language Independent Tool

no code implementations • LREC 2016 • Mohamed Al-Badrashiny, Arfath Pasha, Mona Diab, Nizar Habash, Owen Rambow, Wael Salloum, Ramy Eskander

Text preprocessing is an important and necessary task for all NLP applications.

Paper
Add Code

Guidelines and Framework for a Large Scale Arabic Diacritized Corpus

no code implementations • LREC 2016 • Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer

This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres.

Paper
Add Code

AIDA2: A Hybrid Approach for Token and Sentence Level Dialect Identification in Arabic

no code implementations • CONLL 2015 • Mohamed Al-Badrashiny, Heba Elfardy, Mona Diab

Dialect Identification Sentence +1

Paper
Add Code

GWU-HASP-2015@QALB-2015 Shared Task: Priming Spelling Candidates with Probability

no code implementations • WS 2015 • Mohammed Attia, Mohamed Al-Badrashiny, Mona Diab

Language Modelling

Paper
Add Code

A Pilot Study on Arabic Multi-Genre Corpus Diacritization

no code implementations • WS 2015 • Houda Bouamor, Wajdi Zaghouani, Mona Diab, Ossama Obeid, Kemal Oflazer, Mahmoud Ghoneim, Abdelati Hawwari

Machine Translation Speech Recognition

Paper
Add Code

Robust Part-of-speech Tagging of Arabic Text

no code implementations • WS 2015 • Hanan Aldarmaki, Mona Diab

Machine Translation Morphological Analysis +2

Paper
Add Code

Unsupervised False Friend Disambiguation Using Contextual Word Clusters and Parallel Word Alignments

no code implementations • WS 2015 • Maryam Aminian, Mahmoud Ghoneim, Mona Diab

Machine Translation Semantic Textual Similarity +1

Paper
Add Code

Committed Belief Tagging on the Factbank and LU Corpora: A Comparative Study

no code implementations • WS 2015 • Gregory Werner, Vinodkumar Prabhakaran, Mona Diab, Owen Rambow

Question Answering

Paper
Add Code

Named Entity Recognition for Arabic Social Media

no code implementations • WS 2015 • Ayah Zirikly, Mona Diab

Information Retrieval named-entity-recognition +3

Paper
Add Code

SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability

no code implementations • SEMEVAL 2015 • Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe

Natural Language Inference Question Answering +2

Paper
Add Code

Ideological Perspective Detection Using Semantic Features

no code implementations • SEMEVAL 2015 • Heba Elfardy, Mona Diab, Chris Callison-Burch

Recommendation Systems Sentiment Analysis +1

Paper
Add Code

A New Dataset and Evaluation for Belief/Factuality

no code implementations • SEMEVAL 2015 • Vinodkumar Prabhakaran, Tomas By, Julia Hirschberg, Owen Rambow, Samira Shaikh, Tomek Strzalkowski, Jennifer Tracey, Michael Arrigo, Rupayan Basu, Micah Clark, Adam Dalton, Mona Diab, Louise Guthrie, Anna Prokofieva, Stephanie Strassel, Gregory Werner, Yorick Wilks, Janyce Wiebe

Knowledge Base Population

Paper
Add Code

Statistical modality tagging from rule-based annotations and crowdsourcing

no code implementations • WS 2012 • Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow, Benjamin Van Durme

We explore training an automatic modality tagger.

Paper
Add Code

Named Entity Recognition System for Dialectal Arabic

no code implementations • WS 2014 • Ayah Zirikly, Mona Diab

Information Retrieval named-entity-recognition +3

Paper
Add Code

A Framework for the Classification and Annotation of Multiword Expressions in Dialectal Arabic

no code implementations • WS 2014 • Abdelati Hawwari, Mohammed Attia, Mona Diab

Entity Extraction using GAN General Classification +3

Paper
Add Code

Handling OOV Words in Dialectal Arabic to English Machine Translation

no code implementations • WS 2014 • Maryam Aminian, Mahmoud Ghoneim, Mona Diab

Machine Translation Translation

Paper
Add Code

Overview for the First Shared Task on Language Identification in Code-Switched Data

no code implementations • WS 2014 • Thamar Solorio, Elizabeth Blair, Suraj Maharjan, Steven Bethard, Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Julia Hirschberg, Alison Chang, Pascale Fung

Language Identification

Paper
Add Code

AIDA: Identifying Code Switching in Informal Arabic Text

no code implementations • WS 2014 • Heba Elfardy, Mohamed Al-Badrashiny, Mona Diab

Morphological Analysis

Paper
Add Code

GWU-HASP: Hybrid Arabic Spelling and Punctuation Corrector

no code implementations • WS 2014 • Mohammed Attia, Mohamed Al-Badrashiny, Mona Diab

Language Modelling Spelling Correction +1

Paper
Add Code

SemEval-2014 Task 10: Multilingual Semantic Textual Similarity

no code implementations • SEMEVAL 2014 • Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, Janyce Wiebe

Machine Translation Natural Language Inference +2

Paper
Add Code

Fast Tweet Retrieval with Compact Binary Codes

no code implementations • COLING 2014 • Weiwei Guo, Wei Liu, Mona Diab

Information Retrieval Retrieval +1

Paper
Add Code

Sentence Level Dialect Identification for Machine Translation System Selection

no code implementations • ACL 2014 • Wael Salloum, Heba Elfardy, Linda Alamir-Salloum, Nizar Habash, Mona Diab

Dialect Identification Machine Translation +2

Paper
Add Code

SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis

no code implementations • LREC 2014 • Muhammad Abdul-Mageed, Mona Diab

The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values).

Arabic Sentiment Analysis Machine Translation

Paper
Add Code

Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon

no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Ramy Eskander

Multiple levels of quality checks are performed on the output of each step in the creation process.

POS

Paper
Add Code

MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic

no code implementations • LREC 2014 • Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth

In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).

Chunking Lemmatization +5