Search Results for author: Mona Diab

Found 141 papers, 20 papers with code

Multitask Learning for Cross-Lingual Transfer of Broad-coverage Semantic Dependencies

no code implementations EMNLP 2020 Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available.

Cross-Lingual Transfer

Towards Responsible Natural Language Annotation for the Varieties of Arabic

no code implementations Findings (ACL) 2022 A. Bergman, Mona Diab

When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance.


Active Learning for Rumor Identification on Social Media

no code implementations Findings (EMNLP) 2021 Parsa Farinneya, Mohammad Mahdi Abdollah Pour, Sardar Hamidian, Mona Diab

We discuss the impact of multiple classifiers on a limited amount of annotated data followed by an interactive approach to gradually update the models by adding the least certain samples (LCS) from the pool of unlabeled data.

Active Learning Transfer Learning
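
The snippet above describes gradually adding the least certain samples (LCS) from the unlabeled pool. The sketch below shows one common way to rank that pool. It assumes a classifier that outputs class probabilities and uses least-confidence scoring (lowest top-class probability) as the uncertainty measure; the paper's exact criterion may differ, and the name `select_least_certain` is hypothetical.

```python
from typing import Dict, List


def select_least_certain(pool_probs: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k pool samples whose top predicted class
    probability is lowest (least-confidence uncertainty sampling)."""
    # Confidence of a prediction = probability assigned to its most likely class.
    confidence = {sid: max(probs) for sid, probs in pool_probs.items()}
    # Sort ascending so the least confident samples come first; ties broken by id.
    ranked = sorted(confidence, key=lambda sid: (confidence[sid], sid))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for three unlabeled tweets.
    pool = {
        "tweet_a": [0.95, 0.05],
        "tweet_b": [0.55, 0.45],
        "tweet_c": [0.70, 0.30],
    }
    print(select_least_certain(pool, 2))  # ['tweet_b', 'tweet_c']
```

In an interactive loop, the selected samples would be annotated and moved into the training set before the models are retrained.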

Emotion Classification in Low and Moderate Resource Languages

no code implementations 28 Feb 2024 Shabnam Tafreshi, Shubham Vatsal, Mona Diab

There are 7100+ active languages spoken around the world and building emotion classification for each language is labor intensive.

Classification Cross-Lingual Transfer +2

Investigating Cultural Alignment of Large Language Models

1 code implementation 20 Feb 2024 Badr AlKhamissi, Muhammad ElNokrashy, Mai AlKhamissi, Mona Diab

The intricate relationship between language and culture has long been a subject of exploration within the realm of linguistic anthropology.

Cross-Lingual Transfer

A Note on Bias to Complete

no code implementations 18 Feb 2024 Jia Xu, Mona Diab

Minimizing social bias strengthens societal bonds, promoting shared understanding and better decision-making.

Decision Making

Can Large Language Models Infer Causation from Correlation?

1 code implementation 9 Jun 2023 Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf

In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs).

Causal Inference

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

no code implementations 19 May 2023 Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab

Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations.

ALERT: Adapting Language Models to Reasoning Tasks

no code implementations 16 Dec 2022 Ping Yu, Tianlu Wang, Olga Golovneva, Badr Alkhamissy, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning.

Few-Shot Learning Language Modelling +1

Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

no code implementations 14 Oct 2022 Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung

Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.

Classification Few-Shot Learning +1

Text Characterization Toolkit

no code implementations 4 Oct 2022 Daniel Simig, Tianlu Wang, Verna Dankers, Peter Henderson, Khuyagbaatar Batsuren, Dieuwke Hupkes, Mona Diab

In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis.

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification

no code implementations 30 Sep 2022 Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab

To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers.


ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

no code implementations 25 May 2022 Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.

Cultural Vocal Bursts Intensity Prediction Few-Shot Learning +1

GisPy: A Tool for Measuring Gist Inference Score in Text

1 code implementation NAACL (WNU) 2022 Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski

Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions.

Coherence Evaluation Decision Making +1

Consistent Human Evaluation of Machine Translation across Language Pairs

no code implementations AMTA 2022 Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn

Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.

Machine Translation Translation

Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification

no code implementations OSACT (LREC) 2022 Badr AlKhamissi, Mona Diab

The tasks are to predict if a tweet contains (1) Offensive language; and whether it is considered (2) Hate Speech or not and if so, then predict the (3) Fine-Grained Hate Speech label from one of six categories.

Hate Speech Detection

A Review on Language Models as Knowledge Bases

no code implementations 12 Apr 2022 Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad

Recently, there has been a surge of interest in the NLP community in the use of pretrained Language Models (LMs) as Knowledge Bases (KBs).

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

no code implementations 19 Feb 2022 Shuguang Chen, Gustavo Aguilar, Anirudh Srinivasan, Mona Diab, Thamar Solorio

For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions.

Language Identification Machine Translation +3

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations 20 Dec 2021 Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

1 code implementation 26 Nov 2021 Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.

Discrete Cosine Transform as Universal Sentence Encoder

no code implementations ACL 2021 Nada Almarwani, Mona Diab

Modern sentence encoders are used to generate dense vector representations that capture the underlying linguistic characteristics for a sequence of words, including phrases, sentences, or paragraphs.

Question Answering Sentence +3

Multi-Perspective Abstractive Answer Summarization

no code implementations 17 Apr 2021 Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Mona Diab

A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries.

Community Question Answering Sentence

Predicting Directionality in Causal Relations in Text

2 code implementations 25 Mar 2021 Pedram Hosseini, David A. Broniatowski, Mona Diab

In this work, we test the performance of two bidirectional transformer-based language models, BERT and SpanBERT, on predicting directionality in causal pairs in the textual content.


White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content

no code implementations 25 Jan 2021 Thamar Solorio, Mahsa Shafaei, Christos Smailis, Mona Diab, Theodore Giannakopoulos, Heng Ji, Yang Liu, Rada Mihalcea, Smaranda Muresan, Ioannis Kakadiaris

This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content.

Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages

1 code implementation COLING 2020 Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown

We adopt cross-lingual embeddings constructed using different methods to extract features of the tweets, including a few state-of-the-art contextual embeddings such as BERT, RoBERTa and XLM-R. We train classifiers of different architectures on the extracted features.

Transfer Learning XLM-R

Detecting Hallucinated Content in Conditional Neural Sequence Generation

2 code implementations Findings (ACL) 2021 Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad

Neural sequence models can generate highly fluent sentences, but recent studies have also shown that they are prone to hallucinate additional content not supported by the input.

Abstractive Text Summarization Hallucination +1

A Multitask Learning Approach for Diacritic Restoration

no code implementations ACL 2020 Sawsan Alqahtani, Ajay Mishra, Mona Diab

Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word.

Multi-Task Learning Part-Of-Speech Tagging

Multitask Learning for Cross-Lingual Transfer of Semantic Dependencies

no code implementations 30 Apr 2020 Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

We make use of supervised syntactic parsing as an auxiliary task in a multitask learning framework, and show that with different multitask learning settings, we consistently improve over the single-task baseline.

Cross-Lingual Transfer

DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking

1 code implementation ACL 2020 Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan

The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence.

Fact Checking Misinformation +1

Learning to Classify Intents and Slot Labels Given a Handful of Examples

no code implementations WS 2020 Jason Krone, Yi Zhang, Mona Diab

Prototypical networks achieve significant gains in intent classification (IC) performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to slot filling (SF) on all three datasets.

Few-Shot Learning Goal-Oriented Dialogue Systems +4
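
Prototypical networks, mentioned in the entry above, classify a query by its distance to per-class prototypes computed from a handful of labeled examples. The sketch below assumes utterances have already been encoded into fixed-length vectors (the encoder is out of scope) and uses squared Euclidean distance; the helper names and toy intents are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the support embeddings of each class into one prototype vector."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query: Vector, protos: Dict[str, List[float]]) -> str:
    """Assign the query to the class whose prototype is nearest (squared Euclidean)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(protos, key=lambda label: sq_dist(query, protos[label]))


if __name__ == "__main__":
    # Two hypothetical intents with two support examples each (a 2-shot episode).
    support = {
        "book_flight": [[1.0, 0.0], [0.8, 0.2]],
        "get_weather": [[0.0, 1.0], [0.2, 0.9]],
    }
    protos = prototypes(support)
    print(classify([0.9, 0.1], protos))  # book_flight
```

During training, the encoder would be optimized so that this nearest-prototype rule works well across many sampled episodes; only the inference step is shown here.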

Efficient Convolutional Neural Networks for Diacritic Restoration

no code implementations IJCNLP 2019 Sawsan Alqahtani, Ajay Mishra, Mona Diab

Diacritic restoration has gained importance with the growing need for machines to understand written texts.

Homograph Disambiguation Through Selective Diacritic Restoration

no code implementations WS 2019 Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab

Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications.

Machine Translation Part-Of-Speech Tagging +2

Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data

no code implementations IJCNLP 2019 Denis Peskov, Nancy Clarke, Jason Krone, Brigi Fodor, Yi Zhang, Adel Youssef, Mona Diab

With a total of over 81K dialogues harvested across six domains, MultiDoGO is over 8 times the size of MultiWOZ, the largest other comparable dialogue dataset currently available to the public.


Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues

1 code implementation WS 2019 Or Levi, Pedram Hosseini, Mona Diab, David A. Broniatowski

As avenues for future work, we consider studying additional linguistic features related to the humor aspect, and enriching the data with current news events, to help identify a political or social message.

Language Modelling Misinformation

CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots

no code implementations IJCNLP 2019 Arshit Gupta, Peng Zhang, Garima Lalwani, Mona Diab

In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts and utterances over a variable context window, in addition to the current user utterance.

Dialogue Management intent-classification +3

Efficient Sentence Embedding using Discrete Cosine Transform

1 code implementation IJCNLP 2019 Nada Almarwani, Hanan Aldarmaki, Mona Diab

Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure.

Classification General Classification +3
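
The DCT-based sentence embedding in this entry can be illustrated as follows: apply a type-II discrete cosine transform along the word axis of each embedding dimension and keep the first K coefficients. This is a minimal sketch that assumes an unnormalized DCT-II, plain Python lists for word vectors, and zero-padding for sentences shorter than K words; the published method may use a different normalization or padding, and `dct_sentence_embedding` is a hypothetical name. With K = 1 the result is the sum of the word vectors, which is the vector average scaled by sentence length, so averaging appears as the lowest-order special case.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float]) -> List[float]:
    """Unnormalized type-II DCT: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        for k in range(n_len)
    ]


def dct_sentence_embedding(word_vectors: List[Sequence[float]], k: int) -> List[float]:
    """Concatenate the first k DCT coefficients of every embedding dimension.

    word_vectors: one vector per word, all of the same dimension d.
    Returns a vector of length k * d, ordered coefficient by coefficient.
    Sentences shorter than k words are zero-padded (an assumption of this sketch).
    """
    dim = len(word_vectors[0])
    # Zero-pad along the word axis so that at least k coefficients exist.
    padded = list(word_vectors) + [[0.0] * dim] * max(0, k - len(word_vectors))
    # coeffs[j] holds the DCT of embedding dimension j across the sentence.
    coeffs = [dct_ii([w[j] for w in padded]) for j in range(dim)]
    return [coeffs[j][c] for c in range(k) for j in range(dim)]


if __name__ == "__main__":
    sentence = [[1.0, 2.0], [3.0, 4.0]]  # two toy 2-d word vectors
    print(dct_sentence_embedding(sentence, 1))  # [4.0, 6.0]
```

Higher coefficients (K > 1) add information about how each dimension varies over word positions, which plain averaging discards.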

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task

no code implementations WS 2018 Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.

named-entity-recognition Named Entity Recognition +2

Does Causal Coherence Predict Online Spread of Social Media?

1 code implementation International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation 2019 Pedram Hosseini, Mona Diab, David A. Broniatowski

In this paper, we test the hypothesis that causal and semantic coherence are associated with online sharing of misinformative social media content using Coh-Metrix – a widely-used set of psycholinguistic measures.

Decision Making Misinformation

Leveraging Pretrained Word Embeddings for Part-of-Speech Tagging of Code Switching Data

no code implementations WS 2019 Fahad AlGhamdi, Mona Diab

In this paper, we address the problem of Part-of-Speech tagging (POS) in the context of linguistic code switching (CS).

Part-Of-Speech Tagging POS +3

Scalable Cross-Lingual Transfer of Neural Sentence Embeddings

no code implementations SEMEVAL 2019 Hanan Aldarmaki, Mona Diab

We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models.

Cross-Lingual Transfer Sentence +3

Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles

no code implementations WS 2019 Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available.

Cross-Lingual Transfer Semantic Role Labeling

Context-Aware Cross-Lingual Mapping

1 code implementation NAACL 2019 Hanan Aldarmaki, Mona Diab

Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space.

Retrieval Sentence +4
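
Fitting an orthogonal matrix that maps dictionary entries from a source to a target space, as described in this entry, is the orthogonal Procrustes problem, usually solved with an SVD. To stay within the standard library, the sketch below restricts it to 2-D rotations, an illustrative simplification rather than the paper's setup. For pairs (x_i, y_i), expanding the sum of ||R x_i - y_i||^2 leaves cos(theta) * sum(x_i . y_i) + sin(theta) * sum(x_i x y_i) to maximize, so theta = atan2(sum of cross products, sum of dot products). The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]


def fit_rotation_2d(pairs: List[Pair]) -> float:
    """Return the angle theta of the 2-D rotation R minimizing sum ||R x - y||^2
    over dictionary pairs (x, y) (orthogonal Procrustes restricted to rotations)."""
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    # The objective reduces to maximizing cos(theta) * dot + sin(theta) * cross.
    return math.atan2(cross, dot)


def rotate(v: Sequence[float], theta: float) -> Tuple[float, float]:
    """Apply the rotation by theta to a 2-D vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


if __name__ == "__main__":
    # Toy "target space" = source space rotated by 90 degrees.
    src = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
    tgt = [rotate(v, math.pi / 2) for v in src]
    theta = fit_rotation_2d(list(zip(src, tgt)))
    print(round(math.degrees(theta), 6))  # 90.0
```

In higher dimensions the same objective is solved by taking the SVD of the cross-covariance of the paired vectors, which also allows reflections.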

The ARIEL-CMU Systems for LoReHLT18

no code implementations 24 Feb 2019 Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation Translation

Team SWEEPer: Joint Sentence Extraction and Fact Checking with Pointer Networks

no code implementations WS 2018 Christopher Hidey, Mona Diab

We present experiments on the FEVER (Fact Extraction and VERification) task, a shared task that involves selecting sentences from Wikipedia and predicting whether a claim is supported by those sentences, refuted, or there is not enough information.

Fact Checking Information Retrieval +5

Predictive Linguistic Features of Schizophrenia

no code implementations SEMEVAL 2017 Efsun Sarioglu Kayi, Mona Diab, Luca Pauselli, Michael Compton, Glen Coppersmith

As such, we examine the writings of schizophrenia patients analyzing their syntax, semantics and pragmatics.

Evaluation of Unsupervised Compositional Representations

1 code implementation COLING 2018 Hanan Aldarmaki, Mona Diab

We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks.

General Classification

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

no code implementations TACL 2018 Hanan Aldarmaki, Mahesh Mohan, Mona Diab

We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.

Word Embeddings

Transferring Semantic Roles Using Translation and Syntactic Information

no code implementations IJCNLP 2017 Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab

Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data.

Semantic Role Labeling Translation +1

GW_QA at SemEval-2017 Task 3: Question Answer Re-ranking on Arabic Fora

no code implementations SEMEVAL 2017 Nada Almarwani, Mona Diab

This paper describes our submission to SemEval-2017 Task 3 Subtask D, "Question Answer Ranking in Arabic Community Question Answering".

Answer Selection BIG-bench Machine Learning +2

A Layered Language Model based Hybrid Approach to Automatic Full Diacritization of Arabic

no code implementations WS 2017 Mohamed Al-Badrashiny, Abdelati Hawwari, Mona Diab

In this paper, we present a system for automatic Arabic text diacritization using three levels of analysis granularity in a layered back-off manner.

Arabic Text Diacritization Language Modelling +3

Arabic Textual Entailment with Word Embeddings

no code implementations WS 2017 Nada Almarwani, Mona Diab

Determining the textual entailment between texts is important in many NLP tasks, such as summarization, question answering, and information extraction and retrieval.

Machine Translation Natural Language Inference +3

The Power of Language Music: Arabic Lemmatization through Patterns

no code implementations WS 2016 Mohammed Attia, Ayah Zirikly, Mona Diab

The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries.

Information Retrieval LEMMA +1

Automatic Verification and Augmentation of Multilingual Lexicons

no code implementations WS 2016 Maryam Aminian, Mohamed Al-Badrashiny, Mona Diab

We present an approach for automatic verification and augmentation of multilingual lexica.

Processing Dialectal Arabic: Exploiting Variability and Similarity to Overcome Challenges and Discover Opportunities

no code implementations WS 2016 Mona Diab

We recently witnessed an exponential growth in dialectal Arabic usage in both textual data and speech recordings especially in social media.

Machine Translation

The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection

no code implementations WS 2016 Ayah Zirikly, Bart Desmet, Mona Diab

This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2).

Feature Engineering regression +1

Guidelines and Framework for a Large Scale Arabic Diacritized Corpus

no code implementations LREC 2016 Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer

This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres.

SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis

no code implementations LREC 2014 Muhammad Abdul-Mageed, Mona Diab

The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive or negative values).

Arabic Sentiment Analysis Machine Translation

MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic

no code implementations LREC 2014 Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth

In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).

Chunking Lemmatization +5

LDC Arabic Treebanks and Associated Corpora: Data Divisions Manual

no code implementations 22 Sep 2013 Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth

The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.

Annotations for Power Relations on Email Threads

no code implementations LREC 2012 Vinodkumar Prabhakaran, Huzaifa Neralwala, Owen Rambow, Mona Diab

In this paper, we describe a multi-layer annotation scheme for social power relations that are recognizable from online written interactions.

Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations

no code implementations LREC 2012 Heba Elfardy, Mona Diab

In this paper, we present a simplified set of guidelines for detecting code switching in Arabic at the word/token level.

Speech Recognition

AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis

no code implementations LREC 2012 Muhammad Abdul-Mageed, Mona Diab

We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level.

Opinion Mining Sentence +1
