no code implementations • AMTA 2016 • Sawsan Alqahtani, Mahmoud Ghoneim, Mona Diab
The absence of these diacritics naturally leads to significant word ambiguity, compounding the inherent ambiguity already present in fully diacritized words.
no code implementations • LREC 2022 • Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski
We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what.
no code implementations • Findings (ACL) 2022 • A. Bergman, Mona Diab
When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance.
no code implementations • EMNLP 2020 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available.
no code implementations • Findings (EMNLP) 2021 • Parsa Farinneya, Mohammad Mahdi Abdollah Pour, Sardar Hamidian, Mona Diab
We discuss the impact of multiple classifiers on a limited amount of annotated data followed by an interactive approach to gradually update the models by adding the least certain samples (LCS) from the pool of unlabeled data.
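The least-certain-samples idea is a standard least-confidence active-learning loop. Below is a minimal sketch under that assumption, using a generic scikit-learn classifier as a stand-in for the paper's actual models and data.

```python
# Minimal sketch of least-certain-sample (LCS) selection for active learning.
# The classifier and random data are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_least_certain(clf, X_pool, k=10):
    """Return indices of the k pool samples the classifier is least sure about."""
    proba = clf.predict_proba(X_pool)      # (n_samples, n_classes)
    confidence = proba.max(axis=1)         # top-class probability per sample
    return np.argsort(confidence)[:k]      # lowest confidence first

rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(40, 5)), rng.integers(0, 2, size=40)
X_pool = rng.normal(size=(200, 5))

clf = LogisticRegression().fit(X_lab, y_lab)
to_annotate = select_least_certain(clf, X_pool, k=10)
# Loop: annotate these samples, add them to the labeled set, retrain, repeat.
```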
no code implementations • 16 Dec 2022 • Ping Yu, Tianlu Wang, Olga Golovneva, Badr Alkhamissy, Gargi Ghosh, Mona Diab, Asli Celikyilmaz
Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning.
no code implementations • 14 Oct 2022 • Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung
Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
no code implementations • 4 Oct 2022 • Daniel Simig, Tianlu Wang, Verna Dankers, Peter Henderson, Khuyagbaatar Batsuren, Dieuwke Hupkes, Mona Diab
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis.
no code implementations • 30 Sep 2022 • Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab
To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers.
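As a rough illustration of layer fusion via depth-wise attention, the sketch below lets each token attend over its own per-layer hidden states. The parameterization and shapes are assumptions for illustration, not the authors' exact DWAtt module.

```python
# Illustrative sketch: fuse hidden states from all encoder layers by letting
# each token attend over its per-layer representations (generic depth-wise
# attention, not the paper's exact parameterization).
import torch
import torch.nn as nn

class DepthWiseFusion(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Parameter(torch.randn(d_model))
        self.key = nn.Linear(d_model, d_model)

    def forward(self, layer_states):
        # layer_states: (num_layers, batch, seq_len, d_model)
        keys = self.key(layer_states)                  # same shape
        scores = (keys * self.query).sum(-1)           # (L, B, T)
        weights = scores.softmax(dim=0).unsqueeze(-1)  # normalize over depth
        return (weights * layer_states).sum(dim=0)     # (B, T, d_model)

fused = DepthWiseFusion(768)(torch.randn(12, 2, 16, 768))
```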
no code implementations • 25 May 2022 • Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab
Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.
1 code implementation • NAACL (WNU) 2022 • Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski
Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions.
no code implementations • AMTA 2022 • Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.
no code implementations • OSACT (LREC) 2022 • Badr AlKhamissi, Mona Diab
The tasks are to predict whether a tweet contains (1) Offensive language; (2) whether it is considered Hate Speech; and, if so, (3) the Fine-Grained Hate Speech label from one of six categories.
2 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
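The released checkpoints can be loaded through the Hugging Face transformers library; a minimal generation sketch with the smallest released size, facebook/opt-125m, follows.

```python
# Minimal usage sketch for the released OPT checkpoints via Hugging Face
# transformers, shown with the smallest size (facebook/opt-125m).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```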
no code implementations • 12 Apr 2022 • Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad
Recently, there has been a surge of interest in the NLP community on the use of pretrained Language Models (LMs) as Knowledge Bases (KBs).
no code implementations • 19 Feb 2022 • Shuguang Chen, Gustavo Aguilar, Anirudh Srinivasan, Mona Diab, Thamar Solorio
For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions.
no code implementations • 25 Jan 2022 • Amal Alqahtani, Efsun Sarioglu Kay, Sardar Hamidian, Michael Compton, Mona Diab
They score lower on most of the linguistic features of cohesion, with significant p-values.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
1 code implementation • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
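As a toy illustration of few-shot priming, the snippet below builds a prompt in which the task is specified entirely by in-context examples; the examples and labels are made up for illustration.

```python
# Few-shot priming sketch: the model infers the task from a handful of
# in-context demonstrations and is asked to continue with a label.
examples = [("I loved this movie!", "positive"),
            ("Terrible, boring plot.", "negative")]
query = "The acting was wonderful."

prompt = "\n".join(f"Review: {t}\nSentiment: {l}" for t, l in examples)
prompt += f"\nReview: {query}\nSentiment:"  # the model continues with a label
```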
1 code implementation • CSRR (ACL) 2022 • Pedram Hosseini, David A. Broniatowski, Mona Diab
Previous studies have shown the efficacy of knowledge augmentation methods in pretrained language models.
1 code implementation • 26 Nov 2021 • Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.
1 code implementation • NAACL 2022 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
no code implementations • ACL 2021 • Nada Almarwani, Mona Diab
Modern sentence encoders are used to generate dense vector representations that capture the underlying linguistic characteristics for a sequence of words, including phrases, sentences, or paragraphs.
no code implementations • ACL 2021 • Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU?
1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.
no code implementations • 17 Apr 2021 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Mona Diab
A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries.
2 code implementations • 25 Mar 2021 • Pedram Hosseini, David A. Broniatowski, Mona Diab
In this work, we test the performance of two bidirectional transformer-based language models, BERT and SpanBERT, on predicting directionality in causal pairs in the textual content.
no code implementations • 25 Jan 2021 • Thamar Solorio, Mahsa Shafaei, Christos Smailis, Mona Diab, Theodore Giannakopoulos, Heng Ji, Yang Liu, Rada Mihalcea, Smaranda Muresan, Ioannis Kakadiaris
This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content.
1 code implementation • COLING 2020 • Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown
We adopt cross-lingual embeddings constructed using different methods to extract features of the tweets, including a few state-of-the-art contextual embeddings such as BERT, RoBERTa and XLM-R. We train classifiers of different architectures on the extracted features.
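A hedged sketch of this feature-extraction recipe, assuming xlm-roberta-base and simple mean pooling; the paper combines several embedding types and classifier architectures.

```python
# Sketch: encode tweets with a pretrained cross-lingual model and train a
# lightweight classifier on the pooled embeddings. Model choice and pooling
# are illustrative; mean pooling here naively includes padding tokens.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (B, T, H)
    return hidden.mean(dim=1).numpy()                # mean-pooled features

X_train = embed(["example tweet one", "ejemplo de tuit dos"])
clf = LogisticRegression().fit(X_train, [0, 1])
```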
2 code implementations • Findings (ACL) 2021 • Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad
Neural sequence models can generate highly fluent sentences, but recent studies have also shown that they are also prone to hallucinate additional content not supported by the input.
no code implementations • ACL 2020 • Sawsan Alqahtani, Ajay Mishra, Mona Diab
Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word.
2 code implementations • ACL 2020 • Esin Durmus, He He, Mona Diab
We tackle the problem of evaluating faithfulness of a generated summary given its source document.
no code implementations • 30 Apr 2020 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
We make use of supervised syntactic parsing as an auxiliary task in a multitask learning framework, and show that with different multitask learning settings, we consistently improve over the single-task baseline.
1 code implementation • ACL 2020 • Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan
The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence.
no code implementations • WS 2020 • Jason Krone, Yi Zhang, Mona Diab
Prototypical networks achieve significant gains in IC performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to SF on all three datasets.
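For reference, the Prototypical Networks decision rule is simple enough to sketch: class prototypes are the mean embeddings of support examples, and queries take the label of the nearest prototype. The random embeddings below are stand-ins for an actual encoder.

```python
# Prototypical Networks classification rule (generic sketch).
import numpy as np

def prototypes(support_emb, support_labels):
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in classes])
    return classes, protos

def classify(query_emb, classes, protos):
    # Squared Euclidean distance from each query to each class prototype.
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
emb, labels = rng.normal(size=(20, 8)), rng.integers(0, 4, size=20)
classes, protos = prototypes(emb, labels)
pred = classify(rng.normal(size=(5, 8)), classes, protos)
```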
no code implementations • LREC 2020 • Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab
Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point.
no code implementations • IJCNLP 2019 • Sawsan Alqahtani, Ajay Mishra, Mona Diab
Diacritic restoration has gained importance with the growing need for machines to understand written texts.
no code implementations • WS 2019 • Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab
Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications.
no code implementations • IJCNLP 2019 • Denis Peskov, Nancy Clarke, Jason Krone, Brigi Fodor, Yi Zhang, Adel Youssef, Mona Diab
With a total of over 81K dialogues harvested across six domains, MultiDoGO is over 8 times the size of MultiWOZ, the other largest comparable dialogue dataset currently available to the public.
1 code implementation • WS 2019 • Or Levi, Pedram Hosseini, Mona Diab, David A. Broniatowski
As avenues for future work, we consider studying additional linguistic features related to the humor aspect, and enriching the data with current news events, to help identify a political or social message.
no code implementations • LREC 2018 • Fahad AlGhamdi, Mona Diab
Data annotation is an important and necessary task for all NLP applications.
no code implementations • LREC 2016 • Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Nada Almarwani, Mohamed Al-Badrashiny
We present our effort to create a large Multi-Layered representational repository of Linguistic Code-Switched Arabic data.
no code implementations • WS 2016 • Fahad AlGhamdi, Giovanni Molina, Mona Diab, Thamar Solorio, Abdelati Hawwari, Victor Soto, Julia Hirschberg
We address the problem of Part of Speech tagging (POS) in the context of linguistic code switching (CS).
no code implementations • WS 2016 • Giovanni Molina, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Nicolas Rey-Villamizar, Mona Diab, Thamar Solorio
We present an overview of the second shared task on language identification in code-switched data.
no code implementations • IJCNLP 2019 • Arshit Gupta, Peng Zhang, Garima Lalwani, Mona Diab
In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts and utterances over a variable context window, in addition to the current user utterance.
1 code implementation • IJCNLP 2019 • Nada Almarwani, Hanan Aldarmaki, Mona Diab
Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure.
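The paper's DCT-based alternative to averaging can be sketched in a few lines: apply a discrete cosine transform along the word axis of the embedding matrix and keep the first k coefficient vectors. The value of k and the 300-dimensional vectors below are illustrative choices; keeping only the zeroth coefficient reduces to scaled averaging.

```python
# Sketch of DCT-based sentence embedding over a (num_words, dim) matrix.
import numpy as np
from scipy.fft import dct

def dct_sentence_embedding(word_vectors, k=4):
    coeffs = dct(word_vectors, axis=0, norm="ortho")  # DCT along word axis
    return coeffs[:k].ravel()                         # first k coefficient rows

sent = np.random.default_rng(0).normal(size=(12, 300))
emb = dct_sentence_embedding(sent, k=4)  # k=1 is proportional to the mean
```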
no code implementations • WS 2018 • Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio
In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.
1 code implementation • International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation 2019 • Pedram Hosseini, Mona Diab, David A. Broniatowski
In this paper, we test the hypothesis that causal and semantic coherence are associated with online sharing of misinformative social media content using Coh-Metrix – a widely-used set of psycholinguistic measures.
no code implementations • SEMEVAL 2019 • Shabnam Tafreshi, Mona Diab
Our aim is to build a robust emotion classifier that can generalize emotion detection, which is to learn emotion cues in a noisy training environment.
no code implementations • SEMEVAL 2019 • Sardar Hamidian, Mona Diab
Social media plays a crucial role as a main source of news for information seekers online.
no code implementations • WS 2019 • Fahad AlGhamdi, Mona Diab
In this paper, we address the problem of Part-of-Speech tagging (POS) in the context of linguistic code switching (CS).
no code implementations • 23 May 2019 • Shabnam Tafreshi, Mona Diab
In this paper we present an emotion classifier model submitted to the SemEval-2019 Task 3: EmoContext.
no code implementations • SEMEVAL 2019 • Hanan Aldarmaki, Mona Diab
We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models.
no code implementations • WS 2019 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available.
1 code implementation • NAACL 2019 • Hanan Aldarmaki, Mona Diab
Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space.
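The fitting step is the classical orthogonal Procrustes problem, solved with a single SVD. A self-contained sketch under that standard formulation:

```python
# Orthogonal Procrustes: find orthogonal W minimizing ||XW - Y||_F for paired
# dictionary entries X (source) and Y (target).
import numpy as np

def orthogonal_map(X, Y):
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # orthogonal W such that X @ W approximates Y

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
W_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # random orthogonal map
Y = X @ W_true
W = orthogonal_map(X, Y)  # recovers W_true exactly in this noiseless setup
```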
no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).
no code implementations • WS 2018 • Christopher Hidey, Mona Diab
We present experiments on the FEVER (Fact Extraction and VERification) task, a shared task that involves selecting sentences from Wikipedia and predicting whether a claim is supported by those sentences, refuted, or there is not enough information.
no code implementations • SEMEVAL 2017 • Efsun Sarioglu Kayi, Mona Diab, Luca Pauselli, Michael Compton, Glen Coppersmith
As such, we examine the writings of schizophrenia patients analyzing their syntax, semantics and pragmatics.
no code implementations • COLING 2018 • Shabnam Tafreshi, Mona Diab
Detection and classification of emotion categories expressed by a sentence is a challenging task due to subjectivity of emotion.
1 code implementation • COLING 2018 • Hanan Aldarmaki, Mona Diab
We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks.
no code implementations • TACL 2018 • Hanan Aldarmaki, Mahesh Mohan, Mona Diab
We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.
no code implementations • IJCNLP 2017 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data.
no code implementations • SEMEVAL 2017 • Nada Almarwani, Mona Diab
This paper describes our submission to SemEval-2017 Task 3 Subtask D, "Question Answer Ranking in Arabic Community Question Answering".
no code implementations • SEMEVAL 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
3 code implementations • 31 Jul 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
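A toy illustration of producing an STS-style score from sentence embeddings follows; the embeddings are assumed inputs, and participating systems differ widely in how they obtain them. The benchmark rates similarity on a 0-5 scale.

```python
# Toy STS-style scoring: cosine similarity of two sentence embeddings,
# rescaled to the task's 0-5 similarity range.
import numpy as np

def sts_score(emb_a, emb_b):
    cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return 5 * (cos + 1) / 2  # map [-1, 1] onto [0, 5]

score = sts_score(np.array([0.1, 0.9, 0.2]), np.array([0.2, 0.8, 0.1]))
```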
no code implementations • WS 2017 • Mohamed Al-Badrashiny, Abdelati Hawwari, Mona Diab
In this paper we present a system for automatic Arabic text diacritization using three levels of analysis granularity in a layered back-off manner.
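The layered back-off control flow can be sketched generically: consult lexicons from most to least specific granularity and fall back to a character-level model. The level names and toy data below are assumptions, not the system's actual components.

```python
# Generic layered back-off lookup for diacritic restoration.
def diacritize(token, lexicons, char_fallback):
    """lexicons: dicts ordered from most specific (full word) to least."""
    for lex in lexicons:
        if token in lex:
            return lex[token]       # hit at this granularity level
    return char_fallback(token)     # final character-level back-off

# Toy usage: one word-level lexicon, an empty stem-level one, identity fallback.
restored = diacritize("kitab", [{"kitab": "kitaab"}, {}], lambda w: w)
```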
no code implementations • WS 2017 • Nada Almarwani, Mona Diab
Determining the textual entailment between texts is important in many NLP tasks, such as summarization, question answering, and information extraction and retrieval.
no code implementations • COLING 2016 • Mohamed Al-Badrashiny, Mona Diab
We introduce a generic Language Independent Framework for Linguistic Code Switch Point Detection.
no code implementations • WS 2016 • Mohammed Attia, Ayah Zirikly, Mona Diab
The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries.
no code implementations • WS 2016 • Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer
Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics.
no code implementations • WS 2016 • Ayah Zirikly, Bart Desmet, Mona Diab
This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2).
no code implementations • WS 2016 • Mona Diab
We recently witnessed an exponential growth in dialectal Arabic usage in both textual data and speech recordings, especially in social media.
no code implementations • WS 2016 • Maryam Aminian, Mohamed Al-Badrashiny, Mona Diab
We present an approach for automatic verification and augmentation of multilingual lexica.
no code implementations • WS 2016 • Mohamed Al-Badrashiny, Abdelati Hawwari, Mahmoud Ghoneim, Mona Diab
We propose an automated method that identifies the morphological and syntactic flexibility of Arabic Verbal Multiword Expressions (AVMWE).
no code implementations • LREC 2016 • Abdelati Hawwari, Mohammed Attia, Mahmoud Ghoneim, Mona Diab
Identifying the various types of the Idafa construction (IC) is of importance to Natural Language processing (NLP) applications.
no code implementations • LREC 2016 • Mohamed Al-Badrashiny, Arfath Pasha, Mona Diab, Nizar Habash, Owen Rambow, Wael Salloum, Ramy Eskander
Text preprocessing is an important and necessary task for all NLP applications.
no code implementations • LREC 2016 • Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer
This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres.
no code implementations • SEMEVAL 2015 • Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe
no code implementations • SEMEVAL 2015 • Vinodkumar Prabhakaran, Tomas By, Julia Hirschberg, Owen Rambow, Samira Shaikh, Tomek Strzalkowski, Jennifer Tracey, Michael Arrigo, Rupayan Basu, Micah Clark, Adam Dalton, Mona Diab, Louise Guthrie, Anna Prokofieva, Stephanie Strassel, Gregory Werner, Yorick Wilks, Janyce Wiebe
no code implementations • WS 2012 • Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow, Benjamin Van Durme
We explore training an automatic modality tagger.
no code implementations • LREC 2014 • Muhammad Abdul-Mageed, Mona Diab
The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values).
no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Ramy Eskander
Multiple levels of quality checks are performed on the output of each step in the creation process.
no code implementations • LREC 2014 • Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth
In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).
no code implementations • 22 Sep 2013 • Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth
The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.
no code implementations • LREC 2012 • Nizar Habash, Mona Diab, Owen Rambow
Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world.
no code implementations • LREC 2012 • Muhammad Abdul-Mageed, Mona Diab
We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level.
no code implementations • LREC 2012 • Heba Elfardy, Mona Diab
In this paper, we present a simplified Set of guidelines for detecting code switching in Arabic on the word/token level.
no code implementations • LREC 2012 • Vinodkumar Prabhakaran, Huzaifa Neralwala, Owen Rambow, Mona Diab
In this paper, we describe a multi-layer annotation scheme for social power relations that are recognizable from online written interactions.