no code implementations • AMTA 2016 • Sawsan Alqahtani, Mahmoud Ghoneim, Mona Diab
The absence of these diacritics naturally leads to significant word ambiguity, on top of the inherent ambiguity present even in fully diacritized words.
no code implementations • LREC 2022 • Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski
We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what.
no code implementations • Findings (EMNLP) 2021 • Parsa Farinneya, Mohammad Mahdi Abdollah Pour, Sardar Hamidian, Mona Diab
We discuss the impact of multiple classifiers trained on a limited amount of annotated data, followed by an interactive approach that gradually updates the models by adding the least certain samples (LCS) from the pool of unlabeled data.
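For illustration, the least-certain-sample selection step can be sketched as plain uncertainty sampling; the classifier, features, and pool below are toy stand-ins, not the models or data used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_certain_samples(model, pool_X, k=10):
    """Indices of the k pool samples the model is least confident about."""
    probs = model.predict_proba(pool_X)   # (n_pool, n_classes)
    confidence = probs.max(axis=1)        # probability of the predicted class
    return np.argsort(confidence)[:k]     # lowest confidence first

# Toy stand-ins for annotated and unlabeled examples.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(40, 16)), np.arange(40) % 2
X_pool = rng.normal(size=(500, 16))

model = LogisticRegression().fit(X_lab, y_lab)
query_idx = least_certain_samples(model, X_pool)
# ...send X_pool[query_idx] to annotators, then retrain on the enlarged set.
```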
no code implementations • EMNLP 2020 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available.
no code implementations • Findings (ACL) 2022 • A. Bergman, Mona Diab
When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance.
no code implementations • 11 Apr 2025 • Aashiq Muhamed, Jacopo Bonato, Mona Diab, Virginia Smith
Machine unlearning is a promising approach to improve LLM safety by removing unwanted knowledge from the model.
no code implementations • 2 Apr 2025 • Aashiq Muhamed, Mona Diab, Virginia Smith
We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store.
1 code implementation • 11 Feb 2025 • Kshitish Ghate, Isaac Slaughter, Kyra Wilson, Mona Diab, Aylin Caliskan
Studying 131 unique CLIP models, trained on 26 datasets, using 55 architectures, and in a variety of sizes, we evaluate bias in each model using 26 well-established unimodal and cross-modal principled Embedding Association Tests.
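As background, the unimodal Embedding Association Tests referenced here follow the WEAT formulation; below is a generic sketch of the effect-size computation over arbitrary embedding matrices, not the paper's evaluation harness.

```python
import numpy as np

def _assoc(w, A, B):
    """s(w, A, B): mean cosine similarity to attribute set A minus set B."""
    cos = lambda u, V: (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))
    return cos(w, A).mean() - cos(w, B).mean()

def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Each argument is an (n, d) array of embeddings, e.g., from a CLIP text
    or image encoder. Returns a Cohen's-d-style effect size.
    """
    s_X = np.array([_assoc(x, A, B) for x in X])
    s_Y = np.array([_assoc(y, A, B) for y in Y])
    return (s_X.mean() - s_Y.mean()) / np.concatenate([s_X, s_Y]).std(ddof=1)

# Toy example with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 32)) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```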
1 code implementation • 24 Dec 2024 • Jiarui Liu, Iman Ouzzani, Wenkai Li, Lechen Zhang, Tianyue Ou, Houda Bouamor, Zhijing Jin, Mona Diab
This work aims to address critical gaps in AI terminology resources and fosters global inclusivity and collaboration in AI research.
no code implementations • 1 Nov 2024 • Aashiq Muhamed, Mona Diab, Virginia Smith
Understanding and mitigating the potential risks associated with foundation models (FMs) hinges on developing effective interpretability methods.
no code implementations • 21 Oct 2024 • Wenkai Li, Jiarui Liu, Andy Liu, Xuhui Zhou, Mona Diab, Maarten Sap
In this work, we tackle the challenge of embedding realistic human personality traits into LLMs.
no code implementations • 25 Jul 2024 • Wajdi Zaghouani, Mustafa Jarrar, Nizar Habash, Houda Bouamor, Imed Zitouni, Mona Diab, Samhaa R. El-Beltagy, Muhammed AbuOdeh
The shared task addresses bias and propaganda annotation in multilingual news posts.
1 code implementation • 25 Jun 2024 • Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory.
1 code implementation • 30 May 2024 • Andy Liu, Mona Diab, Daniel Fried
The models we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas.
1 code implementation • 10 May 2024 • Jiarui Liu, Wenkai Li, Zhijing Jin, Mona Diab
In an era of model and data proliferation in machine learning/AI, especially marked by the rapid advancement of open-sourced technologies, there arises a critical need for standardized, consistent documentation.
1 code implementation • 2 May 2024 • Zhijing Jin, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona Diab
We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction.
no code implementations • 28 Feb 2024 • Shabnam Tafreshi, Shubham Vatsal, Mona Diab
There are 7,100+ active languages spoken around the world, and building an emotion classifier for each language is labor intensive.
1 code implementation • 20 Feb 2024 • Badr AlKhamissi, Muhammad ElNokrashy, Mai AlKhamissi, Mona Diab
The intricate relationship between language and culture has long been a subject of exploration within the realm of linguistic anthropology.
no code implementations • 18 Feb 2024 • Jia Xu, Mona Diab
Minimizing social bias strengthens societal bonds, promoting shared understanding and better decision-making.
1 code implementation • 9 Jun 2023 • Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf
In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs).
no code implementations • 19 May 2023 • Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab
Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations.
no code implementations • 16 Dec 2022 • Ping Yu, Tianlu Wang, Olga Golovneva, Badr Alkhamissy, Gargi Ghosh, Mona Diab, Asli Celikyilmaz
Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning.
no code implementations • 14 Oct 2022 • Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung
Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
no code implementations • 4 Oct 2022 • Daniel Simig, Tianlu Wang, Verna Dankers, Peter Henderson, Khuyagbaatar Batsuren, Dieuwke Hupkes, Mona Diab
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis.
no code implementations • 30 Sep 2022 • Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab
To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers.
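A simplified sketch of the layer-fusion idea (the final layer's states attend over all layers' states) is given below; the query/key projections and dimensions are illustrative assumptions, not the exact DWAtt formulation.

```python
import torch
import torch.nn as nn

class LayerFusionAttention(nn.Module):
    """Attend over the stack of per-layer hidden states for each token.

    Simplified depth-wise layer fusion: the final layer's state queries all
    layers, whose states are mixed by attention weights over depth.
    (Illustrative only; not the exact DWAtt formulation.)
    """
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, layer_states):
        h = torch.stack(list(layer_states), dim=0)    # (L, batch, seq, d)
        query = self.q(h[-1])                         # (batch, seq, d)
        keys = self.k(h)                              # (L, batch, seq, d)
        scores = (keys * query).sum(-1) * self.scale  # (L, batch, seq)
        attn = scores.softmax(dim=0).unsqueeze(-1)    # weights over depth
        return (attn * h).sum(dim=0)                  # fused (batch, seq, d)

fused = LayerFusionAttention(64)([torch.randn(2, 5, 64) for _ in range(12)])
print(fused.shape)  # torch.Size([2, 5, 64])
```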
no code implementations • 25 May 2022 • Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab
Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.
1 code implementation • NAACL (WNU) 2022 • Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski
Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions.
no code implementations • AMTA 2022 • Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzman, Mona Diab, Philipp Koehn
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.
no code implementations • OSACT (LREC) 2022 • Badr AlKhamissi, Mona Diab
The tasks are to predict whether a tweet contains (1) Offensive language; whether it is considered (2) Hate Speech; and, if so, to predict the (3) Fine-Grained Hate Speech label from one of six categories.
11 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
no code implementations • 12 Apr 2022 • Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad
Recently, there has been a surge of interest in the NLP community on the use of pretrained Language Models (LMs) as Knowledge Bases (KBs).
no code implementations • 19 Feb 2022 • Shuguang Chen, Gustavo Aguilar, Anirudh Srinivasan, Mona Diab, Thamar Solorio
For the unsupervised setting, we provide the following language pairs, in both directions: English and Spanish-English code-switched text (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic code-switched text (Eng-MSAEA).
no code implementations • 25 Jan 2022 • Amal Alqahtani, Efsun Sarioglu Kay, Sardar Hamidian, Michael Compton, Mona Diab
They score lower on most of the linguistic features of cohesion, with statistically significant p-values.
2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
1 code implementation • CSRR (ACL) 2022 • Pedram Hosseini, David A. Broniatowski, Mona Diab
Previous studies have shown the efficacy of knowledge augmentation methods in pretrained language models.
1 code implementation • 26 Nov 2021 • Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.
1 code implementation • NAACL 2022 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab
One goal of answer summarization is to produce a summary that reflects the range of answer perspectives.
no code implementations • ACL 2021 • Nada Almarwani, Mona Diab
Modern sentence encoders are used to generate dense vector representations that capture the underlying linguistic characteristics for a sequence of words, including phrases, sentences, or paragraphs.
no code implementations • ACL 2021 • Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU?
1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.
no code implementations • 17 Apr 2021 • Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Mona Diab
A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries.
2 code implementations • 25 Mar 2021 • Pedram Hosseini, David A. Broniatowski, Mona Diab
In this work, we test the performance of two bidirectional transformer-based language models, BERT and SpanBERT, on predicting directionality in causal pairs in the textual content.
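A minimal sketch of casting directionality prediction as sequence-pair classification with Hugging Face transformers; the base checkpoint, label scheme, and example pair are assumptions, and the classification head here is untrained.

```python
# The label scheme is hypothetical; in practice one first fine-tunes on
# labeled causal pairs (e.g., label 0 when arg1 causes arg2).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0: arg1 -> arg2, 1: arg2 -> arg1

inputs = tok("heavy rain", "flooded streets", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("direction label:", logits.argmax(-1).item())  # arbitrary until trained
```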
no code implementations • 25 Jan 2021 • Thamar Solorio, Mahsa Shafaei, Christos Smailis, Mona Diab, Theodore Giannakopoulos, Heng Ji, Yang Liu, Rada Mihalcea, Smaranda Muresan, Ioannis Kakadiaris
This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content.
1 code implementation • COLING 2020 • Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown
We adopt cross-lingual embeddings constructed using different methods to extract features of the tweets, including a few state-of-the-art contextual embeddings such as BERT, RoBERTa and XLM-R. We train classifiers of different architectures on the extracted features.
2 code implementations • Findings (ACL) 2021 • Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad
Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinate additional content not supported by the input.
no code implementations • ACL 2020 • Sawsan Alqahtani, Ajay Mishra, Mona Diab
Such diacritics are often omitted in written text, increasing the number of possible pronunciations and meanings for a word.
2 code implementations • ACL 2020 • Esin Durmus, He He, Mona Diab
We tackle the problem of evaluating faithfulness of a generated summary given its source document.
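One common realization of this idea, in the spirit of QA-based faithfulness metrics, answers questions derived from the summary against both the summary and the source, then compares the answers. The sketch below assumes the questions are generated upstream, and the QA checkpoint is a public SQuAD model that may differ from the paper's.

```python
from collections import Counter
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def token_f1(a, b):
    """SQuAD-style token-overlap F1 between two answer strings."""
    a, b = a.lower().split(), b.lower().split()
    common = sum((Counter(a) & Counter(b)).values())
    if common == 0:
        return 0.0
    p, r = common / len(a), common / len(b)
    return 2 * p * r / (p + r)

def faithfulness(summary, source, questions):
    """Mean agreement between answers found in the summary vs. the source."""
    scores = []
    for q in questions:
        ans_sum = qa(question=q, context=summary)["answer"]
        ans_src = qa(question=q, context=source)["answer"]
        scores.append(token_f1(ans_sum, ans_src))
    return sum(scores) / len(scores)
```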
no code implementations • 30 Apr 2020 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
We make use of supervised syntactic parsing as an auxiliary task in a multitask learning framework, and show that with different multitask learning settings, we consistently improve over the single-task baseline.
1 code implementation • ACL 2020 • Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan
The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence.
no code implementations • WS 2020 • Jason Krone, Yi Zhang, Mona Diab
Prototypical networks achieve significant gains in intent classification (IC) performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to slot filling (SF) on all three datasets.
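For reference, the prototypical-networks classification step reduces to nearest-prototype matching over class means; the sketch below assumes pre-encoded utterance features and uses toy data.

```python
import torch

def prototype_classify(support_x, support_y, query_x, n_classes):
    """Nearest-prototype intent classification over pre-encoded features.

    support_x: (n_support, d) features; support_y: (n_support,) intent ids;
    query_x: (n_query, d). Returns predicted intent ids for the queries.
    """
    protos = torch.stack([support_x[support_y == c].mean(0)
                          for c in range(n_classes)])  # class means, (C, d)
    return torch.cdist(query_x, protos).argmin(dim=1)  # closest prototype

# Toy features standing in for encoder outputs (4 intents, 5 shots each).
x, y = torch.randn(20, 32), torch.arange(20) % 4
print(prototype_classify(x, y, torch.randn(5, 32), n_classes=4))
```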
no code implementations • LREC 2020 • Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab
Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point.
no code implementations • IJCNLP 2019 • Sawsan Alqahtani, Ajay Mishra, Mona Diab
Diacritic restoration has gained importance with the growing need for machines to understand written texts.
no code implementations • WS 2019 • Sawsan Alqahtani, Hanan Aldarmaki, Mona Diab
Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications.
no code implementations • IJCNLP 2019 • Denis Peskov, Nancy Clarke, Jason Krone, Brigi Fodor, Yi Zhang, Adel Youssef, Mona Diab
With a total of over 81K dialogues harvested across six domains, MultiDoGO is over 8 times the size of MultiWOZ, the largest comparable dialogue dataset previously available to the public.
1 code implementation • WS 2019 • Or Levi, Pedram Hosseini, Mona Diab, David A. Broniatowski
As avenues for future work, we consider studying additional linguistic features related to the humor aspect, and enriching the data with current news events, to help identify a political or social message.
no code implementations • LREC 2016 • Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Nada Almarwani, Mohamed Al-Badrashiny
We present our effort to create a large Multi-Layered representational repository of Linguistic Code-Switched Arabic data.
no code implementations • WS 2016 • Fahad AlGhamdi, Giovanni Molina, Mona Diab, Thamar Solorio, Abdelati Hawwari, Victor Soto, Julia Hirschberg
We address the problem of Part of Speech tagging (POS) in the context of linguistic code switching (CS).
no code implementations • WS 2016 • Giovanni Molina, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Nicolas Rey-Villamizar, Mona Diab, Thamar Solorio
We present an overview of the second shared task on language identification in code-switched data.
no code implementations • LREC 2018 • Fahad AlGhamdi, Mona Diab
Data annotation is an important and necessary task for all NLP applications.
no code implementations • IJCNLP 2019 • Arshit Gupta, Peng Zhang, Garima Lalwani, Mona Diab
In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals, such as previous intents, slots, dialog acts and utterances over a variable context window, in addition to the current user utterance.
1 code implementation • IJCNLP 2019 • Nada Almarwani, Hanan Aldarmaki, Mona Diab
Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure.
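The averaging baseline, and the kind of order-aware alternative the paper proposes (keeping leading DCT coefficients along the word axis), can be sketched as follows; the sizes and number of retained coefficients are illustrative.

```python
import numpy as np
from scipy.fft import dct

def avg_embed(word_vecs):
    """Baseline: average the word vectors (order-insensitive)."""
    return word_vecs.mean(axis=0)

def dct_embed(word_vecs, k=4):
    """Apply a DCT along the word axis and keep the first k coefficient
    rows per dimension, retaining some word-order information."""
    coeffs = dct(word_vecs, axis=0, norm="ortho")  # (n_words, d)
    return coeffs[:k].ravel()                      # (k * d,)

sent = np.random.default_rng(0).normal(size=(7, 50))  # 7 toy word vectors
print(avg_embed(sent).shape, dct_embed(sent).shape)   # (50,) (200,)
```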
no code implementations • WS 2018 • Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio
In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.
1 code implementation • International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation 2019 • Pedram Hosseini, Mona Diab, David A. Broniatowski
In this paper, we test the hypothesis that causal and semantic coherence are associated with the online sharing of misinformative social media content, using Coh-Metrix, a widely used set of psycholinguistic measures.
no code implementations • SEMEVAL 2019 • Sardar Hamidian, Mona Diab
Social media plays a crucial role as the main news resource for information seekers online.
no code implementations • SEMEVAL 2019 • Shabnam Tafreshi, Mona Diab
Our aim is to build a robust emotion classifier that generalizes emotion detection, i.e., one that learns emotion cues in a noisy training environment.
no code implementations • WS 2019 • Fahad AlGhamdi, Mona Diab
In this paper, we address the problem of Part-of-Speech tagging (POS) in the context of linguistic code switching (CS).
no code implementations • 23 May 2019 • Shabnam Tafreshi, Mona Diab
In this paper we present an emotion classifier model submitted to the SemEval-2019 Task 3: EmoContext.
no code implementations • SEMEVAL 2019 • Hanan Aldarmaki, Mona Diab
We develop and investigate several cross-lingual alignment approaches for neural sentence embedding models, such as the supervised inference classifier, InferSent, and sequential encoder-decoder models.
no code implementations • WS 2019 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available.
1 code implementation • NAACL 2019 • Hanan Aldarmaki, Mona Diab
Cross-lingual word vectors are typically obtained by fitting an orthogonal matrix that maps the entries of a bilingual dictionary from a source to a target vector space.
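The standard closed-form solution for such an orthogonal map is the Procrustes solution via SVD; a minimal sketch, with a synthetic sanity check in place of a real bilingual dictionary:

```python
import numpy as np

def orthogonal_map(X, Y):
    """Solve min_W ||XW - Y||_F over orthogonal W (Procrustes).

    X, Y: (n, d) row-aligned source/target embeddings of dictionary pairs.
    Closed form: W = U V^T, where U S V^T is the SVD of X^T Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: recover a known orthogonal map from noisy aligned vectors.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # ground-truth mapping
X = rng.normal(size=(1000, 50))
Y = X @ Q + 0.01 * rng.normal(size=(1000, 50))
print(np.allclose(orthogonal_map(X, Y), Q, atol=1e-2))  # True
```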
no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).
no code implementations • WS 2018 • Christopher Hidey, Mona Diab
We present experiments on the FEVER (Fact Extraction and VERification) task, a shared task that involves selecting sentences from Wikipedia and predicting whether a claim is supported by those sentences, refuted, or there is not enough information.
no code implementations • SEMEVAL 2017 • Efsun Sarioglu Kayi, Mona Diab, Luca Pauselli, Michael Compton, Glen Coppersmith
As such, we examine the writings of schizophrenia patients, analyzing their syntax, semantics, and pragmatics.
no code implementations • COLING 2018 • Shabnam Tafreshi, Mona Diab
Detection and classification of emotion categories expressed by a sentence is a challenging task due to subjectivity of emotion.
1 code implementation • COLING 2018 • Hanan Aldarmaki, Mona Diab
We evaluated various compositional models, from bag-of-words representations to compositional RNN-based models, on several extrinsic supervised and unsupervised evaluation benchmarks.
no code implementations • TACL 2018 • Hanan Aldarmaki, Mahesh Mohan, Mona Diab
We show empirically that the performance of bilingual correspondents learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.
no code implementations • IJCNLP 2017 • Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab
Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data.
no code implementations • SEMEVAL 2017 • Nada Almarwani, Mona Diab
This paper describes our submission to SemEval-2017 Task 3 Subtask D, "Question Answer Ranking in Arabic Community Question Answering".
no code implementations • SEMEVAL 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
3 code implementations • 31 Jul 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
no code implementations • WS 2017 • Nada Almarwani, Mona Diab
Determining the textual entailment between texts is important in many NLP tasks, such as summarization, question answering, and information extraction and retrieval.
no code implementations • WS 2017 • Mohamed Al-Badrashiny, Abdelati Hawwari, Mona Diab
In this paper we present a system for automatic Arabic text diacritization using three levels of analysis granularity in a layered back-off manner.
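The back-off control flow can be illustrated generically: query the most specific analysis level first and fall back to coarser ones. The toy, transliterated lexicons below are stand-ins; the system's actual three levels and resources differ in detail.

```python
# Toy lexicons standing in for word- and stem-level diacritization resources.
WORD_LEVEL = {"drs": "darasa"}   # fully diacritized whole-word entries (toy)
STEM_LEVEL = {"ktb": "katab"}    # coarser stem-level entries (toy)

def diacritize(word, levels=(WORD_LEVEL, STEM_LEVEL)):
    """Return the first hit, walking from the most specific analysis level
    to coarser ones; a character-level model would be the final back-off."""
    for lex in levels:
        if word in lex:
            return lex[word]
    return word  # no analysis found: pass through undiacritized

print(diacritize("drs"), diacritize("ktb"), diacritize("xyz"))
```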
no code implementations • WS 2016 • Mohammed Attia, Ayah Zirikly, Mona Diab
The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries.
no code implementations • COLING 2016 • Mohamed Al-Badrashiny, Mona Diab
We introduce a generic Language Independent Framework for Linguistic Code Switch Point Detection.
no code implementations • WS 2016 • Wajdi Zaghouani, Abdelati Hawwari, Sawsan Alqahtani, Houda Bouamor, Mahmoud Ghoneim, Mona Diab, Kemal Oflazer
Arabic writing is typically underspecified for short vowels and other markups, referred to as diacritics.
no code implementations • WS 2016 • Mohamed Al-Badrashiny, Abdelati Hawwari, Mahmoud Ghoneim, Mona Diab
We propose an automated method that identifies the morphological and syntactic flexibility of Arabic Verbal Multiword Expressions (AVMWE).
no code implementations • WS 2016 • Maryam Aminian, Mohamed Al-Badrashiny, Mona Diab
We present an approach for automatic verification and augmentation of multilingual lexica.
no code implementations • WS 2016 • Mona Diab
We recently witnessed exponential growth in dialectal Arabic usage in both textual data and speech recordings, especially in social media.
no code implementations • WS 2016 • Ayah Zirikly, Bart Desmet, Mona Diab
This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2).
no code implementations • LREC 2016 • Mohamed Al-Badrashiny, Arfath Pasha, Mona Diab, Nizar Habash, Owen Rambow, Wael Salloum, Ramy Eskander
Text preprocessing is an important and necessary task for all NLP applications.
no code implementations • LREC 2016 • Wajdi Zaghouani, Houda Bouamor, Abdelati Hawwari, Mona Diab, Ossama Obeid, Mahmoud Ghoneim, Sawsan Alqahtani, Kemal Oflazer
This paper presents the annotation guidelines developed as part of an effort to create a large scale manually diacritized corpus for various Arabic text genres.
no code implementations • LREC 2016 • Abdelati Hawwari, Mohammed Attia, Mahmoud Ghoneim, Mona Diab
Identifying the various types of the Idafa construction (IC) is of importance to Natural Language processing (NLP) applications.
no code implementations • SEMEVAL 2015 • Vinodkumar Prabhakaran, Tomas By, Julia Hirschberg, Owen Rambow, Samira Shaikh, Tomek Strzalkowski, Jennifer Tracey, Michael Arrigo, Rupayan Basu, Micah Clark, Adam Dalton, Mona Diab, Louise Guthrie, Anna Prokofieva, Stephanie Strassel, Gregory Werner, Yorick Wilks, Janyce Wiebe
no code implementations • SEMEVAL 2015 • Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe
no code implementations • WS 2012 • Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow, Benjamin Van Durme
We explore training an automatic modality tagger.
no code implementations • LREC 2014 • Muhammad Abdul-Mageed, Mona Diab
The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values).
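A minimal sketch of such lexicon-derived features, counting tokens whose entries carry positive or negative orientation; the tiny lexicon is illustrative, not an actual released resource.

```python
# Toy orientation-tagged lexicon standing in for a real resource.
LEXICON = {"excellent": "positive", "good": "positive",
           "bad": "negative", "awful": "negative"}

def polarity_features(tokens):
    """Simple counts of positive/negative lexicon hits in a token list."""
    pos = sum(LEXICON.get(t.lower()) == "positive" for t in tokens)
    neg = sum(LEXICON.get(t.lower()) == "negative" for t in tokens)
    return {"n_pos": pos, "n_neg": neg, "polarity": pos - neg}

print(polarity_features("the food was good but the service was awful".split()))
```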
no code implementations • LREC 2014 • Arfath Pasha, Mohamed Al-Badrashiny, Mona Diab, Ahmed El Kholy, Ramy Eskander, Nizar Habash, Manoj Pooleery, Owen Rambow, Ryan Roth
In this paper, we present MADAMIRA, a system for morphological analysis and disambiguation of Arabic that combines some of the best aspects of two previously commonly used systems for Arabic processing, MADA (Habash and Rambow, 2005; Habash et al., 2009; Habash et al., 2013) and AMIRA (Diab et al., 2007).
no code implementations • LREC 2014 • Mona Diab, Mohamed Al-Badrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi, Ramy Eskander
Multiple levels of quality checks are performed on the output of each step in the creation process.
no code implementations • 22 Sep 2013 • Mona Diab, Nizar Habash, Owen Rambow, Ryan Roth
The Linguistic Data Consortium (LDC) has developed hundreds of data corpora for natural language processing (NLP) research.
no code implementations • LREC 2012 • Vinodkumar Prabhakaran, Huzaifa Neralwala, Owen Rambow, Mona Diab
In this paper, we describe a multi-layer annotation scheme for social power relations that are recognizable from online written interactions.
no code implementations • LREC 2012 • Nizar Habash, Mona Diab, Owen Rambow
Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world.
no code implementations • LREC 2012 • Muhammad Abdul-Mageed, Mona Diab
We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level.
no code implementations • LREC 2012 • Heba Elfardy, Mona Diab
In this paper, we present a simplified set of guidelines for detecting code switching in Arabic on the word/token level.