Search Results for author: Younes Samih

Found 36 papers, 4 papers with code

Implicit representations of event properties within contextual language models: Searching for “causativity neurons”

1 code implementation • IWCS (ACL) 2021 • Esther Seyffarth, Younes Samih, Laura Kallmeyer, Hassan Sajjad

This paper addresses the question of to what extent neural contextual language models such as BERT implicitly represent complex semantic properties.

Sentence

QADI: Arabic Dialect Identification in the Wild

no code implementations • EACL (WANLP) 2021 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

For extrinsic evaluation, we are able to build effective country-level dialect identification on tweets with a macro-averaged F1-score of 60.6% across 18 classes.

Dialect Identification

Multilingual Nonce Dependency Treebanks: Understanding how LLMs represent and process syntactic structure

no code implementations • 13 Nov 2023 • David Arps, Laura Kallmeyer, Younes Samih, Hassan Sajjad

We replicate the findings of Müller-Eberstein et al. (2022) on nonce test data and show that the performance declines on both MLMs and ALMs wrt.

Probing for Constituency Structure in Neural Language Models

1 code implementation • 13 Apr 2022 • David Arps, Younes Samih, Laura Kallmeyer, Hassan Sajjad

We find that 4 pretrained transformer LMs obtain high performance on our probing tasks even on manipulated data, suggesting that semantic and syntactic knowledge in their representations can be separated and that constituency information is in fact learned by the LM.

Automatic Expansion and Retargeting of Arabic Offensive Language Training

no code implementations • 18 Nov 2021 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih

Rampant use of offensive language on social media has led to recent efforts on the automatic identification of such language.

A Few Topical Tweets are Enough for Effective User Stance Detection

no code implementations • EACL 2021 • Younes Samih, Kareem Darwish

We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.

Clustering Stance Detection

Pre-Training BERT on Arabic Tweets: Practical Considerations

no code implementations • 21 Feb 2021 • Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish, Younes Samih

The experiments highlight the centrality of data diversity and the efficacy of linguistically aware segmentation.

ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media

no code implementations • SEMEVAL 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali

This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media.

Language Identification

Arabic Dialect Identification in the Wild

no code implementations • 13 May 2020 • Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Sabit Hassan, Kareem Darwish

We present QADI, an automatically collected dataset of tweets belonging to a wide range of country-level Arabic dialects, covering 18 different countries in the Middle East and North Africa region.

Dialect Identification

ALT Submission for OSACT Shared Task on Offensive Language Detection

no code implementations • LREC 2020 • Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Ammar Rashed, Shammur Absar Chowdhury

In this paper, we describe our efforts at the OSACT Shared Task on Offensive Language Detection.

Hate Speech Detection

A Few Topical Tweets are Enough for Effective User-Level Stance Detection

no code implementations • 7 Apr 2020 • Younes Samih, Kareem Darwish

We show that this approach outperforms two strong baselines and achieves 89.6% accuracy and 91.3% macro F-measure on eight controversial topics.

Clustering General Classification +1

Arabic Offensive Language on Twitter: Analysis and Experiments

no code implementations • EACL (WANLP) 2021 • Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization.

A System for Diacritizing Four Varieties of Arabic

no code implementations • IJCNLP 2019 • Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Mohamed Eldesouki, Younes Samih, Hassan Sajjad

Short vowels, also known as diacritics, are often omitted when writing different varieties of Arabic, including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA).

Feature Engineering

QC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification

no code implementations • WS 2019 • Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed Eldesouki, Kareem Darwish

This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification).

Dialect Identification

POS Tagging for Improving Code-Switching Identification in Arabic

no code implementations • WS 2019 • Mohammed Attia, Younes Samih, Ali Elkahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language.

POS POS Tagging

Highly Effective Arabic Diacritization using Sequence to Sequence Modeling

no code implementations • NAACL 2019 • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish

Arabic text is typically written without short vowels (or diacritics).

Feature Engineering Machine Translation +1

Diacritization of Maghrebi Arabic Sub-Dialects

no code implementations • 15 Oct 2018 • Ahmed Abdelali, Mohammed Attia, Younes Samih, Kareem Darwish, Hamdy Mubarak

Diacritization attempts to restore the short vowels in written Arabic text, which are typically omitted.

Mumpitz at PARSEME Shared Task 2018: A Bidirectional LSTM for the Identification of Verbal Multiword Expressions

no code implementations • COLING 2018 • Rafael Ehren, Timm Lichte, Younes Samih

We submitted results for seven languages in the closed track of the task and for one language in the open track.

Machine Translation Sentence +1

GHHT at CALCS 2018: Named Entity Recognition for Dialectal Arabic Using Neural Networks

no code implementations • WS 2018 • Mohammed Attia, Younes Samih, Wolfgang Maier

This paper describes our system submission to the CALCS 2018 shared task on named entity recognition on code-switched data for the language variant pair of Modern Standard Arabic and Egyptian dialectal Arabic.

named-entity-recognition Named Entity Recognition +1

German and French Neural Supertagging Experiments for LTAG Parsing

no code implementations • ACL 2018 • Tatiana Bladier, Andreas van Cranenburgh, Younes Samih, Laura Kallmeyer

We present ongoing work on data-driven parsing of German and French with Lexicalized Tree Adjoining Grammars.

Dependency Parsing Semantic Parsing +1

GHH at SemEval-2018 Task 10: Discovering Discriminative Attributes in Distributional Semantics

no code implementations • SEMEVAL 2018 • Mohammed Attia, Younes Samih, Manaal Faruqui, Wolfgang Maier

This paper describes our system submission to the SemEval 2018 Task 10 on Capturing Discriminative Attributes.

Attribute Word Embeddings

Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks

1 code implementation • LREC 2018 • Mohammed Attia, Younes Samih, Ali Elkahky, Laura Kallmeyer

Classification Document Classification +6

Multi-Dialect Arabic POS Tagging: A CRF Approach

no code implementations • LREC 2018 • Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki, Younes Samih, Randah Alharbi, Mohammed Attia, Walid Magdy, Laura Kallmeyer

Machine Translation Part-Of-Speech Tagging +2

Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

2 code implementations • 19 Aug 2017 • Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy Mubarak, Kareem Darwish, Laura Kallmeyer

Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval.

Ranked #1 on Sentiment Analysis on DynaSent (using extra training data)

Domain Adaptation Information Retrieval +5

Learning from Relatives: Unified Dialectal Arabic Segmentation

no code implementations • CONLL 2017 • Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer

Arabic dialects not only share a common koiné but also exhibit shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other.

Dialect Identification Information Retrieval +2

A Neural Architecture for Dialectal Arabic Segmentation

no code implementations • WS 2017 • Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish

The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general.

Machine Translation Morphological Analysis +2

CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings

no code implementations • WS 2016 • Mohammed Attia, Suraj Maharjan, Younes Samih, Laura Kallmeyer, Thamar Solorio

The evaluation results of our system on the test set are an F-measure of 88.1% (79.0% for TRUE only) for Task-1 on detecting semantic similarity, and 76.0% (42.3% when excluding RANDOM) for Task-2 on identifying finer-grained semantic relations.

Binary Classification General Classification +7

Multilingual Code-switching Identification via LSTM Recurrent Neural Networks

no code implementations • WS 2016 • Younes Samih, Suraj Maharjan, Mohammed Attia, Laura Kallmeyer, Thamar Solorio

Language Identification

SAWT: Sequence Annotation Web Tool

no code implementations • WS 2016 • Younes Samih, Wolfgang Maier, Laura Kallmeyer

An Arabic-Moroccan Darija Code-Switched Corpus

no code implementations • LREC 2016 • Younes Samih, Wolfgang Maier

In this paper, we describe our effort in the development and annotation of a large-scale corpus containing code-switched data.

Une métagrammaire de l'interface morpho-sémantique dans les verbes en arabe (A metagrammar of the morpho-semantic interface in Arabic verbs)

no code implementations • JEPTALNRECITAL 2015 • Simon Petitjean, Younes Samih, Timm Lichte

In this article, we present a model of the derivational morphology of Arabic using the metagrammatical framework offered by XMG.

MORPH

Synchronous Regular Relations and Morphological Analysis

no code implementations • WS 2013 • Christian Wurm, Younes Samih

Morphological Analysis

Improved Spelling Error Detection and Correction for Arabic

no code implementations • COLING 2012 • Mohammed Attia, Pavel Pecina, Younes Samih, Khaled Shaalan, Josef van Genabith

Language Modelling

The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the Detection and Lemmatization of Unknown Words

no code implementations • COLING 2012 • Mohammed Attia, Younes Samih, Khaled Shaalan, Josef van Genabith

Lemmatization

Conversion of Procedural Morphologies to Finite-State Morphologies: A Case Study of Arabic

no code implementations • WS 2012 • Mans Hulden, Younes Samih

Morphological Analysis

Arabic Word Generation and Modelling for Spell Checking

no code implementations • LREC 2012 • Khaled Shaalan, Mohammed Attia, Pavel Pecina, Younes Samih, Josef van Genabith

Furthermore, from a large list of valid and invalid forms, we create a character-based tri-gram language model to approximate knowledge about permissible character clusters in Arabic, creating a novel method for detecting spelling errors.

Language Modelling Morphological Analysis +2