no code implementations • LREC 2020 • Wafia Adouane, Samia Touileb, Jean-Philippe Bernardy
In this paper, we present our work on the Algerian language, an under-resourced North African colloquial Arabic variety, for which we built a comparatively large corpus of more than 36,000 code-switched user-generated comments annotated for sentiment.
no code implementations • LREC 2020 • Wafia Adouane, Jean-Philippe Bernardy
Our empirical results show that multi-task learning is beneficial for some tasks in particular settings, and that the effect of each task on another, the order of the tasks, and the size of the training data of the task with more data do matter.
no code implementations • WS 2019 • Wafia Adouane, Jean-Philippe Bernardy, Simon Dobnik
We work with Algerian, an under-resourced non-standardised Arabic variety, for which we compile a new parallel corpus consisting of user-generated textual data matched with normalised and corrected human annotations following our data-driven and linguistically motivated standard.
no code implementations • WS 2019 • Wafia Adouane, Jean-Philippe Bernardy, Simon Dobnik
We explore the extent to which neural networks can learn to identify semantically equivalent sentences from a small variable dataset using end-to-end training.
no code implementations • WS 2018 • Wafia Adouane, Jean-Philippe Bernardy, Simon Dobnik
We explore the effect of injecting background knowledge into different deep neural network (DNN) configurations in order to mitigate the scarcity of annotated data when applying these models to datasets of low-resourced languages.
no code implementations • WS 2018 • Wafia Adouane, Simon Dobnik, Jean-Philippe Bernardy, Nasredine Semmar
This paper examines the effect of including background knowledge, in the form of a character pre-trained neural language model (LM) and data bootstrapping, to overcome the problem of unbalanced limited resources.
no code implementations • WS 2017 • Wafia Adouane, Simon Dobnik
This paper presents a language identification system designed to detect the language of each word, in its context, in multilingual documents as generated in social media by bilingual/multilingual communities, in our case speakers of Algerian Arabic.
no code implementations • WS 2016 • Wafia Adouane, Nasredine Semmar, Richard Johansson
In sub-task 2, which deals with Arabic dialect identification, the system achieved its best performance using character-based n-grams (49.67% accuracy), ranking fourth in the closed track (the best result being 51.16%), and an accuracy of 53.18%, ranking first in the open track.
no code implementations • WS 2016 • Wafia Adouane, Nasredine Semmar, Richard Johansson, Victoria Bobicev
Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine.
no code implementations • WS 2016 • Wafia Adouane, Nasredine Semmar, Richard Johansson
Standard ALI methods require datasets for training and use character- or word-based n-gram models.
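As a rough illustration of what a character-based n-gram model for language identification can look like, here is a minimal sketch. It is not the authors' system: the function names (`char_ngrams`, `train`, `identify`), the trigram size, the add-one smoothing, and the toy English/French sentences are all assumptions made for the example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a space-padded, lowercased string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count table per language from (text, label) pairs."""
    models = {}
    for text, label in samples:
        models.setdefault(label, Counter()).update(char_ngrams(text, n))
    return models


def identify(text, models, n=3):
    """Return the label whose add-one-smoothed n-gram model gives the text the highest log-probability."""
    # Shared vocabulary size (+1 for unseen n-grams), used for smoothing.
    vocab = len({g for counts in models.values() for g in counts}) + 1
    best_label, best_score = None, -math.inf
    for label, counts in models.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[g] + 1) / (total + vocab))
            for g in char_ngrams(text, n)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only (hypothetical).
toy_samples = [
    ("the cat sat on the mat", "en"),
    ("this is the house of the king", "en"),
    ("le chat est sur le tapis", "fr"),
    ("ceci est la maison du roi", "fr"),
]
toy_models = train(toy_samples)
```

Calling `identify("le roi est sur le tapis", toy_models)` picks the French model on this toy data. A realistic system would need far larger training sets, which is the dependency on training data that the sentence above points to.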
no code implementations • LREC 2016 • Wafia Adouane, Richard Johansson
To fill this gap, we created these two main linguistic resources.