no code implementations • WS 2020 • Gosse Bouma, Djamé Seddah, Daniel Zeman
This overview introduces the task of parsing into enhanced universal dependencies and describes the datasets used for training and evaluation, as well as the evaluation metrics.
no code implementations • ACL 2020 • Djamé Seddah, Farah Essaidi, Amal Fethi, Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, Benoît Sagot, Abhishek Srivastava
We introduce the first treebank for a romanized user-generated content variety of Algerian, a North-African Arabic dialect known for its frequent usage of code-switching.
no code implementations • JEPTALNRECITAL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
The practical use of these models, in all languages except English, was therefore limited. The recent release of several monolingual models based on BERT (Devlin et al., 2019), notably for French, has demonstrated the value of these models by improving the state of the art on all evaluated tasks.
no code implementations • LREC 2020 • Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework.
no code implementations • WS 2019 • José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski
We present an approach to correcting noisy User Generated Content (UGC) in French, aiming to produce a preprocessing pipeline that improves Machine Translation for this kind of non-canonical corpus.
no code implementations • WS 2019 • Benjamin Muller, Benoit Sagot, Djamé Seddah
In this article, focusing on User Generated Content (UGC), we study the ability of BERT to perform lexical normalisation.
1 code implementation • WS 2019 • Ganesh Jawahar, Djamé Seddah
We devise a novel attentional model, based on Bernoulli word embeddings, that are conditioned on contextual extra-linguistic (social) features such as network, spatial and socio-economic variables, which are associated with Twitter users, as well as topic-based features.
1 code implementation • ACL 2019 • Ganesh Jawahar, Benoît Sagot, Djamé Seddah
BERT is a recent language representation model that has performed surprisingly well in diverse language understanding benchmarks.
no code implementations • CONLL 2018 • Ganesh Jawahar, Benjamin Muller, Amal Fethi, Louis Martin, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
We augment the deep Biaffine (BiAF) parser (Dozat and Manning, 2016) with novel features to perform competitively: we utilize an in-domain version of ELMo features (Peters et al., 2018), which provide context-dependent word representations, and we utilize disambiguated, embedded, morphosyntactic features from lexicons (Sagot, 2018), which complement the existing feature set.
no code implementations • CONLL 2017 • Éric de la Clergerie, Benoît Sagot, Djamé Seddah
We present the ParisNLP entry at the UD CoNLL 2017 parsing shared task.
no code implementations • WS 2016 • Héctor Martínez Alonso, Djamé Seddah, Benoît Sagot
User-generated content presents many challenges for its automatic processing.
no code implementations • LREC 2016 • Djamé Seddah, Marie Candito
We present the French Question Bank, a treebank of 2600 questions.
no code implementations • LREC 2016 • Corentin Ribeyre, Eric Villemonte de la Clergerie, Djamé Seddah
Parsing predicate-argument structures in a deep syntax framework requires graphs to be predicted.
no code implementations • LREC 2014 • Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah, Éric de la Clergerie
We define a deep syntactic representation scheme for French, which abstracts away from surface syntactic variation and diathesis alternations, and describe the annotation of deep syntactic representations on top of the surface dependency trees of the Sequoia corpus.
no code implementations • WS 2013 • Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski, Ryan Roth, Wolfgang Seeker, Yannick Versley, Veronika Vincze, Marcin Woliński, Alina Wróblewska, Eric Villemonte de la Clergerie
no code implementations • LREC 2012 • Djamé Seddah, Marie Candito, Benoit Crabbé, Enrique Henestroza Anguiano
In this paper, we introduce a set of resources that we have derived from the EST RÉPUBLICAIN CORPUS, a large, freely-available collection of regional newspaper articles in French, totaling 150 million words.