no code implementations • LREC 2022 • Elisa Gugliotta, Marco Dinarelli
In this paper we present the final result of a project focused on Tunisian Arabic encoded in Arabizi, the Latin-based writing system for digital conversations.
no code implementations • JEP/TALN/RECITAL 2022 • Marco Naguib, François Portet, Marco Dinarelli
Spoken language understanding approaches have recently benefited from models pre-trained by self-supervision on large speech corpora.
no code implementations • 4 Sep 2024 • Ryan Whetten, Titouan Parcollet, Adel Moumen, Marco Dinarelli, Yannick Estève
Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing.
1 code implementation • 7 May 2024 • Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève
BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0.
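The random-projection quantizer at the heart of BEST-RQ is simple enough to sketch. Below is a minimal illustration of the idea, assuming illustrative dimensions: a frozen random projection and a frozen random codebook turn speech frames into discrete targets for BERT-style masked prediction.

```python
# Minimal sketch of BEST-RQ's random-projection quantizer (NumPy).
# The projection matrix and codebook are randomly initialized and
# frozen; the dimensions here are illustrative, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
feat_dim, proj_dim, codebook_size = 80, 16, 8192

# Frozen random projection and codebook (never trained).
projection = rng.normal(size=(feat_dim, proj_dim))
codebook = rng.normal(size=(codebook_size, proj_dim))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

def quantize(features: np.ndarray) -> np.ndarray:
    """Map (time, feat_dim) speech features to discrete target ids."""
    z = features @ projection                      # project each frame
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    # Nearest codeword (max cosine similarity) is the masked-prediction target.
    return np.argmax(z @ codebook.T, axis=1)

targets = quantize(rng.normal(size=(100, feat_dim)))  # 100 dummy frames
```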
no code implementations • 11 Sep 2023 • Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Self-supervised learning (SSL) has driven unprecedented improvements in many different domains, including computer vision and natural language processing.
1 code implementation • 13 Feb 2023 • Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
Context-aware translation can be achieved by processing a concatenation of consecutive sentences with the standard Transformer architecture.
1 code implementation • 24 Oct 2022 • Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
A straightforward approach to context-aware neural machine translation consists of feeding the standard encoder-decoder architecture with a window of consecutive sentences, formed by concatenating the current sentence with a number of sentences from its context.
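As a rough sketch of this concatenation approach (the separator token and window size below are illustrative assumptions, not the paper's exact setup):

```python
# Build the concatenated source window for a generic seq2seq model.
# The <sep> token and context_size are illustrative assumptions.
def make_context_window(sentences, i, context_size=2, sep="<sep>"):
    """Concatenate up to `context_size` previous sentences with sentence i."""
    context = sentences[max(0, i - context_size):i]
    return f" {sep} ".join(context + [sentences[i]])

doc = ["He took the key.", "Then he opened the door.", "It creaked loudly."]
src = make_context_window(doc, 2)
# "He took the key. <sep> Then he opened the door. <sep> It creaked loudly."
# `src` is fed to a standard Transformer; typically only the translation
# of the final segment is kept at inference time.
```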
no code implementations • 11 Jul 2022 • Elisa Gugliotta, Marco Dinarelli
In this paper we present the final result of a project on Tunisian Arabic encoded in Arabizi, the Latin-based writing system for digital conversations.
no code implementations • 1 Jul 2022 • Marco Naguib, François Portet, Marco Dinarelli
Recent advances in spoken language understanding have benefited from self-supervised models trained on large speech corpora.
no code implementations • 1 Jul 2022 • Marco Dinarelli, Marco Naguib, François Portet
Recent advances in spoken language understanding have benefited from self-supervised models trained on large speech corpora.
1 code implementation • 23 Apr 2021 • Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech.
1 code implementation • ACL 2022 • Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence.
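As an illustration of the general layout of this family (a generic sketch, not the exact models studied in the paper; the attention-plus-gating merge below is one common instantiation):

```python
# Illustrative PyTorch sketch of a multi-encoder layout: one encoder for
# the current sentence, one for the context, merged with a learned gate.
import torch
import torch.nn as nn

class MultiEncoderNMT(nn.Module):
    def __init__(self, vocab=32000, d=512):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        enc_layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.src_encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
        self.ctx_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.gate = nn.Linear(2 * d, d)

    def encode(self, src_ids, ctx_ids):
        s = self.src_encoder(self.emb(src_ids))   # (batch, src_len, d)
        c = self.ctx_encoder(self.emb(ctx_ids))   # (batch, ctx_len, d)
        # Attend from source states to context states, then gate the merge.
        attn = torch.softmax(s @ c.transpose(1, 2) / s.size(-1) ** 0.5, dim=-1)
        c_summary = attn @ c                       # context seen from each source token
        g = torch.sigmoid(self.gate(torch.cat([s, c_summary], dim=-1)))
        return g * s + (1 - g) * c_summary         # memory for a standard decoder
```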
1 code implementation • COLING (WANLP) 2020 • Elisa Gugliotta, Marco Dinarelli, Olivier Kraif
We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data.
no code implementations • JEPTALNRECITAL 2020 • Elisa Gugliotta, Marco Dinarelli
TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus
This article describes the collection process of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC).
no code implementations • LREC 2020 • Elisa Gugliotta, Marco Dinarelli
This article describes the constitution process of the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC).
no code implementations • 14 Feb 2020 • Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier
In many cases, models are therefore combined with an external language model to enhance their performance.
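One common way such a combination is done at decoding time is shallow fusion, sketched below under the assumption of per-step log-probabilities (the interpolation weight is illustrative, and the paper's exact combination scheme may differ):

```python
# Sketch of shallow fusion: interpolate the model's per-token scores
# with an external language model's at each decoding step.
import numpy as np

def fused_step_scores(model_log_probs, lm_log_probs, lm_weight=0.3):
    """Combine per-token scores for one decoding step (both shape (vocab,))."""
    return model_log_probs + lm_weight * lm_log_probs

# During beam search, each hypothesis is extended with the tokens that
# maximize the fused score rather than the model score alone.
```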
no code implementations • 16 Sep 2019 • Marco Dinarelli, Loïc Grobol
We propose a neural architecture with the main characteristics of the most successful neural models of recent years: bidirectional RNNs, encoder-decoder architectures, and the Transformer model.
no code implementations • JEPTALNRECITAL 2019 • Marco Dinarelli, Loïc Grobol
We propose a neural architecture with the main characteristics of the most successful neural models of recent years: bidirectional recurrent neural networks, encoder-decoder models, and the Transformer model.
no code implementations • 9 Apr 2019 • Marco Dinarelli, Loïc Grobol
During the last couple of years, Recurrent Neural Networks (RNN) have reached state-of-the-art performance on most sequence modelling problems.
no code implementations • 20 Jun 2017 • Marco Dinarelli, Yoann Dupont, Isabelle Tellier
Understanding spoken language is a highly complex problem, which can be decomposed into several simpler tasks.
no code implementations • 6 Jun 2017 • Yoann Dupont, Marco Dinarelli, Isabelle Tellier
In this work we propose a far simpler but very effective solution: an evolution of the simple Jordan RNN, where labels are re-injected as input into the network and converted into embeddings in the same way as words.
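A minimal sketch of this label re-injection idea, assuming illustrative layer sizes (not the paper's exact configuration):

```python
# PyTorch sketch of the described variant: the previously predicted label
# is embedded like a word and re-injected as input at the next step.
import torch
import torch.nn as nn

class LabelFeedbackRNN(nn.Module):
    def __init__(self, vocab, n_labels, w_dim=100, l_dim=30, h_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, w_dim)
        self.label_emb = nn.Embedding(n_labels, l_dim)  # labels embedded like words
        self.cell = nn.RNNCell(w_dim + l_dim, h_dim)
        self.out = nn.Linear(h_dim, n_labels)

    def forward(self, word_ids):                        # word_ids: (seq_len,)
        h = torch.zeros(1, self.cell.hidden_size)
        prev_label = torch.zeros(1, dtype=torch.long)   # index 0 = dummy start label
        outputs = []
        for w in word_ids:
            x = torch.cat([self.word_emb(w.view(1)),
                           self.label_emb(prev_label)], dim=-1)
            h = self.cell(x, h)
            logits = self.out(h)
            prev_label = logits.argmax(dim=-1)          # re-inject predicted label
            outputs.append(logits)
        return torch.stack(outputs)
```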
no code implementations • JEPTALNRECITAL 2017 • Yoann Dupont, Marco Dinarelli, Isabelle Tellier
Recently, a neural network variant particularly suited to labeling textual sequences was proposed, which uses distributional representations of the labels.
no code implementations • JEPTALNRECITAL 2017 • Loïc Grobol, Isabelle Tellier, Éric de la Clergerie, Marco Dinarelli, Frédéric Landragin
This article presents three experiments on mention detection in a corpus of spoken French: ANCOR.
no code implementations • JEPTALNRECITAL 2017 • Tian Tian, Isabelle Tellier, Marco Dinarelli, Pedro Cardoso
In this article, we propose a model to detect, in user-generated texts (in particular tweets), the non-standard words to be corrected.
no code implementations • JEPTALNRECITAL 2016 • Marco Dinarelli, Isabelle Tellier
In this article we study several types of recurrent neural networks (RNN) for sequence labeling.
no code implementations • 8 Jun 2016 • Marco Dinarelli, Isabelle Tellier
In this paper we study different types of Recurrent Neural Networks (RNN) for sequence labeling tasks.
no code implementations • LREC 2016 • Tian Tian, Marco Dinarelli, Isabelle Tellier, Pedro Dias Cardoso
We explain the specificities of this corpus with examples and describe some baseline experiments.
no code implementations • LREC 2014 • Anne Garcia-Fernandez, Olivier Ferret, Marco Dinarelli
The work presented in this article falls within the field of opinion mining and aims more particularly at determining the polarity of a text using machine learning methods.
no code implementations • LREC 2012 • Marco Dinarelli, Sophie Rosset
We evaluate our procedure for preprocessing OCR-ized data in two ways: in terms of perplexity and OOV rate of a language model on development and evaluation data, and in terms of the performance of the named entity detection system on the preprocessed data.
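Both measures are straightforward to compute; here is a minimal sketch, where `lm.log_prob` is a hypothetical method returning the total log2 probability of a token sequence:

```python
# Sketch of the two evaluation measures on preprocessed text.
# `lm.log_prob(tokens)` is a hypothetical API assumed to return the
# total log2 probability of the sequence under the language model.

def oov_rate(tokens, vocabulary):
    """Fraction of tokens not covered by the language model vocabulary."""
    return sum(t not in vocabulary for t in tokens) / len(tokens)

def perplexity(tokens, lm):
    """Per-token perplexity: 2 ** (-(1/N) * sum log2 p(token | history))."""
    return 2 ** (-lm.log_prob(tokens) / len(tokens))
```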