no code implementations • ALTA 2021 • Eric Le Ferrand, Steven Bird, Laurent Besacier
We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust speech recognition system.
no code implementations • IWSLT 2016 • Ngoc-Tien Le, Benjamin Lecouteux, Laurent Besacier
This paper aims to unravel the automatic quality assessment for spoken language translation (SLT).
no code implementations • COLING 2022 • Éric Le Ferrand, Steven Bird, Laurent Besacier
An increasing number of papers have been addressing issues related to low-resource languages and the transcription bottleneck paradigm.
no code implementations • ACL 2022 • Eric Le Ferrand, Steven Bird, Laurent Besacier
Most low resource language technology development is premised on the need to collect data for training statistical models.
1 code implementation • ACL 2022 • Shu Okabe, Laurent Besacier, François Yvon
Word and morpheme segmentation are fundamental steps of language documentation as they allow the discovery of lexical units in a language whose lexicon is unknown.
1 code implementation • CODI 2021 • Zae Myung Kim, Vassilina Nikoulina, Dongyeop Kang, Didier Schwab, Laurent Besacier
This paper presents an interactive data dashboard that provides users with an overview of the preservation of discourse relations among 28 language pairs.
no code implementations • WMT (EMNLP) 2021 • Md Mahfuz ibn Alam, Ivana Kvapilíková, Antonios Anastasopoulos, Laurent Besacier, Georgiana Dinu, Marcello Federico, Matthias Gallé, Kweonwoo Jung, Philipp Koehn, Vassilina Nikoulina
Language domains that require very careful use of terminology are abundant and reflect a significant part of the translation industry.
no code implementations • JEP/TALN/RECITAL 2021 • Diana Nicoleta Popa, William N. Havard, Maximin Coavoux, Eric Gaussier, Laurent Besacier
The SCAN dataset, consisting of a set of natural-language commands paired with action sequences, was specifically designed to evaluate the ability of neural networks to learn this type of compositional generalization.
no code implementations • ACL (GeBNLP) 2021 • Mahault Garnerin, Solange Rossato, Laurent Besacier
In this paper we question the impact of gender representation in training data on the performance of an end-to-end ASR system.
no code implementations • CoNLL (EMNLP) 2021 • Siddique Latif, Inyoung Kim, Ioan Calapodescu, Laurent Besacier
In this paper, we investigate whether we can control prosody directly from the input text, in order to code information related to contrastive focus which emphasizes a specific word that is contrary to the presuppositions of the interlocutor.
no code implementations • EMNLP 2020 • Jerin Philip, Alexandre Berard, Matthias Gallé, Laurent Besacier
We propose a novel adapter layer formalism for adapting multilingual models.
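A minimal sketch of the bottleneck-adapter idea commonly used for this kind of multilingual adaptation (not the paper's actual formalism); module names, dimensions, and the language set below are illustrative assumptions.

```python
# Hedged sketch of a bottleneck adapter layer (illustrative, not the paper's code).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, and add a residual connection."""
    def __init__(self, d_model: int = 512, bottleneck: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        residual = hidden
        hidden = self.layer_norm(hidden)
        hidden = self.up(torch.relu(self.down(hidden)))
        return residual + hidden

# One small adapter per language: only these modules are trained during adaptation,
# while the large multilingual model stays frozen.
adapters = nn.ModuleDict({lang: Adapter() for lang in ["en", "fr", "de"]})
x = torch.randn(2, 10, 512)      # (batch, sequence, d_model) from a frozen model
print(adapters["fr"](x).shape)   # torch.Size([2, 10, 512])
```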
no code implementations • BigScience (ACL) 2022 • Nicolas Hervé, Valentin Pelloin, Benoit Favre, Franck Dary, Antoine Laurent, Sylvain Meignier, Laurent Besacier
This paper aims at improving spoken language modeling (LM) using a very large amount of automatically transcribed speech.
no code implementations • 16 Dec 2024 • Beomseok Lee, Marco Gaido, Ioan Calapodescu, Laurent Besacier, Matteo Negri
While crowdsourcing is an established solution for facilitating and scaling the collection of speech data, the involvement of non-experts necessitates protocols to ensure final data quality.
1 code implementation • 7 Aug 2024 • Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri, Laurent Besacier
We present Speech-MASSIVE, a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSIVE textual corpus.
1 code implementation • 10 Jun 2024 • Marcely Zanon Boito, Vivek Iyer, Nikolaos Lagos, Laurent Besacier, Ioan Calapodescu
We present mHuBERT-147, the first general-purpose massively multilingual HuBERT speech representation model trained on 90K hours of clean, open-license data.
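A minimal sketch of how such a HuBERT-style checkpoint could be used to extract multilingual speech representations with HuggingFace Transformers; the model identifier is an assumption and the audio is a dummy waveform.

```python
# Hedged sketch: extracting features from an mHuBERT-style checkpoint.
# The checkpoint id below is an assumption, not verified from the paper.
import torch
from transformers import AutoFeatureExtractor, HubertModel

checkpoint = "utter-project/mHuBERT-147"  # assumed model id
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = HubertModel.from_pretrained(checkpoint)

waveform = torch.randn(16000)  # 1 second of dummy 16 kHz audio
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (1, frames, hidden_dim)
print(hidden_states.shape)
```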
1 code implementation • 29 Mar 2024 • Thibaut Thonet, Jos Rozen, Laurent Besacier
Research on Large Language Models (LLMs) has recently witnessed an increasing interest in extending the models' context size to better capture dependencies within long documents.
no code implementations • 11 Sep 2023 • Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing.
1 code implementation • 13 Feb 2023 • Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
Context-aware translation can be achieved by processing a concatenation of consecutive sentences with the standard Transformer architecture.
1 code implementation • 24 Oct 2022 • Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
A straightforward approach to context-aware neural machine translation consists in feeding the standard encoder-decoder architecture with a window of consecutive sentences, formed by the current sentence and a number of sentences from its context concatenated to it.
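A minimal sketch, under assumed conventions, of how such a window of consecutive sentences can be concatenated with a separator token before being fed to a standard encoder-decoder; the separator token and window size are illustrative, not the paper's exact setup.

```python
# Illustrative sketch: build concatenation-based context windows for NMT.
from typing import List

SEP = "<sep>"  # assumed separator token

def build_context_windows(sentences: List[str], context_size: int = 2) -> List[str]:
    """For each sentence, prepend up to `context_size` previous sentences."""
    windows = []
    for i, current in enumerate(sentences):
        context = sentences[max(0, i - context_size):i]
        windows.append(f" {SEP} ".join(context + [current]))
    return windows

doc = ["The cat sat on the mat.", "It was warm.", "Then it fell asleep."]
for src in build_context_windows(doc):
    print(src)
# The cat sat on the mat.
# The cat sat on the mat. <sep> It was warm.
# The cat sat on the mat. <sep> It was warm. <sep> Then it fell asleep.
```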
1 code implementation • 21 Oct 2022 • Laurent Besacier, Swen Ribeiro, Olivier Galibert, Ioan Calapodescu
In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts.
3 code implementations • 20 Oct 2022 • Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation.
no code implementations • 5 Jul 2022 • Valentin Pelloin, Franck Dary, Nicolas Herve, Benoit Favre, Nathalie Camelin, Antoine Laurent, Laurent Besacier
We aim at improving spoken language modeling (LM) using a very large amount of automatically transcribed speech.
no code implementations • 4 Jul 2022 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
We collect a corpus of utterances containing contrastive focus and we evaluate the accuracy of a BERT model, finetuned to predict quantized acoustic prominence features, on these samples.
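A minimal sketch of turning a continuous prominence-like acoustic feature into discrete class labels that a BERT-style model could be finetuned to predict; the number of bins and the quantile-based binning are assumptions.

```python
# Illustrative sketch: discretize continuous prominence values into class labels.
import numpy as np

def quantize_prominence(values: np.ndarray, n_bins: int = 4) -> np.ndarray:
    """Map continuous per-word prominence scores to quantile-based bins in {0..n_bins-1}."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

prominence = np.array([0.1, 0.8, 0.3, 2.5, 0.2])  # dummy per-word scores
print(quantize_prominence(prominence))             # prints bin indices in {0, 1, 2, 3}
```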
1 code implementation • 22 May 2022 • Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier
In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups, gender, and semantic biases by extensive analysis of compressed models on different machine translation benchmarks, i.e. FLORES-101, MT-Gender, and DiBiMT.
no code implementations • 4 Apr 2022 • Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève
These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST).
Automatic Speech Recognition (ASR)
no code implementations • EMNLP 2021 • Ahmet Üstün, Alexandre Bérard, Laurent Besacier, Matthias Gallé
We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs.
1 code implementation • 22 Jun 2021 • Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina
As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.
no code implementations • 11 Jun 2021 • Éric Le Ferrand, Steven Bird, Laurent Besacier
We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system.
no code implementations • SIGUL (LREC) 2022 • Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier
Our results suggest that neural models for speech discretization are difficult to exploit in our setting, and that it might be necessary to adapt them to limit sequence length.
2 code implementations • ACL 2021 • Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier
Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP.
Ranked #1 on Speech-to-Text Translation on MuST-C EN->ES
Automatic Speech Recognition (ASR)
no code implementations • Findings (ACL) 2021 • Zae Myung Kim, Laurent Besacier, Vassilina Nikoulina, Didier Schwab
Recent studies on the analysis of the multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages.
no code implementations • 29 Apr 2021 • Ha Nguyen, Yannick Estève, Laurent Besacier
Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed.
1 code implementation • 23 Apr 2021 • Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech.
Automatic Speech Recognition (ASR)
1 code implementation • ACL 2022 • Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence.
1 code implementation • 16 Mar 2021 • Jama Hussein Mohamud, Lloyd Acquaye Thompson, Aissatou Ndoye, Laurent Besacier
This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020.
Automatic Speech Recognition (ASR)
no code implementations • 4 Mar 2021 • Ha Nguyen, Yannick Estève, Laurent Besacier
This paper proposes a decoding strategy for end-to-end simultaneous speech translation.
no code implementations • 19 Feb 2021 • Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier
The prosody of a spoken word is determined by its surrounding context.
no code implementations • ComputEL 2021 • Oliver Adams, Benjamin Galliot, Guillaume Wisniewski, Nicholas Lambourne, Ben Foley, Rahasya Sanders-Dwyer, Janet Wiles, Alexis Michaud, Séverine Guillaume, Laurent Besacier, Christopher Cox, Katya Aplonova, Guillaume Jacques, Nathan Hill
This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit.
Automatic Speech Recognition (ASR)
no code implementations • COLING 2020 • Éric Le Ferrand, Steven Bird, Laurent Besacier
We propose a novel transcription workflow which combines spoken term detection and human-in-the-loop, together with a pilot experiment.
1 code implementation • COLING 2020 • Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier
We propose two variants of these architectures corresponding to two different levels of dependencies between the decoders, called the parallel and cross dual-decoder Transformers, respectively.
Ranked #1 on Speech-to-Text Translation on MuST-C EN->FR
Automatic Speech Recognition (ASR)
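A minimal sketch of the dual-decoder idea from the entry above: one shared speech encoder feeding two Transformer decoders (one for the transcript, one for the translation). Dimensions are assumed and the paper's cross-decoder attention is not shown.

```python
# Illustrative sketch (assumed sizes; omits the paper's cross-decoder interactions):
# a shared encoder with two decoders producing ASR and ST outputs jointly.
import torch
import torch.nn as nn

d_model, nhead = 256, 4
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2)
asr_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=2)
st_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=2)

speech_feats = torch.randn(2, 100, d_model)   # (batch, frames, d_model) acoustic features
asr_prev = torch.randn(2, 20, d_model)        # embedded transcription prefix
st_prev = torch.randn(2, 25, d_model)         # embedded translation prefix

memory = encoder(speech_feats)                # shared acoustic representation
asr_out = asr_decoder(asr_prev, memory)       # decoder for the source-language transcript
st_out = st_decoder(st_prev, memory)          # decoder for the target-language translation
print(asr_out.shape, st_out.shape)
```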
no code implementations • 12 Oct 2020 • Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels.
no code implementations • 4 Sep 2020 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e. when generating speech output for token n, the system has access to n + k tokens from the text sequence.
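A minimal sketch of the n + k lookahead policy described above: at each position, only a bounded prefix of the input text is visible. The window construction is illustrative, not the authors' implementation.

```python
# Illustrative sketch of an incremental input policy: when producing output for
# token n, only tokens up to n + k of the text are visible to the model.
from typing import List

def incremental_windows(tokens: List[str], k: int = 2) -> List[List[str]]:
    """Return, for each position n, the prefix of the text visible with lookahead k."""
    return [tokens[: n + 1 + k] for n in range(len(tokens))]

text = "the quick brown fox jumps".split()
for n, visible in enumerate(incremental_windows(text, k=1)):
    print(f"generating token {n}: sees {visible}")
```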
no code implementations • CONLL 2020 • William N. Havard, Jean-Pierre Chevrot, Laurent Besacier
The language acquisition literature shows that children do not build their lexicon by segmenting the spoken input into phonemes and then building up words from them, but rather adopt a top-down approach and start by segmenting word-like units and then break them down into smaller units.
1 code implementation • 9 Jun 2020 • Vaishali Pal, Manish Shrivastava, Laurent Besacier
To the best of our knowledge, this is the first attempt at generating full-length natural answers from a graph input (confusion network).
1 code implementation • JEPTALNRECITAL 2020 • Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab
Pre-trained language models have become indispensable for achieving state-of-the-art results in many NLP tasks.
no code implementations • JEPTALNRECITAL 2020 • Mahault Garnerin, Solange Rossato, Laurent Besacier
We propose a reflection on evaluation practices for automatic speech recognition (ASR) systems.
no code implementations • JEPTALNRECITAL 2020 • Mahault Garnerin, Solange Rossato, Laurent Besacier
With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of the ethics and transparency of AI systems has become a central concern within the research community.
1 code implementation • COLING 2020 • Maha Elbayad, Michael Ustaszewski, Emmanuelle Esperança-Rodier, Francis Brunet Manquat, Jakob Verbeek, Laurent Besacier
In this work, we conduct an evaluation study comparing offline and online neural machine translation architectures.
no code implementations • WS 2020 • Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier
This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation.
1 code implementation • 18 May 2020 • Maha Elbayad, Laurent Besacier, Jakob Verbeek
We also show that the 2D-convolution architecture is competitive with Transformers for simultaneous translation of spoken language.
no code implementations • LREC 2020 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
To answer this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) to create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
no code implementations • LREC 2020 • Mahault Garnerin, Solange Rossato, Laurent Besacier
With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community.
no code implementations • 14 Feb 2020 • Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier
To that end, models are in many cases combined with an external language model to enhance their performance.
1 code implementation • 3 Feb 2020 • Vaishali Pal, Fabien Guillot, Manish Shrivastava, Jean-Michel Renders, Laurent Besacier
Spoken dialogue systems typically use a list of top-N ASR hypotheses for inferring the semantic meaning and tracking the state of the dialogue.
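A minimal sketch of the kind of input such systems consume: an N-best list of ASR hypotheses and a toy confusion network derived from it. The data and the simplistic slot construction are illustrative assumptions (real confusion networks come from lattice alignment).

```python
# Illustrative sketch: an ASR N-best list and a toy confusion network built from it.
from collections import defaultdict

nbest = [
    (0.6, "book a table for two"),
    (0.3, "book a table for too"),
    (0.1, "look a table for two"),
]

# Toy confusion network: per position, accumulate word alternatives and their scores.
slots = defaultdict(lambda: defaultdict(float))
for score, hyp in nbest:
    for i, word in enumerate(hyp.split()):
        slots[i][word] += score

best_path = [max(alts, key=alts.get) for _, alts in sorted(slots.items())]
print(" ".join(best_path))  # book a table for two
```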
7 code implementations • LREC 2020 • Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab
Language models have become a key step to achieve state-of-the art results in many different Natural Language Processing (NLP) tasks.
Ranked #2 on Natural Language Inference on XNLI French
no code implementations • 12 Nov 2019 • Rohit Gupta, Laurent Besacier, Marc Dymetman, Matthias Gallé
Character-based translation has several appealing advantages, but its performance is in general worse than a carefully tuned BPE baseline.
no code implementations • EMNLP (IWSLT) 2019 • Loïc Vial, Benjamin Lecouteux, Didier Schwab, Hang Le, Laurent Besacier
Therefore, we implemented a Transformer-based encoder-decoder neural system which is able to use the output of a pre-trained language model as input embeddings, and we compared its performance under three configurations: 1) without any pre-trained language model (constrained), 2) using a language model trained on the monolingual parts of the allowed English-Czech data (constrained), and 3) using a language model trained on a large quantity of external monolingual data (unconstrained).
no code implementations • WS 2019 • Fahimeh Saleh, Alexandre Bérard, Ioan Calapodescu, Laurent Besacier
To address these challenges, we propose to leverage data from both tasks and do transfer learning between MT, NLG, and MT with source-side metadata (MT+NLG).
no code implementations • EMNLP (IWSLT) 2019 • Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve
This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair.
no code implementations • EMNLP (IWSLT) 2019 • Pierre Godard, Laurent Besacier, Francois Yvon
One of the basic tasks of computational language documentation (CLD) is to identify word boundaries in an unsegmented phonemic stream.
1 code implementation • 11 Oct 2019 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
For language documentation initiatives, transcription is an expensive resource: one minute of audio is estimated to take an average of one and a half hours of a linguist's work (Austin and Sallabank, 2013).
no code implementations • 25 Sep 2019 • Maha Elbayad, Laurent Besacier, Jakob Verbeek
We investigate the sensitivity of such models to the value of k that is used during training and when deploying the model, and the effect of updating the hidden states in transformer models as new source tokens are read.
Automatic Speech Recognition (ASR)
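A minimal sketch of a wait-k decoding loop for simultaneous translation, as studied in the entry above: read k source tokens, then alternate writing one target token and reading one more source token. A dummy word-for-word "model" stands in for a real NMT system; this is the standard wait-k policy, not the paper's exact implementation.

```python
# Illustrative wait-k policy sketch with a placeholder translation model.
from typing import List

def dummy_translate_next(source_prefix: List[str], target_prefix: List[str]) -> str:
    # Placeholder model: echo the next source word (assumption, not real MT).
    return source_prefix[len(target_prefix)]

def wait_k_decode(source: List[str], k: int = 3) -> List[str]:
    target: List[str] = []
    read = min(k, len(source))                 # initial reading phase
    while len(target) < len(source):
        target.append(dummy_translate_next(source[:read], target))
        read = min(read + 1, len(source))      # read one more token after each write
    return target

print(wait_k_decode("wir sehen uns morgen wieder".split(), k=2))
```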
no code implementations • CONLL 2019 • William N. Havard, Jean-Pierre Chevrot, Laurent Besacier
In this paper, we study how word-like units are represented and activated in a recurrent neural model of visually grounded speech.
no code implementations • 23 Aug 2019 • Mahault Garnerin, Solange Rossato, Laurent Besacier
The disparity of available data between genders causes performance to decrease for women.
Automatic Speech Recognition (ASR)
1 code implementation • LREC 2020 • Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier
However, the fact that the source content (the Bible) is the same for all the languages is not exploited to date. Therefore, this article proposes to add multilingual links between speech segments in different languages, and shares a large and clean dataset of 8,130 parallel spoken utterances across 8 languages (56 language pairs).
Automatic Speech Recognition (ASR)
1 code implementation • 29 Jun 2019 • Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
This task consists in aligning word sequences in a source language with phoneme sequences in a target language, inferring from this alignment a word segmentation on the target side [5].
no code implementations • 25 Apr 2019 • Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux
We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text).
1 code implementation • 8 Feb 2019 • William N. Havard, Jean-Pierre Chevrot, Laurent Besacier
We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese.
no code implementations • ALTA 2018 • Xuanli He, Quan Hung Tran, William Havard, Laurent Besacier, Ingrid Zukerman, Gholamreza Haffari
In spite of the recent success of Dialogue Act (DA) classification, the majority of prior works focus on text-based classification with oracle transcriptions, i.e. human transcriptions, instead of Automatic Speech Recognition (ASR) transcriptions.
Automatic Speech Recognition (ASR)
no code implementations • WS 2018 • Zied Elloumi, Laurent Besacier, Olivier Galibert, Benjamin Lecouteux
In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate.
3 code implementations • CONLL 2018 • Maha Elbayad, Laurent Besacier, Jakob Verbeek
Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding.
Ranked #2 on Machine Translation on IWSLT2015 German-English
no code implementations • 27 Jul 2018 • Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline Villavicencio, Laurent Besacier
This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research.
no code implementations • 18 Jun 2018 • Pierre Godard, Marcely Zanon-Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier
We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL).
1 code implementation • ACL 2018 • Maha Elbayad, Laurent Besacier, Jakob Verbeek
We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach.
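A minimal sketch of token-level label smoothing, the standard form of the token-level loss smoothing mentioned above; the smoothing value and tensor shapes are assumptions, and the paper's sequence-level variant is not shown.

```python
# Illustrative token-level label smoothing (standard formulation, assumed epsilon).
import torch
import torch.nn.functional as F

def smoothed_nll(logits: torch.Tensor, targets: torch.Tensor, epsilon: float = 0.1) -> torch.Tensor:
    """Cross-entropy where each target spreads `epsilon` probability mass uniformly over the vocabulary."""
    log_probs = F.log_softmax(logits, dim=-1)                      # (tokens, vocab)
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # loss on the gold label
    uniform = -log_probs.mean(dim=-1)                               # expected loss under uniform targets
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()

logits = torch.randn(4, 1000)            # 4 tokens, vocabulary of 1000
targets = torch.randint(0, 1000, (4,))
print(smoothed_nll(logits, targets))
```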
no code implementations • LREC 2018 • Annie Rialland, Martine Adda-Decker, Guy-Noël Kouarata, Gilles Adda, Laurent Besacier, Lori Lamel, Elodie Gauthier, Pierre Godard, Jamison Cooper-Leavitt
Automatic Speech Recognition (ASR)
1 code implementation • LREC 2018 • Pierre Godard, Gilles Adda, Martine Adda-Decker, Juan Benjumea, Laurent Besacier, Jamison Cooper-Leavitt, Guy-Noel Kouarata, Lori Lamel, Hélène Maynard, Markus Mueller, Annie Rialland, Sebastian Stueker, François Yvon, Marcely Zanon Boito
no code implementations • 23 Apr 2018 • Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux
In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs.
no code implementations • 16 Feb 2018 • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur
Developing speech technologies for low-resource languages has become a very active research field over the last decade.
no code implementations • 14 Feb 2018 • Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography.
1 code implementation • 12 Feb 2018 • Alexandre Bérard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier Pietquin
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task.
1 code implementation • LREC 2018 • Ali Can Kocabiyikoglu, Laurent Besacier, Olivier Kraif
However, while large quantities of parallel texts (such as Europarl, OpenSubtitles) are available for training machine translation systems, there are no large (100h) and open source parallel corpora that include speech in a source language aligned to text in a target language.
Automatic Speech Recognition (ASR)
no code implementations • 12 Dec 2017 • Ewan Dunbar, Xuan Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux
We describe a new challenge aimed at discovering subword and word units from raw speech.
no code implementations • 17 Sep 2017 • Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier
Word discovery is the task of extracting words from unsegmented text.
no code implementations • MTSummit 2017 • Ngoc-Tien Le, Benjamin Lecouteux, Laurent Besacier
This enables, as a by-product, a qualitative analysis of SLT errors and their origin (are they due to the transcription or to the translation step?).
no code implementations • WS 2017 • Michael Melese, Laurent Besacier, Million Meshesha
This paper describes speech translation from Amharic-to-English, particularly Automatic Speech Recognition (ASR) with post-editing feature and Amharic-English Statistical Machine Translation (SMT).
Automatic Speech Recognition (ASR)
1 code implementation • 26 Jul 2017 • William Havard, Laurent Besacier, Olivier Rosec
Disfluencies and speed perturbation are added to the signal in order to make it sound more natural.
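A minimal sketch of applying speed-like perturbation to a waveform, using librosa's time stretching as a simple stand-in; the perturbation factors are assumptions and this is not the authors' pipeline.

```python
# Illustrative perturbation of a waveform (not the authors' exact pipeline).
import numpy as np
import librosa

sr = 16000
waveform = np.random.randn(sr * 2).astype(np.float32)  # 2 s of dummy audio

for rate in (0.9, 1.0, 1.1):                            # assumed perturbation factors
    perturbed = librosa.effects.time_stretch(waveform, rate=rate)
    print(rate, perturbed.shape)                        # duration changes with the rate
```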
no code implementations • 17 Jul 2017 • Alexandre Berard, Olivier Pietquin, Laurent Besacier
This paper presents the LIG-CRIStAL submission to the shared Automatic Post-Editing task of WMT 2017.
no code implementations • JEPTALNRECITAL 2017 • Kamel Bouzidi, Zied Elloumi, Laurent Besacier, Benjamin Lecouteux, Mohamed-Faouzi Benzeghiba
Experiments are carried out on a corpus of digitized Arabic newspapers and yield BLEU score improvements of 3.73 and 5.5 on the development and test corpora, respectively.
no code implementations • 1 Jun 2017 • Elodie Gauthier, Laurent Besacier, Sylvie Voisin
Growing digital archives and improving algorithms for automatic analysis of text and speech create new research opportunities for fundamental research in phonetics.
1 code implementation • WS 2017 • Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes
This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts).
1 code implementation • SEMEVAL 2017 • Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab
We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017.
no code implementations • EACL 2017 • Jérémy Ferrero, Laurent Besacier, Didier Schwab, Frédéric Agnès
This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection.
1 code implementation • 21 Jan 2017 • Fréjus Laleye, Laurent Besacier, Eugène Ezin, Cina Motamed
This paper reports our efforts toward an ASR system for a new under-resourced language (Fongbe).
Ranked #1 on Speech Recognition on Fongbe audio
1 code implementation • 6 Dec 2016 • Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier
This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding.
1 code implementation • COLING 2016 • Christophe Servan, Alexandre Berard, Zied Elloumi, Hervé Blanchon, Laurent Besacier
This paper presents an approach combining lexico-semantic resources and distributed representations of words applied to the evaluation in machine translation (MT).
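A minimal sketch of the distributed-representation side of such an approach: scoring the similarity between an MT hypothesis and a reference via averaged word vectors. The toy random vectors and the averaging scheme are assumptions, not the paper's metric.

```python
# Illustrative embedding-based similarity between an MT hypothesis and a reference
# (toy random vectors; a real setup would use trained word embeddings).
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=50) for w in "the a cat sat sits on mat".split()}

def sentence_vector(sentence: str) -> np.ndarray:
    vectors = [vocab[w] for w in sentence.split() if w in vocab]
    return np.mean(vectors, axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(sentence_vector("the cat sat on the mat"),
             sentence_vector("a cat sits on the mat")))
```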
no code implementations • COLING 2016 • Othman Zennaki, Nasredine Semmar, Laurent Besacier
This work focuses on the rapid development of linguistic annotation tools for resource-poor languages.
1 code implementation • 20 Sep 2016 • Ngoc-Tien Le, Benjamin Lecouteux, Laurent Besacier
This paper addresses automatic quality assessment of spoken language translation (SLT).
no code implementations • JEPTALNRECITAL 2016 • Christophe Servan, Zied Elloumi, Hervé Blanchon, Laurent Besacier
This article presents an approach combining lexico-semantic networks and distributed word representations applied to machine translation evaluation.
no code implementations • JEPTALNRECITAL 2016 • Othman Zennaki, Nasredine Semmar, Laurent Besacier
In a previous contribution, we proposed a method for the automatic construction of a morpho-syntactic analyzer through cross-lingual projection of linguistic annotations from parallel corpora (a method based on recurrent neural networks).
1 code implementation • LREC 2016 • Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier
We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words).
1 code implementation • LREC 2016 • Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab
In this paper we describe our effort to create a dataset for the evaluation of cross-language textual similarity detection.
1 code implementation • LREC 2016 • Elodie Gauthier, Laurent Besacier, Sylvie Voisin, Michael Melese, Uriel Pascal Elingui
This article presents the data collected and ASR systems developed for 4 sub-Saharan African languages (Swahili, Hausa, Amharic and Wolof).
Automatic Speech Recognition (ASR)
no code implementations • LREC 2016 • Johann Poignant, Mateusz Budnik, Hervé Bredin, Claude Barras, Mickael Stefas, Pierrick Bruneau, Gilles Adda, Laurent Besacier, Hazim Ekenel, Gil Francopoulo, Javier Hernando, Joseph Mariani, Ramon Morros, Georges Quénot, Sophie Rosset, Thomas Tamisier
In this paper, we describe the organization and the implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data.
no code implementations • JEPTALNRECITAL 2015 • Othman Zennaki, Nasredine Semmar, Laurent Besacier
The construction of linguistic analysis tools for under-resourced languages is limited, among other things, by the lack of annotated corpora. In this article, we propose a method for automatically building analysis tools through cross-lingual projection of linguistic annotations using parallel corpora.
no code implementations • JEPTALNRECITAL 2015 • Laurent Besacier, Benjamin Lecouteux, Luong Ngoc Quang
Word Confidence Estimation (WCE) for machine translation (MT) or automatic speech recognition (ASR) assigns a confidence score to each word in a transcription or translation hypothesis.
no code implementations • JEPTALNRECITAL 2012 • Fabrice Lefèvre, Djamel Mostefa, Laurent Besacier, Yannick Estève, Matthieu Quignard, Nathalie Camelin, Benoit Favre, Bassam Jabaian, Lina Rojas-Barahona
no code implementations • JEPTALNRECITAL 2012 • Hadrien Gelas, Laurent Besacier, François Pellegrino
Automatic Speech Recognition (ASR)
no code implementations • JEPTALNRECITAL 2012 • Hadrien Gelas, Solomon Teferra Abate, Laurent Besacier, François Pellegrino
no code implementations • LREC 2012 • Marion Potet, Emmanuelle Esperança-Rodier, Laurent Besacier, Hervé Blanchon
We also post-edited 1,500 gold-standard reference translations (of bilingual parallel corpora generated by professionals) and noticed that 72% of these translations needed to be corrected during post-editing.
no code implementations • LREC 2012 • Fabrice Lefèvre, Djamel Mostefa, Laurent Besacier, Yannick Estève, Matthieu Quignard, Nathalie Camelin, Benoit Favre, Bassam Jabaian, Lina M. Rojas-Barahona
The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems.