no code implementations • 4 Oct 2024 • Anastasiia Filippova, Angelos Katharopoulos, David Grangier, Ronan Collobert
We introduce SmallTalk LM, an innovative method for training a mixture of language models in an almost asynchronous manner.
no code implementations • 24 May 2024 • Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly
Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make.
Automatic Speech Recognition (ASR) +3
no code implementations • 29 Sep 2023 • Andrew Rouditchenko, Ronan Collobert, Tatiana Likhomanenko
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR).
Audio-Visual Speech Recognition • Automatic Speech Recognition +4
no code implementations • 19 May 2023 • Tatiana Likhomanenko, Loren Lugosch, Ronan Collobert
Here, "unsupervised" means no labeled audio is available for the $\textit{target}$ language.
Automatic Speech Recognition (ASR) +3
no code implementations • 11 Nov 2022 • Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.
no code implementations • 2 Nov 2022 • Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko
We perform a systematic analysis on both labeled and unlabeled data by varying the number of speakers while keeping the number of hours fixed and vice versa.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Oct 2022 • Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko
Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.
Automatic Speech Recognition (ASR) +2
2 code implementations • 29 Jan 2022 • Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks.
1 code implementation • 28 Jan 2022 • Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
These experiments show that STC can recover most of the performance of a supervised baseline when up to 70% of the labels are missing.
Automatic Speech Recognition (ASR) +3
no code implementations • 30 Oct 2021 • Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems.
no code implementations • 12 Oct 2021 • Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
In this paper, we study the training of automatic speech recognition systems in a weakly supervised setting where the order of the words in the transcript labels of the audio training data is not known.
Automatic Speech Recognition (ASR) +1
no code implementations • 14 Jun 2021 • Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised speech recognition (ASR).
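A minimal sketch of a "continuously improving teacher", assuming for illustration that the teacher tracks an exponential moving average of the student's parameters; the names and the decay value below are placeholders rather than the paper's exact recipe.

```python
import numpy as np

def ema_teacher_update(teacher, student, decay=0.999):
    """Nudge each teacher parameter toward the current student parameter.
    Over training, the teacher becomes a slowly improving average of the
    student and is used to generate pseudo-labels for unlabeled audio."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

# Toy illustration with a single "parameter":
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(10):
    teacher = ema_teacher_update(teacher, student)
print(teacher["w"])  # slowly drifting toward the student's weights
```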
1 code implementation • NeurIPS 2021 • Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov
Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information.
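For context, a minimal sketch of the standard absolute (sinusoidal) positional embeddings that such models add to their inputs; this illustrates the common baseline mentioned in the sentence, not the method proposed in the paper.

```python
import numpy as np

def sinusoidal_positional_embeddings(seq_len, d_model):
    """Absolute positional embeddings: each position is encoded by sines and
    cosines of geometrically spaced frequencies and added to the token
    embeddings before the first Transformer layer (d_model assumed even)."""
    pos = np.arange(seq_len)[:, None]            # (T, 1)
    dim = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000.0, dim / d_model)
    emb = np.zeros((seq_len, d_model))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

print(sinusoidal_positional_embeddings(50, 64).shape)  # (50, 64)
```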
3 code implementations • 2 Apr 2021 • Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli
On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.
1 code implementation • 7 Dec 2020 • Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research.
Automatic Speech Recognition (ASR) +2
1 code implementation • 30 Oct 2020 • Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve
Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 22 Oct 2020 • Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert
We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model.
Automatic Speech Recognition (ASR) +5
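A rough sketch of the hard-label re-generation step described above, assuming a CTC-style acoustic model; `model`, the decoding, and the data handling below are illustrative placeholders, and the full algorithm involves scheduling and data selection not shown here.

```python
import numpy as np

def greedy_hard_labels(log_probs, blank=0):
    """Hard-label transcription without a language model: take the most
    probable token per frame, collapse repeats, and drop blanks (CTC-style)."""
    best = log_probs.argmax(axis=-1)
    collapsed = [t for i, t in enumerate(best) if i == 0 or t != best[i - 1]]
    return [int(t) for t in collapsed if t != blank]

def regenerate_pseudo_labels(model, unlabeled_audio):
    """One re-labeling pass: transcribe unlabeled audio with the current model
    and return (audio, pseudo-transcript) pairs for continued training."""
    return [(audio, greedy_hard_labels(model(audio))) for audio in unlabeled_audio]

# Toy usage with a fake model returning (frames, vocab) emission log-probs.
rng = np.random.default_rng(0)
fake_model = lambda audio: rng.normal(size=(100, 30))
pseudo = regenerate_pseudo_labels(fake_model, [np.zeros(16000)])
print(pseudo[0][1][:10])
```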
1 code implementation • 22 Oct 2020 • Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve
Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive performance on both research and real-world benchmarks.
Automatic Speech Recognition (ASR) +1
3 code implementations • 22 Oct 2020 • Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli
Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.
Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)
no code implementations • 6 Jul 2020 • Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and overall simplifying deployment of ASR systems that support diverse languages.
Automatic Speech Recognition (ASR) +1
8 code implementations • 24 Jun 2020 • Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli
This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
1 code implementation • 19 May 2020 • Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.
Ranked #13 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +4
no code implementations • 1 May 2020 • Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.
no code implementations • 27 Jan 2020 • Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).
2 code implementations • 17 Dec 2019 • Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux
Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).
Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-within metric)
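As a reminder of how the PER/CER/WER metrics mentioned above are computed, here is a minimal word error rate sketch (PER and CER are the same edit-distance computation over phonemes or characters).

```python
def word_error_rate(reference, hypothesis):
    """WER: Levenshtein distance between reference and hypothesis word
    sequences, normalized by the reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(1, len(r))

print(word_error_rate("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ≈ 0.33
```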
1 code implementation • 19 Nov 2019 • Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.
Ranked #19 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • ICML 2020 • Ronan Collobert, Awni Hannun, Gabriel Synnaeve
We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters.
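A toy sketch of one way such a word network can produce word embeddings from letters (character embeddings, a 1-D convolution, and max-pooling over the letter sequence); this is an illustrative construction rather than the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def word_embedding_from_letters(word, letter_emb, conv_w, conv_b):
    """Embed each letter, run a 1-D convolution over the letter sequence,
    and max-pool over time to get a fixed-size word embedding."""
    x = np.stack([letter_emb[ord(c) % 128] for c in word])   # (L, d_in)
    T, d_in = x.shape
    k, _, d_out = conv_w.shape                                # kernel, d_in, d_out
    xp = np.concatenate([np.zeros((k - 1, d_in)), x])         # causal padding
    feats = np.stack([np.tensordot(xp[t:t + k], conv_w, axes=([0, 1], [0, 1])) + conv_b
                      for t in range(T)])                     # (L, d_out)
    return np.maximum(feats, 0).max(axis=0)                   # max-pool over letters

letter_emb = rng.normal(size=(128, 16))
conv_w = rng.normal(size=(3, 16, 32))
conv_b = np.zeros(32)
print(word_embedding_from_letters("speech", letter_emb, conv_w, conv_b).shape)  # (32,)
```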
7 code implementations • 11 Apr 2019 • Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli
Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.
Ranked #5 on Speech Recognition on TIMIT (using extra training data)
no code implementations • 9 Apr 2019 • Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words.
no code implementations • 4 Apr 2019 • Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert
Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.
1 code implementation • 16 Feb 2019 • Ronan Collobert, Awni Hannun, Gabriel Synnaeve
We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models.
8 code implementations • 18 Dec 2018 • Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.
no code implementations • 17 Dec 2018 • Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert
In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.
Ranked #3 on Speech Recognition on WSJ eval93
no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve
In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.
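Under the simple assumption of a weighted-sum objective (the weight below is a placeholder, not the paper's value), the joint training reads as:

```python
def multitask_objective(asr_loss, speaker_loss, speaker_weight=0.1):
    """Shared-encoder multi-task sketch: the same underlying features are
    trained with both the speech recognition loss and an auxiliary speaker
    prediction loss; the weight controls the auxiliary task's influence."""
    return asr_loss + speaker_weight * speaker_loss
```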
1 code implementation • 19 Jun 2018 • Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.
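A rough sketch of what a trainable replacement of mel-filterbanks on the raw waveform can look like: a bank of learnable filters convolved with the signal, followed by rectification, short-term averaging, and log compression. The filter shapes, strides, and sizes below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def learnable_filterbank(waveform, filters, hop=160, win=400):
    """Convolve each frame of the raw waveform with learnable FIR filters,
    square-rectify, average over the window, and compress with log,
    mimicking the classic filterbank pipeline with trainable filters."""
    feats = []
    for t in range(0, len(waveform) - win, hop):
        frame = waveform[t:t + win]
        energies = []
        for f in filters:                         # each f is a short FIR filter
            resp = np.convolve(frame, f, mode="valid")
            energies.append(np.log(np.mean(resp ** 2) + 1e-6))
        feats.append(energies)
    return np.array(feats)                        # (num_frames, num_filters)

wave = rng.normal(size=16000)                     # 1 s of fake 16 kHz audio
filters = rng.normal(size=(40, 65)) * 0.1         # 40 learnable filters
print(learnable_filterbank(wave, filters).shape)  # (num_frames, 40)
```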
no code implementations • ICLR 2018 • Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
In this paper we introduce a new speech recognition system, leveraging a simple letter-based ConvNet acoustic model.
2 code implementations • 22 Dec 2017 • Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC).
Ranked #55 on Speech Recognition on LibriSpeech test-clean
9 code implementations • arXiv 2016 • Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding.
no code implementations • WS 2016 • Joel Legrand, Michael Auli, Ronan Collobert
We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.
2 code implementations • 29 Mar 2016 • Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár
In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.
Ranked #4 on Region Proposal on COCO test-dev
no code implementations • CVPR 2016 • Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia, Lubomir Bourdev
This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations.
Ranked #5 on Weakly Supervised Object Detection on MS COCO
2 code implementations • NeurIPS 2015 • Pedro O. Pinheiro, Ronan Collobert, Piotr Dollar
Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier.
no code implementations • 18 Jun 2015 • Rémi Lebret, Ronan Collobert
We evaluate the quality of the word representations on several classical word evaluation tasks, and we introduce a novel task to evaluate the quality of the phrase representations.
no code implementations • 12 Feb 2015 • Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.
no code implementations • 29 Dec 2014 • Remi Lebret, Pedro O. Pinheiro, Ronan Collobert
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.
no code implementations • 22 Dec 2014 • Dimitri Palaz, Mathew Magimai.-Doss, Ronan Collobert
This system was shown to yield similar or better performance than an HMM/ANN-based system on a phoneme recognition task and on a large-scale continuous speech recognition task, while using fewer parameters.
Automatic Speech Recognition (ASR) +1
no code implementations • 22 Dec 2014 • Joël Legrand, Ronan Collobert
This paper introduces a greedy parser based on neural networks, which leverages a new compositional sub-tree representation.
1 code implementation • 20 Dec 2014 • Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra
We propose a strong baseline model for unsupervised feature learning using video data.
no code implementations • 19 Dec 2014 • Rémi Lebret, Ronan Collobert
The number of features is therefore dramatically reduced, and documents can be represented as a bag of semantic concepts.
no code implementations • 16 Dec 2014 • Rémi Lebret, Ronan Collobert
We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora.
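For reference, the Hellinger distance between two word co-occurrence distributions, the quantity studied here, can be computed as follows (the toy counts are illustrative only).

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, e.g. the
    normalized co-occurrence rows of two words."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Toy co-occurrence counts for two words over a small context vocabulary.
counts_cat = np.array([10.0, 4.0, 1.0, 0.0])
counts_dog = np.array([8.0, 5.0, 2.0, 0.0])
p = counts_cat / counts_cat.sum()
q = counts_dog / counts_dog.sum()
print(hellinger_distance(p, q))   # small value -> similar co-occurrence profiles
```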
no code implementations • CVPR 2015 • Pedro O. Pinheiro, Ronan Collobert
We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task.
no code implementations • 19 Dec 2013 • Rémi Lebret, Ronan Collobert
Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks.
1 code implementation • 7 Dec 2013 • Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss
Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features.
no code implementations • 12 Jun 2013 • Pedro H. O. Pinheiro, Ronan Collobert
Scene parsing is the task of assigning a label to every pixel in an image according to the class it belongs to.
no code implementations • 3 Apr 2013 • Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss
Motivated by these studies, this paper investigates a novel approach in the framework of convolutional neural networks (CNNs), where the input to the network is the raw speech signal and the output is phoneme class conditional probability estimates.
Automatic Speech Recognition (ASR) +1
1 code implementation • 2 Mar 2011 • Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.
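As an illustration of the kind of unified architecture described, here is a toy window-based tagger sketch: embeddings of a fixed context window around each word feed a small network that outputs per-tag scores, and only the output layer would change across tasks. Sizes, names, and the MLP below are placeholders, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def window_tagger_scores(token_ids, emb, w1, b1, w2, b2, window=2):
    """For each token, concatenate the embeddings of a fixed-size window
    around it and pass them through a small MLP to get per-tag scores."""
    T = len(token_ids)
    padded = [0] * window + list(token_ids) + [0] * window    # 0 = padding token
    scores = []
    for t in range(T):
        ctx = padded[t:t + 2 * window + 1]
        x = np.concatenate([emb[i] for i in ctx])             # (d * (2*window+1),)
        h = np.tanh(x @ w1 + b1)
        scores.append(h @ w2 + b2)
    return np.stack(scores)                                   # (T, n_tags)

vocab, d, hidden, n_tags = 100, 8, 16, 5
emb = rng.normal(size=(vocab, d))
w1 = rng.normal(size=(d * 5, hidden)); b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, n_tags)); b2 = np.zeros(n_tags)
print(window_tagger_scores([3, 17, 42], emb, w1, b1, w2, b2).shape)  # (3, 5)
```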
no code implementations • NeurIPS 2009 • Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score.
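The model described here maps a query-document pair to a ranking score through a polynomial (here degree-2) interaction between their word features. A minimal low-rank sketch, with random weights standing in for learned parameters and shapes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def polynomial_rank_score(q, d, U, V):
    """Degree-2 polynomial ranking sketch: score a (query, document) pair with
    a low-rank bilinear form over their bag-of-words vectors, score = (Uq)·(Vd)."""
    return float((U @ q) @ (V @ d))

vocab, rank = 1000, 50
U = rng.normal(size=(rank, vocab)) * 0.01
V = rng.normal(size=(rank, vocab)) * 0.01
q = np.zeros(vocab); q[[3, 17]] = 1.0          # toy query bag-of-words
d = np.zeros(vocab); d[[3, 42, 99]] = 1.0      # toy document bag-of-words
print(polynomial_rank_score(q, d, U, V))
```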