1 code implementation • 29 Jan 2022 • Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
As the computational requirements of machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has become challenging.
1 code implementation • 28 Jan 2022 • Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
These experiments show that STC can recover most of the performance of the supervised baseline when up to 70% of the labels are missing.
no code implementations • 30 Oct 2021 • Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems.
no code implementations • 12 Oct 2021 • Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
In this paper, we study the training of an automatic speech recognition system in a weakly supervised setting where the order of words in the transcript labels of the audio training data is not known.
no code implementations • 14 Jun 2021 • Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
In this paper, we introduce the Kaizen framework, which uses a continuously improving teacher to generate pseudo-labels for semi-supervised automatic speech recognition (ASR).
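The entry above only hints at the mechanism, but a continuously improving teacher is commonly realized as an exponential moving average (EMA) of the student; a minimal PyTorch sketch under that assumption (the decay value and the stand-in model are illustrative, not the paper's):

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter toward the student's (EMA update)."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)

# Usage sketch: the teacher starts as a frozen copy of the student and is
# nudged toward it after every optimizer step; pseudo-labels for unlabeled
# batches are then decoded from the teacher's outputs.
student = torch.nn.Linear(80, 32)  # stand-in for an acoustic model
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
ema_update(teacher, student)
```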
1 code implementation • NeurIPS 2021 • Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov
Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information.
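For reference, a minimal NumPy sketch of the standard sinusoidal absolute positional embedding, the fixed scheme this line alludes to (this is the original Transformer formulation, not the paper's proposed method):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed absolute positional embeddings: sin/cos at geometric frequencies."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)  # (seq_len, d_model/2)
    emb = np.zeros((seq_len, d_model))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb  # added to token embeddings before the first Transformer layer

print(sinusoidal_positions(4, 8).shape)  # (4, 8)
```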
2 code implementations • 2 Apr 2021 • Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli
On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.
1 code implementation • 7 Dec 2020 • Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research.
1 code implementation • 30 Oct 2020 • Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve
Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR).
3 code implementations • 22 Oct 2020 • Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli
Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.
Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)
1 code implementation • 22 Oct 2020 • Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve
Finally, we show that training a single acoustic model on the most widely used datasets, combined, reaches competitive performance on both research and real-world benchmarks.
no code implementations • 22 Oct 2020 • Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert
We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model.
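A minimal sketch of generating such hard labels from a CTC model without a language model: framewise argmax, collapse repeats, drop blanks (the blank index and toy shapes are assumptions):

```python
import torch

def hard_pseudo_label(logits, blank=0):
    """Greedy CTC decode: framewise argmax -> collapse repeats -> drop blanks."""
    best = logits.argmax(dim=-1)                   # (time,) most probable tokens
    collapsed = torch.unique_consecutive(best)     # merge repeated tokens
    return collapsed[collapsed != blank].tolist()  # remove the CTC blank

# Toy usage: 6 frames over a 4-token vocabulary (index 0 is blank).
logits = torch.randn(6, 4)
print(hard_pseudo_label(logits))
```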
no code implementations • 6 Jul 2020 • Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and of simplifying the overall deployment of ASR systems that support diverse languages.
4 code implementations • 24 Jun 2020 • Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli
This paper presents XLSR, which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
1 code implementation • 19 May 2020 • Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.
Ranked #9 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • 1 May 2020 • Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.
no code implementations • 27 Jan 2020 • Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).
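As a point of reference for the CTC criterion named above, a minimal PyTorch sketch on dummy data (shapes, vocabulary size, and the blank index are arbitrary):

```python
import torch
import torch.nn.functional as F

# Dummy batch: 2 utterances, 50 frames, 28-token vocabulary (blank = 0).
log_probs = F.log_softmax(torch.randn(50, 2, 28, requires_grad=True), dim=-1)
targets = torch.randint(1, 28, (2, 12))          # label sequences (no blanks)
input_lengths = torch.full((2,), 50, dtype=torch.long)
target_lengths = torch.full((2,), 12, dtype=torch.long)

ctc = torch.nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # in practice the gradients flow into the acoustic model
```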
1 code implementation • 17 Dec 2019 • Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux
Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero-resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER), and (3) the distant supervision setting (WER); a sketch of the WER metric follows this entry.
Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-across metric)
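Since WER is the headline metric of the distant-supervision setting, a short reference implementation via word-level edit distance (standard dynamic programming, not code from the benchmark itself):

```python
def wer(reference, hypothesis):
    """Word error rate: edit distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ~= 0.33
```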
1 code implementation • 19 Nov 2019 • Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.
Ranked #14 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • ICML 2020 • Ronan Collobert, Awni Hannun, Gabriel Synnaeve
We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters.
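A minimal sketch of one way a word network can build word embeddings from letters, here letter embeddings, a 1-D convolution, and max-pooling (this architecture is an assumption for illustration, not the paper's exact network):

```python
import torch
from torch import nn

class LetterToWord(nn.Module):
    """Build a word embedding by embedding letters, convolving, and max-pooling."""
    def __init__(self, n_letters=27, dim=64):
        super().__init__()
        self.letters = nn.Embedding(n_letters, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, letter_ids):                    # (batch, word_length)
        x = self.letters(letter_ids).transpose(1, 2)  # (batch, dim, length)
        return torch.relu(self.conv(x)).max(dim=2).values  # one vector per word

ids = torch.randint(0, 27, (4, 6))  # 4 words of up to 6 letters
print(LetterToWord()(ids).shape)    # torch.Size([4, 64])
```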
5 code implementations • 11 Apr 2019 • Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli
Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.
Ranked #5 on Speech Recognition on TIMIT (using extra training data)
no code implementations • 9 Apr 2019 • Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words.
no code implementations • 4 Apr 2019 • Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert
Coupled with a convolutional language model, our time-depth separable convolution architecture improves WER by more than 22% relative to the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.
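A rough PyTorch sketch of the time-depth separable idea, a convolution over time only followed by a pointwise MLP, each with a residual connection and layer norm (sizes are illustrative and details differ from the paper's block):

```python
import torch
from torch import nn

class TDSBlock(nn.Module):
    """Rough sketch of a time-depth separable block: a 2D convolution over time
    only, then a pointwise two-layer MLP, each with residual + layer norm."""
    def __init__(self, channels, width, kernel=5):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, (kernel, 1),
                              padding=(kernel // 2, 0))
        d = channels * width
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):  # x: (batch, channels, time, width)
        b, c, t, w = x.shape
        x = self.norm1((x + torch.relu(self.conv(x)))
                       .permute(0, 2, 1, 3).reshape(b, t, c * w))
        x = self.norm2(x + self.mlp(x))
        return x.reshape(b, t, c, w).permute(0, 2, 1, 3)

out = TDSBlock(channels=4, width=20)(torch.randn(2, 4, 100, 20))
print(out.shape)  # torch.Size([2, 4, 100, 20])
```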
1 code implementation • 16 Feb 2019 • Ronan Collobert, Awni Hannun, Gabriel Synnaeve
We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models.
8 code implementations • 18 Dec 2018 • Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.
no code implementations • 17 Dec 2018 • Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert
In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.
Ranked #5 on Speech Recognition on WSJ eval93
no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve
In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.
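A minimal sketch of such joint training, assuming a shared encoder with an ASR head and a speaker head combined in a weighted loss (all names, sizes, and the weight are placeholders):

```python
import torch
from torch import nn

encoder = nn.LSTM(input_size=80, hidden_size=64, batch_first=True)
asr_head = nn.Linear(64, 29)   # per-frame letter scores
spk_head = nn.Linear(64, 100)  # utterance-level speaker scores

feats = torch.randn(8, 200, 80)  # (batch, time, mel features)
hidden, _ = encoder(feats)       # shared representation for both tasks

asr_logits = asr_head(hidden)              # (batch, time, letters)
spk_logits = spk_head(hidden.mean(dim=1))  # pool over time per utterance

speakers = torch.randint(0, 100, (8,))
asr_loss = asr_logits.pow(2).mean()  # stand-in for the CTC criterion
spk_loss = nn.functional.cross_entropy(spk_logits, speakers)
(asr_loss + 0.1 * spk_loss).backward()  # joint objective, weighted speaker term
```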
1 code implementation • 19 Jun 2018 • Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.
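A minimal sketch of a trainable front end in that spirit: a 1-D convolution over the raw waveform followed by squaring, pooling, and log compression (all sizes are illustrative, not the paper's):

```python
import torch
from torch import nn

class LearnableFrontend(nn.Module):
    """Trainable stand-in for mel-filterbanks: conv filters on the raw
    waveform, energy (squaring + pooling), then log compression."""
    def __init__(self, n_filters=40, kernel=400, hop=160):
        super().__init__()
        self.filters = nn.Conv1d(1, n_filters, kernel, stride=1, padding=kernel // 2)
        self.pool = nn.AvgPool1d(kernel_size=kernel, stride=hop)

    def forward(self, wav):                 # wav: (batch, samples)
        x = self.filters(wav.unsqueeze(1))  # (batch, n_filters, samples)
        x = self.pool(x ** 2)               # framewise energy per filter
        return torch.log1p(x)               # compress dynamic range

feats = LearnableFrontend()(torch.randn(2, 16000))  # 1 s of audio at 16 kHz
print(feats.shape)  # (2, 40, 98)
```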
no code implementations • ICLR 2018 • Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
In this paper we introduce a new speech recognition system, leveraging a simple letter-based ConvNet acoustic model.
2 code implementations • 22 Dec 2017 • Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC).
Ranked #40 on Speech Recognition on LibriSpeech test-clean
8 code implementations • arXiv 2016 • Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
This paper presents a simple end-to-end model for speech recognition, combining a convolutional-network-based acoustic model with graph decoding.
no code implementations • WS 2016 • Joel Legrand, Michael Auli, Ronan Collobert
We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.
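A minimal sketch of window-based alignment scoring, simplified to averaged embedding windows scored with a dot product (the paper's actual network is more elaborate):

```python
import numpy as np

def window_reps(emb, radius=1):
    """Represent each position by the mean of embeddings in a +/- radius window."""
    padded = np.pad(emb, ((radius, radius), (0, 0)), mode="edge")
    return np.stack([padded[i:i + 2 * radius + 1].mean(axis=0)
                     for i in range(len(emb))])

src = np.random.randn(5, 16)  # source sentence: 5 words, 16-dim embeddings
tgt = np.random.randn(7, 16)  # target sentence
scores = window_reps(src) @ window_reps(tgt).T  # (5, 7) alignment scores
print(scores.argmax(axis=1))  # best target position for each source word
```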
2 code implementations • 29 Mar 2016 • Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár
In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.
Ranked #4 on Region Proposal on COCO test-dev
no code implementations • CVPR 2016 • Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia, Lubomir Bourdev
This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations.
Ranked #5 on Weakly Supervised Object Detection on COCO
2 code implementations • NeurIPS 2015 • Pedro O. Pinheiro, Ronan Collobert, Piotr Dollar
Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier.
no code implementations • 18 Jun 2015 • Rémi Lebret, Ronan Collobert
We evaluate the quality of the word representations on several classical word evaluation tasks, and we introduce a novel task to evaluate the quality of the phrase representations.
no code implementations • 12 Feb 2015 • Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.
no code implementations • 29 Dec 2014 • Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.
no code implementations • 22 Dec 2014 • Joël Legrand, Ronan Collobert
This paper introduces a greedy parser based on neural networks, which leverages a new compositional sub-tree representation.
no code implementations • 22 Dec 2014 • Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert
This system was shown to yield performance similar to or better than an HMM/ANN-based system on a phoneme recognition task and on a large-scale continuous speech recognition task, while using fewer parameters.
1 code implementation • 20 Dec 2014 • Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra
We propose a strong baseline model for unsupervised feature learning using video data.
no code implementations • 19 Dec 2014 • Rémi Lebret, Ronan Collobert
The number of features is therefore dramatically reduced, and documents can be represented as bags of semantic concepts.
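A minimal sketch of the bag-of-semantic-concepts idea, assuming word embeddings are clustered with k-means and each document becomes a histogram over clusters (scikit-learn and the toy vocabulary are for brevity):

```python
import numpy as np
from sklearn.cluster import KMeans

vocab = ["cat", "dog", "car", "truck", "good", "bad"]
embeddings = np.random.randn(len(vocab), 50)  # stand-in word vectors

# Group words into a small number of "semantic concepts".
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
concept = dict(zip(vocab, kmeans.labels_))

def bag_of_concepts(doc):
    """Represent a document as counts over concept clusters, not words."""
    hist = np.zeros(3)
    for word in doc.split():
        if word in concept:
            hist[concept[word]] += 1
    return hist

print(bag_of_concepts("cat dog good car"))
```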
no code implementations • 16 Dec 2014 • Rémi Lebret, Ronan Collobert
We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora.
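For reference, the Hellinger distance between two discrete distributions, which the paper applies to rows of the word co-occurrence matrix (a direct NumPy transcription of the standard formula):

```python
import numpy as np

def hellinger(p, q):
    """H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2 for discrete distributions."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Two words' co-occurrence rows, normalized into probability distributions.
p = np.array([4, 1, 0, 5], dtype=float); p /= p.sum()
q = np.array([3, 2, 1, 4], dtype=float); q /= q.sum()
print(hellinger(p, q))  # 0 for identical distributions, 1 for disjoint support
```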
no code implementations • CVPR 2015 • Pedro O. Pinheiro, Ronan Collobert
We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task.
Multiple Instance Learning, Weakly Supervised Segmentation
no code implementations • 19 Dec 2013 • Rémi Lebret, Ronan Collobert
Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks.
1 code implementation • 7 Dec 2013 • Dimitri Palaz, Ronan Collobert, Mathew Magimai-Doss
Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features.
no code implementations • 12 Jun 2013 • Pedro H. O. Pinheiro, Ronan Collobert
Scene parsing is the task of assigning a label to every pixel in an image according to the class it belongs to.
no code implementations • 3 Apr 2013 • Dimitri Palaz, Ronan Collobert, Mathew Magimai-Doss
Motivated by these studies, in the framework of convolutional neural networks (CNNs), this paper investigates a novel approach where the input to the ANN is the raw speech signal and the output is phoneme class conditional probability estimates.
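A minimal sketch of that setup, raw samples in and framewise phoneme class probabilities out (layer sizes and the 39-class inventory are illustrative):

```python
import torch
from torch import nn

# Raw speech in, per-frame phoneme class conditional probabilities out.
model = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=160, stride=80),  # learn features from samples
    nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Conv1d(32, 39, kernel_size=1),              # 39 phoneme classes (TIMIT-style)
)

wav = torch.randn(1, 1, 16000)                 # 1 s of audio at 16 kHz
posteriors = torch.softmax(model(wav), dim=1)  # (batch, phonemes, frames)
print(posteriors.shape, posteriors.sum(dim=1)[0, 0].item())  # sums to 1 per frame
```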
1 code implementation • 2 Mar 2011 • Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.
no code implementations • NeurIPS 2009 • Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score.
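As an illustration of that model class, a low-rank bilinear scorer over bag-of-words vectors, one common form of such polynomial ranking models (dimensions and data are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, rank = 1000, 32

# Low-rank factorization W = U^T V keeps the quadratic model tractable:
# score(q, d) = q^T W d = (U q) . (V d), a degree-2 polynomial in word counts.
U = rng.normal(size=(rank, vocab_size))
V = rng.normal(size=(rank, vocab_size))

def score(query_bow, doc_bow):
    """Ranking score for a query-document pair of bag-of-words vectors."""
    return (U @ query_bow) @ (V @ doc_bow)

q = rng.poisson(0.01, vocab_size).astype(float)  # sparse toy word counts
d_pos = rng.poisson(0.02, vocab_size).astype(float)
d_neg = rng.poisson(0.02, vocab_size).astype(float)
# Training (not shown) would push score(q, d_pos) above score(q, d_neg) by a margin.
print(score(q, d_pos), score(q, d_neg))
```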