Search Results for author: Ronan Collobert

Found 53 papers, 23 papers with code

Flashlight: Enabling Innovation in Tools for Machine Learning

1 code implementation 29 Jan 2022 Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has become challenging.

Star Temporal Classification: Sequence Classification with Partially Labeled Data

1 code implementation 28 Jan 2022 Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

These experiments show that STC can recover most of the performance of the supervised baseline when up to 70% of the labels are missing.

Automatic Speech Recognition Classification +1

Pseudo-Labeling for Massively Multilingual Speech Recognition

no code implementations 30 Oct 2021 Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems.

Speech Recognition

Word Order Does Not Matter For Speech Recognition

no code implementations 12 Oct 2021 Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

In this paper, we study the training of an automatic speech recognition system in a weakly supervised setting where the order of words in the transcript labels of the audio training data is not known.

Automatic Speech Recognition

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

no code implementations 14 Jun 2021 Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised speech recognition (ASR).

Frame Speech Recognition

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

2 code implementations 2 Apr 2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

MLS: A Large-Scale Multilingual Dataset for Speech Research

1 code implementation 7 Dec 2020 Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert

This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research.

Automatic Speech Recognition

Joint Masked CPC and CTC Training for ASR

1 code implementation 30 Oct 2020 Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve

Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR).

Automatic Speech Recognition Self-Supervised Learning

Rethinking Evaluation in ASR: Are Our Models Robust Enough?

1 code implementation 22 Oct 2020 Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve

Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive performance on both research and real-world benchmarks.

Automatic Speech Recognition

SlimIPL: Language-Model-Free Iterative Pseudo-Labeling

no code implementations 22 Oct 2020 Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert

We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model.

Automatic Speech Recognition

Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

no code implementations 6 Jul 2020 Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and over-all simplifying deployment of ASR systems that support diverse languages.

Automatic Speech Recognition

Unsupervised Cross-lingual Representation Learning for Speech Recognition

4 code implementations 24 Jun 2020 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization Representation Learning +1

Iterative Pseudo-Labeling for Speech Recognition

1 code implementation 19 May 2020 Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.

Ranked #9 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Data Augmentation

Multi-scale Transformer Language Models

no code implementations 1 May 2020 Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau

We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.

Language Modelling

Scaling Up Online Speech Recognition Using ConvNets

no code implementations 27 Jan 2020 Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).

Speech Recognition

Libri-Light: A Benchmark for ASR with Limited or No Supervision

1 code implementation 17 Dec 2019 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).

Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-across metric)

Speech Recognition

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

1 code implementation 19 Nov 2019 Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert

We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.

Ranked #14 on Speech Recognition on LibriSpeech test-other (using extra training data)

Speech Recognition

wav2vec: Unsupervised Pre-training for Speech Recognition

5 code implementations 11 Apr 2019 Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

General Classification Speech Recognition +1

Who Needs Words? Lexicon-Free Speech Recognition

no code implementations 9 Apr 2019 Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words.

Speech Recognition

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

no code implementations 4 Apr 2019 Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.

Sequence-To-Sequence Speech Recognition

A Fully Differentiable Beam Search Decoder

1 code implementation 16 Feb 2019 Ronan Collobert, Awni Hannun, Gabriel Synnaeve

We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models.

Speech Recognition

Fully Convolutional Speech Recognition

no code implementations 17 Dec 2018 Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.

Speech Recognition

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

no code implementations 9 Dec 2018 Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.

Multi-Task Learning Speaker Recognition +1

End-to-End Speech Recognition From the Raw Waveform

1 code implementation 19 Jun 2018 Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.

Speech Recognition

Gated ConvNets for Letter-Based ASR

no code implementations ICLR 2018 Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

In this paper we introduce a new speech recognition system, leveraging a simple letter-based ConvNet acoustic model.

Speech Recognition

Letter-Based Speech Recognition with Gated ConvNets

2 code implementations 22 Dec 2017 Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC).

Speech Recognition

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

8 code implementations arXiv 2016 Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve

This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding.

Speech Recognition

Neural Network-based Word Alignment through Score Aggregation

no code implementations WS 2016 Joel Legrand, Michael Auli, Ronan Collobert

We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.

Word Alignment

Learning to Refine Object Segments

2 code implementations 29 Mar 2016 Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollàr

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

Semantic Segmentation

Learning to Segment Object Candidates

2 code implementations NeurIPS 2015 Pedro O. Pinheiro, Ronan Collobert, Piotr Dollar

Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier.

Object Detection Superpixels

"The Sum of Its Parts": Joint Learning of Word and Phrase Representations with Autoencoders

no code implementations 18 Jun 2015 Rémi Lebret, Ronan Collobert

We evaluate the quality of the word representations on several classical word evaluation tasks, and we introduce a novel task to evaluate the quality of the phrase representations.

Phrase-based Image Captioning

no code implementations 12 Feb 2015 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.

Image Captioning Language Modelling

Simple Image Description Generator via a Linear Phrase-Based Approach

no code implementations 29 Dec 2014 Remi Lebret, Pedro O. Pinheiro, Ronan Collobert

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.

Language Modelling

Joint RNN-Based Greedy Parsing and Word Composition

no code implementations 22 Dec 2014 Joël Legrand, Ronan Collobert

This paper introduces a greedy parser based on neural networks, which leverages a new compositional sub-tree representation.


Learning linearly separable features for speech recognition using convolutional neural networks

no code implementations 22 Dec 2014 Dimitri Palaz, Mathew Magimai.-Doss, Ronan Collobert

This system was shown to yield similar or better performance than an HMM/ANN-based system on a phoneme recognition task and on a large-scale continuous speech recognition task, using fewer parameters.

Automatic Speech Recognition

N-gram-Based Low-Dimensional Representation for Document Classification

no code implementations 19 Dec 2014 Rémi Lebret, Ronan Collobert

The number of features is therefore dramatically reduced and documents can be represented as a bag of semantic concepts.

Classification Document Classification +2

Rehabilitation of Count-based Models for Word Vector Representations

no code implementations 16 Dec 2014 Rémi Lebret, Ronan Collobert

We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora.

Dimensionality Reduction Word Embeddings +1

From Image-level to Pixel-level Labeling with Convolutional Networks

no code implementations CVPR 2015 Pedro O. Pinheiro, Ronan Collobert

We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task.

Multiple Instance Learning Weakly supervised segmentation +1

Word Emdeddings through Hellinger PCA

no code implementations 19 Dec 2013 Rémi Lebret, Ronan Collobert

Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks.

NER Word Embeddings

End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

1 code implementation 7 Dec 2013 Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers, fed with highly tuned features, such as MFCC or PLP features.

Recurrent Convolutional Neural Networks for Scene Parsing

no code implementations 12 Jun 2013 Pedro H. O. Pinheiro, Ronan Collobert

Scene parsing is a technique that consists of giving a label to all pixels in an image according to the class they belong to.

Scene Parsing

Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks

no code implementations 3 Apr 2013 Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss

Motivated by these studies, in the framework of convolutional neural networks (CNNs), this paper investigates a novel approach, where the input to the ANN is the raw speech signal and the output is phoneme class conditional probability estimates.

Automatic Speech Recognition

Natural Language Processing (almost) from Scratch

1 code implementation 2 Mar 2011 Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Chunking Named Entity Recognition +2

Polynomial Semantic Indexing

no code implementations NeurIPS 2009 Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri

We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score.
