Search Results for author: Ronan Collobert

Found 58 papers, 23 papers with code

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

no code implementations 29 Sep 2023 Andrew Rouditchenko, Ronan Collobert, Tatiana Likhomanenko

Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR).

Audio-Visual Speech Recognition Automatic Speech Recognition +4

Continuous Soft Pseudo-Labeling in ASR

no code implementations 11 Nov 2022 Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio

Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.

speech-recognition Speech Recognition

More Speaking or More Speakers?

no code implementations 2 Nov 2022 Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko

We perform a systematic analysis on both labeled and unlabeled data by varying the number of speakers while keeping the number of hours fixed and vice versa.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Continuous Pseudo-Labeling from the Start

no code implementations 17 Oct 2022 Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko

Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Pseudo-Labeling for Massively Multilingual Speech Recognition

no code implementations 30 Oct 2021 Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems.

speech-recognition Speech Recognition

Word Order Does Not Matter For Speech Recognition

no code implementations 12 Oct 2021 Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

In this paper, we study training an automatic speech recognition system in a weakly supervised setting where the order of words in the transcript labels of the audio training data is not known.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

3 code implementations 2 Apr 2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

Joint Masked CPC and CTC Training for ASR

1 code implementation 30 Oct 2020 Chaitanya Talnikar, Tatiana Likhomanenko, Ronan Collobert, Gabriel Synnaeve

Self-supervised learning (SSL) has shown promise in learning representations of audio that are useful for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

SlimIPL: Language-Model-Free Iterative Pseudo-Labeling

no code implementations 22 Oct 2020 Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert

We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Rethinking Evaluation in ASR: Are Our Models Robust Enough?

1 code implementation 22 Oct 2020 Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve

Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive performance on both research and real-world benchmarks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Self-training and Pre-training are Complementary for Speech Recognition

3 code implementations 22 Oct 2020 Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.

Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)

speech-recognition Speech Recognition +1

Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

no code implementations 6 Jul 2020 Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages and, overall, simplifying deployment of ASR systems that support diverse languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Unsupervised Cross-lingual Representation Learning for Speech Recognition

6 code implementations 24 Jun 2020 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization Representation Learning +2

Multi-scale Transformer Language Models

no code implementations 1 May 2020 Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau

We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.

Inductive Bias Language Modelling

Scaling Up Online Speech Recognition Using ConvNets

no code implementations 27 Jan 2020 Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).

speech-recognition Speech Recognition

Libri-Light: A Benchmark for ASR with Limited or No Supervision

2 code implementations 17 Dec 2019 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).

Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-within metric)

speech-recognition Speech Recognition

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

1 code implementation 19 Nov 2019 Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert

We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.

Ranked #16 on Speech Recognition on LibriSpeech test-other (using extra training data)

Language Modelling speech-recognition +1

wav2vec: Unsupervised Pre-training for Speech Recognition

5 code implementations 11 Apr 2019 Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

Binary Classification General Classification +2

Who Needs Words? Lexicon-Free Speech Recognition

no code implementations 9 Apr 2019 Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words.

speech-recognition Speech Recognition

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

no code implementations 4 Apr 2019 Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.

Language Modelling Sequence-To-Sequence Speech Recognition +1

A Fully Differentiable Beam Search Decoder

1 code implementation 16 Feb 2019 Ronan Collobert, Awni Hannun, Gabriel Synnaeve

We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models.

Language Modelling speech-recognition +1

Fully Convolutional Speech Recognition

no code implementations 17 Dec 2018 Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.

Language Modelling speech-recognition +1

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

no code implementations 9 Dec 2018 Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.

Multi-Task Learning Speaker Recognition +2

End-to-End Speech Recognition From the Raw Waveform

1 code implementation 19 Jun 2018 Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.

speech-recognition Speech Recognition

Gated ConvNets for Letter-Based ASR

no code implementations ICLR 2018 Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

In this paper we introduce a new speech recognition system, leveraging a simple letter-based ConvNet acoustic model.

Language Modelling speech-recognition +1

Letter-Based Speech Recognition with Gated ConvNets

2 code implementations 22 Dec 2017 Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC).

Language Modelling speech-recognition +1

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

9 code implementations arXiv 2016 Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve

This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding.

Speech Recognition

Neural Network-based Word Alignment through Score Aggregation

no code implementations WS 2016 Joel Legrand, Michael Auli, Ronan Collobert

We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.

Sentence Word Alignment

Learning to Refine Object Segments

2 code implementations 29 Mar 2016 Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollàr

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

Object Semantic Segmentation

Learning to Segment Object Candidates

2 code implementations NeurIPS 2015 Pedro O. Pinheiro, Ronan Collobert, Piotr Dollar

Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier.

Object object-detection +3

"The Sum of Its Parts": Joint Learning of Word and Phrase Representations with Autoencoders

no code implementations 18 Jun 2015 Rémi Lebret, Ronan Collobert

We evaluate the quality of the word representations on several classical word evaluation tasks, and we introduce a novel task to evaluate the quality of the phrase representations.

Phrase-based Image Captioning

no code implementations 12 Feb 2015 Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.

Descriptive Image Captioning +1

Simple Image Description Generator via a Linear Phrase-Based Approach

no code implementations 29 Dec 2014 Remi Lebret, Pedro O. Pinheiro, Ronan Collobert

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing.

Descriptive Language Modelling

Joint RNN-Based Greedy Parsing and Word Composition

no code implementations 22 Dec 2014 Joël Legrand, Ronan Collobert

This paper introduces a greedy parser based on neural networks, which leverages a new compositional sub-tree representation.

TAG

Learning linearly separable features for speech recognition using convolutional neural networks

no code implementations 22 Dec 2014 Dimitri Palaz, Mathew Magimai.-Doss, Ronan Collobert

This system was shown to yield similar or better performance than an HMM/ANN-based system on a phoneme recognition task and on a large-scale continuous speech recognition task, using fewer parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

N-gram-Based Low-Dimensional Representation for Document Classification

no code implementations 19 Dec 2014 Rémi Lebret, Ronan Collobert

The number of features is therefore dramatically reduced and documents can be represented as a bag of semantic concepts.

Classification Clustering +4

Rehabilitation of Count-based Models for Word Vector Representations

no code implementations 16 Dec 2014 Rémi Lebret, Ronan Collobert

We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora.

Dimensionality Reduction Word Embeddings +1

From Image-level to Pixel-level Labeling with Convolutional Networks

no code implementations CVPR 2015 Pedro O. Pinheiro, Ronan Collobert

We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task.

Multiple Instance Learning Object +4

Word Emdeddings through Hellinger PCA

no code implementations 19 Dec 2013 Rémi Lebret, Ronan Collobert

Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks.

NER Word Embeddings

End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

1 code implementation 7 Dec 2013 Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers, fed with highly tuned features, such as MFCC or PLP features.

Recurrent Convolutional Neural Networks for Scene Parsing

no code implementations 12 Jun 2013 Pedro H. O. Pinheiro, Ronan Collobert

Scene parsing is a technique that consists of giving a label to all pixels in an image according to the class they belong to.

Scene Parsing

Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks

no code implementations 3 Apr 2013 Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss

Motivated by these studies, in the framework of convolutional neural networks (CNNs), this paper investigates a novel approach, where the input to the ANN is the raw speech signal and the output is phoneme class conditional probability estimates.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Natural Language Processing (almost) from Scratch

2 code implementations 2 Mar 2011 Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Chunking named-entity-recognition +4

Polynomial Semantic Indexing

no code implementations NeurIPS 2009 Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri

We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score.

Retrieval
