Search Results for author: Qiantong Xu

Found 22 papers, 15 papers with code

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

4 code implementations Preprint 2022 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.

Image Classification Linguistic Acceptability +5

Flashlight: Enabling Innovation in Tools for Machine Learning

1 code implementation 29 Jan 2022 Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has become challenging.

Word Order Does Not Matter For Speech Recognition

no code implementations 12 Oct 2021 Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

In this paper, we study the training of an automatic speech recognition system in a weakly supervised setting where the order of words in the transcript labels of the audio training data is not known.

Automatic Speech Recognition

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

2 code implementations 23 Sep 2021 Qiantong Xu, Alexei Baevski, Michael Auli

Recent progress in self-training, self-supervised pretraining and unsupervised learning has enabled well-performing speech recognition systems without any labeled data.

Speech Recognition Transfer Learning +1

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

no code implementations 14 Jun 2021 Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

In this paper, we introduce the Kaizen framework, which uses a continuously improving teacher to generate pseudo-labels for semi-supervised automatic speech recognition (ASR).

Frame Speech Recognition

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

2 code implementations 2 Apr 2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

MLS: A Large-Scale Multilingual Dataset for Speech Research

1 code implementation 7 Dec 2020 Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert

This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research.

Automatic Speech Recognition

SlimIPL: Language-Model-Free Iterative Pseudo-Labeling

no code implementations 22 Oct 2020 Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert

We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model.

Automatic Speech Recognition

Rethinking Evaluation in ASR: Are Our Models Robust Enough?

1 code implementation 22 Oct 2020 Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve

Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive performance on both research and real-world benchmarks.

Automatic Speech Recognition

Iterative Pseudo-Labeling for Speech Recognition

1 code implementation 19 May 2020 Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.

Ranked #9 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Data Augmentation

Scaling Up Online Speech Recognition Using ConvNets

no code implementations 27 Jan 2020 Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).

Speech Recognition

Libri-Light: A Benchmark for ASR with Limited or No Supervision

1 code implementation 17 Dec 2019 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).

Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-across metric)

Speech Recognition

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

1 code implementation 19 Nov 2019 Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert

We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.

Ranked #14 on Speech Recognition on LibriSpeech test-other (using extra training data)

Speech Recognition

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

no code implementations 4 Apr 2019 Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.

Sequence-To-Sequence Speech Recognition

Fully Convolutional Speech Recognition

no code implementations 17 Dec 2018 Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

In this paper, we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.

Speech Recognition

Learning a Repression Network for Precise Vehicle Search

1 code implementation 8 Aug 2017 Qiantong Xu, Ke Yan, Yonghong Tian

The growing explosion in the use of surveillance cameras in public security highlights the importance of vehicle search from large-scale image databases.

Multi-Task Learning
