1 code implementation • 25 May 2023 • Qiantong Xu, Fenglu Hong, Bo Li, Changran Hu, Zhengyu Chen, Jian Zhang
In this paper, we ask whether we can enhance open-source LLMs to be competitive with leading closed LLM APIs in tool manipulation, given a practical amount of human supervision.
8 code implementations • Preprint 2022 • Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli
While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.
Ranked #1 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)
1 code implementation • 29 Jan 2022 • Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks.
2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.
Ranked #1 on Language Identification on VoxLingua107 (using extra training data)
no code implementations • 12 Oct 2021 • Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
In this paper, we study the training of automatic speech recognition systems in a weakly supervised setting where the order of words in the transcript labels of the audio training data is not known.
Tasks: Automatic Speech Recognition (ASR) +1
2 code implementations • 23 Sep 2021 • Qiantong Xu, Alexei Baevski, Michael Auli
Recent progress in self-training, self-supervised pretraining and unsupervised learning has enabled well-performing speech recognition systems without any labeled data.
no code implementations • 14 Jun 2021 • Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
In this paper, we introduce the Kaizen framework, which uses a continuously improving teacher to generate pseudo-labels for semi-supervised automatic speech recognition (ASR).
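A minimal sketch of the "continuously improving teacher" idea, assuming the teacher is an exponential moving average (EMA) of the student (a common way to realize such a teacher); the decay value and model interface are illustrative, not taken from the paper.

```python
# Sketch of an EMA ("continuously improving") teacher for pseudo-labeling.
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Move teacher weights toward the current student weights."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

@torch.no_grad()
def pseudo_label(teacher, unlabeled_batch):
    """Greedy (argmax) pseudo-labels from the teacher's frame posteriors."""
    log_probs = torch.log_softmax(teacher(unlabeled_batch), dim=-1)  # (T, B, V)
    return log_probs.argmax(dim=-1)

# The teacher starts as a frozen copy of the student and is updated every step:
# teacher = copy.deepcopy(student); ...; ema_update(teacher, student)
```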
1 code implementation • NeurIPS 2021 • Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov
Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information.
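As background, a minimal sketch of the fixed sinusoidal absolute positional embedding that Transformer models commonly add to their inputs; this illustrates the kind of baseline the paper discusses, not the method it proposes.

```python
import math
import torch

def sinusoidal_positions(seq_len: int, dim: int) -> torch.Tensor:
    """Fixed sinusoidal absolute positional embeddings (dim assumed even)."""
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (T, 1)
    freqs = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                      * (-math.log(10000.0) / dim))                       # (dim/2,)
    emb = torch.zeros(seq_len, dim)
    emb[:, 0::2] = torch.sin(positions * freqs)
    emb[:, 1::2] = torch.cos(positions * freqs)
    return emb  # added to the token/frame embeddings before the first layer
```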
2 code implementations • 2 Apr 2021 • Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli
On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.
1 code implementation • 7 Dec 2020 • Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research.
Tasks: Automatic Speech Recognition (ASR) +1
no code implementations • 22 Oct 2020 • Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert
We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model.
Tasks: Automatic Speech Recognition (ASR) +3
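A minimal sketch of what re-generating transcriptions with hard labels and no language model can look like in practice: greedy (argmax) decoding of the acoustic model's own output. The CTC-style blank handling here is an assumption made for illustration.

```python
import torch

@torch.no_grad()
def hard_pseudo_transcript(acoustic_model, audio_features, blank_id=0):
    """Greedy decoding: most probable token per frame, no language model."""
    log_probs = torch.log_softmax(acoustic_model(audio_features), dim=-1)  # (T, V)
    frame_tokens = log_probs.argmax(dim=-1).tolist()
    # CTC-style collapse: merge repeated tokens, then drop blanks (an assumption
    # about the loss; the "hard label" part is the argmax itself).
    collapsed, prev = [], None
    for tok in frame_tokens:
        if tok != prev and tok != blank_id:
            collapsed.append(tok)
        prev = tok
    return collapsed
```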
3 code implementations • 22 Oct 2020 • Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli
Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.
Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)
1 code implementation • 22 Oct 2020 • Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve
Finally, we show that training a single acoustic model on the most widely-used datasets - combined - reaches competitive performance on both research and real-world benchmarks.
Tasks: Automatic Speech Recognition (ASR) +1
no code implementations • 3 Jun 2020 • Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang
One of the main challenges for end-to-end speech translation is data scarcity.
1 code implementation • 19 May 2020 • Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.
Ranked #10 on Speech Recognition on LibriSpeech test-other (using extra training data)
Tasks: Automatic Speech Recognition (ASR) +3
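A rough outline of an iterative pseudo-labeling (IPL) style loop as described above, assuming user-supplied `transcribe` and `fine_tune` callables, since the actual training recipe is model-specific.

```python
import random

def iterative_pseudo_labeling(model, labeled_set, unlabeled_set,
                              transcribe, fine_tune,
                              num_iterations=5, subset_fraction=0.2):
    """Outline of IPL-style semi-supervised training.

    `transcribe(model, x)` returns a pseudo-transcript for utterance x;
    `fine_tune(model, data)` fine-tunes the existing model (no re-initialization).
    Both are hypothetical callables supplied by the user.
    """
    for _ in range(num_iterations):
        # 1) Pseudo-label a subset of the unlabeled data with the current model.
        k = max(1, int(subset_fraction * len(unlabeled_set)))
        subset = random.sample(list(unlabeled_set), k)
        pseudo_set = [(x, transcribe(model, x)) for x in subset]
        # 2) Fine-tune the same model on labeled plus pseudo-labeled data.
        model = fine_tune(model, list(labeled_set) + pseudo_set)
    return model
```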
no code implementations • 27 Jan 2020 • Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).
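For context, a minimal CTC training step in PyTorch; the small convolutional encoder below is a stand-in for illustration, not the TDS architecture from the paper, and the vocabulary size is an assumption.

```python
import torch
import torch.nn as nn

# Stand-in encoder: a small 1D conv stack over acoustic features (not TDS).
encoder = nn.Sequential(
    nn.Conv1d(80, 256, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv1d(256, 256, kernel_size=5, stride=1, padding=2), nn.ReLU(),
    nn.Conv1d(256, 29, kernel_size=1),  # 28 letters + CTC blank (id 0), illustrative
)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

features = torch.randn(8, 80, 400)                  # (batch, mel bins, frames)
targets = torch.randint(1, 29, (8, 50))             # letter ids, blank excluded
log_probs = encoder(features).permute(2, 0, 1).log_softmax(-1)   # (T, B, V)
input_lengths = torch.full((8,), log_probs.size(0), dtype=torch.long)
target_lengths = torch.full((8,), 50, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```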
2 code implementations • 17 Dec 2019 • Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux
Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).
Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-across metric)
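As a small illustration of the supervised metrics listed above, word error rate (WER) computed via edit distance; CER and PER follow the same recipe over characters or phones respectively.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(1, len(ref))

# e.g. wer("the cat sat", "the cat sat down") == 1/3
```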
1 code implementation • 19 Nov 2019 • Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.
Ranked #15 on Speech Recognition on LibriSpeech test-other (using extra training data)
no code implementations • 4 Apr 2019 • Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert
Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.
Tasks: Language Modelling, Sequence-To-Sequence Speech Recognition +1
8 code implementations • 18 Dec 2018 • Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.
no code implementations • 17 Dec 2018 • Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert
In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language modeling.
Ranked #3 on Speech Recognition on WSJ eval93
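A minimal sketch of the general shape of such a system: a fully convolutional acoustic model that maps raw waveform samples to per-frame letter scores. Layer sizes, strides and the letter vocabulary are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Illustrative fully convolutional letter-based acoustic model over raw audio.
acoustic_model = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=400, stride=160),    # ~25 ms windows, ~10 ms hop at 16 kHz
    nn.ReLU(),
    nn.Conv1d(64, 128, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv1d(128, 28, kernel_size=1),                 # per-frame scores over a letter vocabulary
)

waveform = torch.randn(1, 1, 16000)                    # one second of 16 kHz audio
letter_scores = acoustic_model(waveform)               # (1, 28, frames)
```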
4 code implementations • ICLR 2018 • Qiantong Xu, Gao Huang, Yang Yuan, Chuan Guo, Yu Sun, Felix Wu, Kilian Weinberger
Evaluating generative adversarial networks (GANs) is inherently challenging.
1 code implementation • 8 Aug 2017 • Qiantong Xu, Ke Yan, Yonghong Tian
The rapid growth in the use of surveillance cameras for public security highlights the importance of vehicle search in large-scale image databases.