1 code implementation • 27 Sep 2024 • Brian Yan, Vineel Pratap, Shinji Watanabe, Michael Auli
Multilingual Automatic Speech Recognition (ASR) models are typically evaluated in a setting where the ground-truth language of the speech utterance is known; however, this is often not the case in practical settings.
Automatic Speech Recognition (ASR)
1 code implementation • 25 Jul 2024 • Jinming Zhao, Vineel Pratap, Michael Auli
Despite rapid progress in increasing the language coverage of automatic speech recognition, the field is still far from covering all languages with a known writing script.
no code implementations • 12 Oct 2023 • Ju-chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli
However, in the field of language modeling, very little effort has been made to model them jointly.
3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
Expanding the language coverage of speech technology has the potential to improve access to information for many more people.
1 code implementation • NeurIPS 2023 • Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering.
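A minimal sketch of the training step this entry describes, under heavy simplifying assumptions: an EMA teacher encodes the unmasked input, its outputs are assigned to an online-updated codebook, and the student predicts those cluster assignments at masked positions. Module sizes, the EMA rate, and the clustering rule below are illustrative, not the paper's exact configuration.

```python
# Hedged DinoSR-style sketch: student predicts the teacher's online cluster ids.
import torch
import torch.nn.functional as F

dim, n_codes, ema = 256, 64, 0.999
student = torch.nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
teacher = torch.nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
teacher.load_state_dict(student.state_dict())
codebook = torch.randn(n_codes, dim)            # online cluster centroids
head = torch.nn.Linear(dim, n_codes)            # student prediction head
opt = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=1e-4)

def step(feats, mask):                          # feats: (B, T, dim), mask: (B, T) bool
    with torch.no_grad():                       # teacher sees the unmasked input
        targets = teacher(feats)
        sim = targets[mask] @ codebook.t()      # assign masked frames to nearest code
        assign = sim.argmax(dim=-1)
        for c in assign.unique():               # online clustering: EMA centroid update
            codebook[c] = ema * codebook[c] + (1 - ema) * targets[mask][assign == c].mean(0)
    masked = feats.masked_fill(mask.unsqueeze(-1), 0.0)
    logits = head(student(masked))[mask]        # student predicts codeword ids
    loss = F.cross_entropy(logits, assign)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                       # teacher tracks student by EMA
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema).add_(sp, alpha=1 - ema)
    return loss.item()
```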
no code implementations • 10 Feb 2023 • Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli
Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems.
4 code implementations • 14 Dec 2022 • Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli
Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources.
Ranked #90 on Image Classification on ImageNet
no code implementations • 18 Oct 2022 • Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino
The amount of labeled data to train models for speech tasks is limited for most languages; the data scarcity is exacerbated for speech translation, which requires labeled data covering two different languages.
4 code implementations • 13 Jul 2022 • Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
Ranked #4 on Speaker Identification on VoxCeleb1 (using extra training data)
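A minimal sketch of the encoding step described in the Audio-MAE entry above: the spectrogram is split into patches, a large fraction is masked, and only the visible patches are passed through the encoder. Shapes and module choices are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch: encode only the non-masked spectrogram patches.
import torch

patch, dim, mask_ratio = 16, 768, 0.8
to_patch = torch.nn.Conv2d(1, dim, kernel_size=patch, stride=patch)   # patch embedding
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True), num_layers=2)

def encode_visible(spec):                       # spec: (B, 1, n_mels, frames)
    tokens = to_patch(spec).flatten(2).transpose(1, 2)    # (B, N, dim)
    B, N, _ = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    perm = torch.rand(B, N).argsort(dim=1)                # random patch order per example
    keep = perm[:, :n_keep]                                # indices of visible patches
    visible = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, dim))
    return encoder(visible), keep               # a decoder would reinsert mask tokens here

latent, keep = encode_visible(torch.randn(2, 1, 128, 1024))
print(latent.shape)                             # torch.Size([2, 102, 768]) with 80% masked
```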
no code implementations • 27 Jun 2022 • Anuroop Sriram, Michael Auli, Alexei Baevski
Self-supervised learning (SSL) of speech representations has received much attention over the last few years but most work has focused on languages and domains with an abundance of unlabeled data.
no code implementations • 25 Apr 2022 • Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Our results for models pre-trained on the 960-hour LibriSpeech dataset and fine-tuned on 10 hours of transcribed data show that, using the same stochastic model, we obtain a smooth trade-off between word error rate (WER) and inference time, with only marginal WER degradation compared to the W2V2 and SEW models trained for a specific setting.
no code implementations • ACL 2022 • Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Pino
Two pre-training configurations for speech translation and recognition, respectively, are presented to alleviate subtask interference.
no code implementations • 6 Apr 2022 • Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe.
1 code implementation • 5 Apr 2022 • Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language.
Automatic Speech Recognition (ASR)
no code implementations • 21 Mar 2022 • Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson
Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.
no code implementations • 1 Mar 2022 • Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli
In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition.
Automatic Speech Recognition (ASR)
11 code implementations • Preprint 2022 • Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli
While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.
Ranked #1 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)
2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.
Ranked #1 on Language Identification on VOXLINGUA107
2 code implementations • 23 Sep 2021 • Qiantong Xu, Alexei Baevski, Michael Auli
Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data.
no code implementations • ACL 2021 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.
no code implementations • ACL 2021 • Ann Lee, Michael Auli, Marc'Aurelio Ranzato
Reranking models enable the integration of rich features to select a better output hypothesis within an n-best list or lattice.
no code implementations • 8 Jul 2021 • Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition.
Automatic Speech Recognition (ASR)
4 code implementations • NeurIPS 2021 • Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli
Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe.
no code implementations • 14 Apr 2021 • Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau
In this paper, we improve speech translation (ST) through effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways.
3 code implementations • 2 Apr 2021 • Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli
On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.
no code implementations • 26 Jan 2021 • Zhiyi Ma, Sergey Edunov, Michael Auli
Document-level machine translation conditions on surrounding sentences to produce coherent translations.
no code implementations • ACL 2021 • Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela
We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.
1 code implementation • WMT (EMNLP) 2020 • Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli
Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks.
Ranked #1 on Machine Translation on WMT2016 Romanian-English (using extra training data)
no code implementations • 24 Oct 2020 • Henry Zhou, Alexei Baevski, Michael Auli
Neural latent variable models enable the discovery of interesting structure in speech audio data.
no code implementations • 24 Oct 2020 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from pretrained speech encoder and text decoder.
3 code implementations • 22 Oct 2020 • Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli
Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.
Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)
8 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.
1 code implementation • NAACL 2021 • Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau
Unsupervised pre-training has led to much recent progress in natural language understanding.
8 code implementations • 24 Jun 2020 • Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli
This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
25 code implementations • NeurIPS 2020 • Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli
We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Ranked #1 on Speech Recognition on TIMIT (using extra training data)
no code implementations • ECCV 2020 • Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma
We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and we use this information to remove noisy examples from the training set.
Ranked #37 on Image Classification on mini WebVision 1.0
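A hedged sketch of the filtering idea in the entry above: each example's training loss is compared against the loss distribution observed under uniformly random labels, and examples whose loss is more consistent with the random-label distribution are dropped. The percentile rule and the synthetic numbers below are illustrative assumptions.

```python
# Hedged sketch: remove examples whose loss looks like a random-label loss.
import numpy as np

def filter_noisy(clean_losses, random_label_losses, percentile=5):
    """clean_losses[i]: loss of example i under its dataset label.
    random_label_losses: losses observed when the same data is given
    uniformly random labels, used as the counterfactual reference."""
    threshold = np.percentile(random_label_losses, percentile)
    keep = clean_losses < threshold             # low loss => the label carries real signal
    return np.flatnonzero(keep)

rng = np.random.default_rng(0)
clean = np.concatenate([rng.normal(0.3, 0.1, 900), rng.normal(2.0, 0.3, 100)])
random_ref = rng.normal(2.1, 0.3, 1000)         # losses under random labels
print(len(filter_noisy(clean, random_ref)))     # most of the 900 clean examples survive
```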
no code implementations • 21 Nov 2019 • Xinyi Wang, Jason Weston, Michael Auli, Yacine Jernite
Neural sequence to sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence.
Ranked #6 on Open-Domain Question Answering on ELI5
2 code implementations • 10 Nov 2019 • Alexei Baevski, Michael Auli, Abdel-rahman Mohamed
We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization.
no code implementations • ICLR 2020 • Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli
State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process.
3 code implementations • ICLR 2020 • Alexei Baevski, Steffen Schneider, Michael Auli
We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.
Ranked #2 on Speech Recognition on TIMIT (using extra training data)
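A minimal sketch of the quantization step described in the vq-wav2vec entry above: dense audio features are mapped to discrete codebook entries with a Gumbel-softmax so that downstream models can operate on token ids. The sizes and the single-group codebook are simplifying assumptions, not the paper's exact setup.

```python
# Hedged sketch: Gumbel-softmax quantization of dense audio features.
import torch
import torch.nn.functional as F

dim, n_codes = 512, 320
to_logits = torch.nn.Linear(dim, n_codes)
codebook = torch.nn.Embedding(n_codes, dim)

def quantize(z, tau=2.0):                       # z: (B, T, dim) dense features
    logits = to_logits(z)
    onehot = F.gumbel_softmax(logits, tau=tau, hard=True)   # straight-through estimator
    ids = onehot.argmax(dim=-1)                              # discrete token ids
    q = onehot @ codebook.weight                             # quantized representations
    return q, ids

q, ids = quantize(torch.randn(4, 100, dim))
print(q.shape, ids.shape)                       # torch.Size([4, 100, 512]) torch.Size([4, 100])
```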
no code implementations • EACL 2021 • Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato
While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday lives pertain only to the specific place we live in.
1 code implementation • IJCNLP 2019 • Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli
Previous work on neural noisy channel modeling relied on latent variable models that incrementally process the source and target sentence.
1 code implementation • ACL 2020 • Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli
Back-translation is a widely used data augmentation technique which leverages target monolingual data.
3 code implementations • ACL 2019 • Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli
We introduce the first large-scale corpus for long-form question answering, a task requiring elaborate and in-depth answers to open-ended questions.
5 code implementations • WS 2019 • Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov
This paper describes Facebook FAIR's submission to the WMT19 shared news translation task.
Ranked #1 on Machine Translation on WMT2019 English-German
1 code implementation • 15 Jul 2019 • Sidak Pal Singh, Angela Fan, Michael Auli
Both are trained to reconstruct the sentence based on a latent code and our model can be used to generate text.
no code implementations • ICLR 2019 • Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin
Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.
no code implementations • ICLR 2019 • Tianxiao Shen, Myle Ott, Michael Auli, Marc’Aurelio Ranzato
There are many ways to translate a sentence into another language.
7 code implementations • 11 Apr 2019 • Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli
Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.
Ranked #5 on Speech Recognition on TIMIT (using extra training data)
6 code implementations • NAACL 2019 • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.
1 code implementation • NAACL 2019 • Sergey Edunov, Alexei Baevski, Michael Auli
Pre-trained language model representations have been successful in a wide range of language understanding tasks.
no code implementations • IJCNLP 2019 • Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli
We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems.
Ranked #12 on Constituency Parsing on Penn Treebank
1 code implementation • 20 Feb 2019 • Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
3 code implementations • ICLR 2019 • Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.
Ranked #1 on Machine Translation on WMT 2017 English-Chinese
1 code implementation • 21 Jan 2019 • Dario Pavllo, Christoph Feichtenhofer, Michael Auli, David Grangier
Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions.
10 code implementations • CVPR 2019 • Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli
We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints.
Ranked #13 on Weakly-supervised 3D Human Pose Estimation on Human3.6M (Number of Frames Per View metric)
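A minimal sketch of the back-projection step described in the entry above: 3D joints estimated from 2D keypoints are projected back to the image plane and compared with the input 2D keypoints on unlabeled video. The pinhole projection and the tiny MLP lifter are illustrative assumptions, not the paper's temporal convolutional model.

```python
# Hedged sketch: semi-supervised loss by back-projecting estimated 3D poses to 2D.
import torch

n_joints = 17
lifter = torch.nn.Sequential(                   # 2D keypoints -> 3D joints (camera frame)
    torch.nn.Linear(n_joints * 2, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, n_joints * 3))

def project(pose3d, focal=1.0):                 # simple pinhole projection to the image plane
    x, y, z = pose3d.unbind(dim=-1)
    return torch.stack([focal * x / z, focal * y / z], dim=-1)

def backprojection_loss(keypoints2d):           # keypoints2d: (B, n_joints, 2), unlabeled video
    pose3d = lifter(keypoints2d.flatten(1)).view(-1, n_joints, 3)
    pose3d = pose3d + torch.tensor([0.0, 0.0, 4.0])   # keep joints in front of the camera
    return (project(pose3d) - keypoints2d).abs().mean()

print(backprojection_loss(torch.randn(8, n_joints, 2) * 0.1))
```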
2 code implementations • ICLR 2019 • Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston
In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date.
3 code implementations • ICLR 2019 • Alexei Baevski, Michael Auli
We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.
Ranked #7 on Language Modelling on One Billion Word
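A hedged sketch of variable-capacity input embeddings as described in the entry above: the vocabulary is partitioned by frequency, frequent tokens get full-size embeddings, and rarer bands get smaller embeddings projected up to the model dimension. The band boundaries and the reduction factor are illustrative assumptions.

```python
# Hedged sketch: adaptive input embeddings with per-band capacity.
import torch

d_model, factor = 512, 4
bands = [(0, 20000), (20000, 60000), (60000, 260000)]     # frequency-ordered vocab slices
embeds, projs = torch.nn.ModuleList(), torch.nn.ModuleList()
for i, (lo, hi) in enumerate(bands):
    d = d_model // (factor ** i)                           # 512, 128, 32
    embeds.append(torch.nn.Embedding(hi - lo, d))
    projs.append(torch.nn.Linear(d, d_model, bias=False))  # project up to the model size

def adaptive_input(tokens):                                # tokens: (B, T) int64 ids
    out = torch.zeros(*tokens.shape, d_model)
    for (lo, hi), emb, proj in zip(bands, embeds, projs):
        sel = (tokens >= lo) & (tokens < hi)
        if sel.any():
            out[sel] = proj(emb(tokens[sel] - lo))
    return out

x = adaptive_input(torch.randint(0, 260000, (2, 16)))
print(x.shape)                                             # torch.Size([2, 16, 512])
```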
3 code implementations • EMNLP 2018 • Sergey Edunov, Myle Ott, Michael Auli, David Grangier
An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences.
Ranked #2 on Machine Translation on WMT2014 English-German (using extra training data)
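A minimal sketch of the back-translation pipeline in the entry above: a reverse (target-to-source) model translates monolingual target sentences into synthetic sources, and the synthetic pairs are mixed into the parallel training corpus. `reverse_model.translate` is a hypothetical stand-in for whatever decoding routine (e.g. sampling or noised beam search, as studied in the paper) is used.

```python
# Hedged sketch: augment parallel data with back-translated monolingual sentences.

def back_translate(parallel_pairs, target_monolingual, reverse_model):
    """parallel_pairs: list of (source, target) sentence pairs.
    target_monolingual: list of target-language sentences without sources."""
    synthetic_pairs = []
    for tgt in target_monolingual:
        synthetic_src = reverse_model.translate(tgt)   # target -> synthetic source (hypothetical API)
        synthetic_pairs.append((synthetic_src, tgt))   # pair with the real target sentence
    return parallel_pairs + synthetic_pairs            # train the forward model on the union
```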
no code implementations • NAACL 2018 • David Grangier, Michael Auli
We also evaluate our model for paraphrasing through a user study.
5 code implementations • WS 2018 • Myle Ott, Sergey Edunov, David Grangier, Michael Auli
Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine.
Ranked #12 on Machine Translation on WMT2014 English-French
1 code implementation • 16 May 2018 • Dario Pavllo, David Grangier, Michael Auli
Deep learning for predicting or generating 3D human pose sequences is an active research area.
1 code implementation • ICML 2018 • Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato
We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations.
1 code implementation • NAACL 2018 • Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato
There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.
Ranked #4 on Machine Translation on IWSLT2015 German-English
no code implementations • WS 2018 • Angela Fan, David Grangier, Michael Auli
Current models for document summarization disregard user preferences such as the desired length, style, the entities that the user might be interested in, or how much of the document the user has already read.
no code implementations • HLT 2018 • David Grangier, Michael Auli
We also evaluate our model for paraphrasing through a user study.
37 code implementations • ICML 2017 • Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin
The prevalent approach to sequence to sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks.
11 code implementations • ICML 2017 • Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
The predominant approach to language modeling to date is based on recurrent neural networks.
Ranked #20 on Language Modelling on One Billion Word
2 code implementations • ACL 2017 • Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin
The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence.
Ranked #7 on Machine Translation on IWSLT2015 German-English
no code implementations • 20 Oct 2016 • Roman Novak, Michael Auli, David Grangier
Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions.
no code implementations • 1 Oct 2016 • Gurvan L'Hostis, David Grangier, Michael Auli
Classical translation models constrain the space of possible outputs by selecting a subset of translation rules based on the input sentence.
no code implementations • WS 2016 • Joel Legrand, Michael Auli, Ronan Collobert
We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.
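A hedged sketch of the window-based alignment scoring described in the entry above: each source and target position is represented by the concatenated embeddings of a small window around it, and a dot product gives the alignment score matrix. The window size, dimensions, and scoring function are illustrative assumptions.

```python
# Hedged sketch: alignment scores from source/target word-window representations.
import torch

vocab, dim, win = 10000, 64, 3
src_emb = torch.nn.Embedding(vocab, dim)
tgt_emb = torch.nn.Embedding(vocab, dim)

def window_repr(ids, emb):                      # ids: (T,) token ids for one sentence
    padded = torch.nn.functional.pad(ids, (win // 2, win // 2))
    windows = padded.unfold(0, win, 1)          # (T, win) sliding windows over positions
    return emb(windows).reshape(ids.shape[0], -1)   # concatenated window embeddings: (T, win*dim)

def alignment_scores(src_ids, tgt_ids):
    s, t = window_repr(src_ids, src_emb), window_repr(tgt_ids, tgt_emb)
    return s @ t.t()                            # (src_len, tgt_len) alignment score matrix

print(alignment_scores(torch.randint(0, vocab, (7,)), torch.randint(0, vocab, (9,))).shape)
```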
2 code implementations • EMNLP 2016 • Remi Lebret, David Grangier, Michael Auli
This paper introduces a neural model for concept-to-text generation that scales to large, rich domains.
Ranked #4 on Table-to-Text Generation on WikiBio
2 code implementations • ACL 2016 • Welin Chen, David Grangier, Michael Auli
Training neural network language models over large vocabularies is still computationally very costly compared to count-based models such as Kneser-Ney.
5 code implementations • 20 Nov 2015 • Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba
Many natural language processing applications use language models to generate text.
Ranked #14 on Machine Translation on IWSLT2015 German-English
no code implementations • IJCNLP 2015 • Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan
We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs.
no code implementations • HLT 2015 • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, Bill Dolan
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations.