Search Results for author: Michael Auli

Found 83 papers, 43 papers with code

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

1 code implementation NeurIPS 2023 Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass

In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering.

Clustering Language Modelling +3
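
A minimal PyTorch sketch of the training step named in the DinoSR abstract above: an EMA teacher encodes the unmasked speech, its outputs are discretized by online clustering (nearest codeword), and the student predicts those cluster assignments at masked positions. All names, shapes, and the omission of the codebook update are simplifying assumptions, not the released implementation.

    import torch
    import torch.nn.functional as F

    def ema_update(student, teacher, decay=0.999):
        # The teacher's weights track an exponential moving average of the student's.
        with torch.no_grad():
            for p_s, p_t in zip(student.parameters(), teacher.parameters()):
                p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)

    def dinosr_step(student, teacher, codebook, speech, mask):
        # speech: (B, T, D) frame features; mask: (B, T) boolean mask of corrupted frames.
        # codebook: (K, D) codewords maintained by online clustering (its own EMA
        # update over assigned teacher features is omitted here for brevity).
        with torch.no_grad():
            targets = teacher(speech)                                  # unmasked view
            flat = targets.reshape(-1, targets.size(-1))               # (B*T, D)
            assignments = torch.cdist(flat, codebook).argmin(dim=-1)   # (B*T,)
            assignments = assignments.view(speech.shape[:2])           # (B, T) pseudo-labels
        logits = student(speech, mask)                                 # (B, T, K)
        return F.cross_entropy(logits[mask], assignments[mask])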

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

no code implementations10 Feb 2023 Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems.

Audio-Visual Speech Recognition Self-Supervised Learning +2

Simple and Effective Unsupervised Speech Translation

no code implementations18 Oct 2022 Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino

The amount of labeled data available to train models for speech tasks is limited for most languages; data scarcity is further exacerbated for speech translation, which requires labeled data covering two different languages.

Machine Translation speech-recognition +6

Masked Autoencoders that Listen

3 code implementations13 Jul 2022 Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.

Ranked #2 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification Representation Learning +1
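
A hedged sketch of the encoder step the Audio-MAE abstract describes: spectrogram patch tokens are masked at a high ratio and only the visible tokens are fed through the Transformer encoder (the MAE-style decoder that reinserts mask tokens is omitted). Shapes and names below are illustrative assumptions.

    import torch

    def encode_visible_patches(encoder, patches, mask_ratio=0.8):
        # patches: (B, N, D) embedded audio spectrogram patches
        B, N, D = patches.shape
        num_keep = int(N * (1.0 - mask_ratio))
        # Random permutation per example; keep the first num_keep positions.
        noise = torch.rand(B, N, device=patches.device)
        keep_idx = noise.argsort(dim=1)[:, :num_keep]                   # (B, num_keep)
        visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
        # Only the non-masked tokens go through the encoder layers.
        latent = encoder(visible)                                       # (B, num_keep, D)
        return latent, keep_idx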

Wav2Vec-Aug: Improved self-supervised training with limited data

no code implementations27 Jun 2022 Anuroop Sriram, Michael Auli, Alexei Baevski

Self-supervised learning (SSL) of speech representations has received much attention over the last few years but most work has focused on languages and domains with an abundance of unlabeled data.

Data Augmentation Self-Supervised Learning

On-demand compute reduction with stochastic wav2vec 2.0

no code implementations25 Apr 2022 Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Our results for models pre-trained on 960h Librispeech dataset and fine-tuned on 10h of transcribed data show that using the same stochastic model, we get a smooth trade-off between word error rate (WER) and inference time with only marginal WER degradation compared to the W2V2 and SEW models trained for a specific setting.
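
The abstract above describes a single stochastically trained model that exposes a word error rate versus inference time trade-off at run time. Below is a generic illustration of one way such compute control can be surfaced, by pooling the encoder input with a stride that is sampled during training and chosen freely at inference; this is a hedged sketch of the general idea, not necessarily the paper's exact mechanism.

    import random
    import torch.nn.functional as F

    def pooled_forward(encoder, features, stride=None, train_strides=(1, 2, 3)):
        # features: (B, D, T) frame features before the Transformer context network.
        # During training a pooling stride is sampled at random; at inference the
        # caller picks a stride, trading sequence length (compute) against accuracy.
        if stride is None:
            stride = random.choice(train_strides)
        if stride > 1:
            features = F.avg_pool1d(features, kernel_size=stride, stride=stride)
        return encoder(features.transpose(1, 2))    # (B, T', D) contextual outputs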

Towards End-to-end Unsupervised Speech Recognition

1 code implementation5 Apr 2022 Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

XTREME-S: Evaluating Cross-lingual Speech Representations

no code implementations21 Mar 2022 Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.

Representation Learning Retrieval +4

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training

no code implementations1 Mar 2022 Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

8 code implementations Preprint 2022 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.

Image Classification Linguistic Acceptability +5
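
A minimal sketch of the modality-agnostic objective data2vec is known for: an EMA teacher encodes the full input, targets are built by averaging the top-K Transformer layer outputs, and the student regresses those targets at masked time steps. The teacher interface, the use of mean-squared error, and all names here are simplifying assumptions.

    import torch
    import torch.nn.functional as F

    def data2vec_step(student, teacher, x, mask, top_k=8):
        # x: (B, T, D) input embeddings for any modality; mask: (B, T) boolean.
        with torch.no_grad():
            # The teacher sees the unmasked input and exposes all layer outputs.
            layer_outputs = teacher(x, return_all_layers=True)       # list of (B, T, D)
            target = torch.stack(layer_outputs[-top_k:]).mean(dim=0)
            target = F.layer_norm(target, target.shape[-1:])          # normalize targets
        pred = student(x, mask)                                       # (B, T, D)
        return F.mse_loss(pred[mask], target[mask])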

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

2 code implementations23 Sep 2021 Qiantong Xu, Alexei Baevski, Michael Auli

Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data.

speech-recognition Speech Recognition +2

Multilingual Speech Translation from Efficient Finetuning of Pretrained Models

no code implementations ACL 2021 Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.

Text Generation Transfer Learning +1

Discriminative Reranking for Neural Machine Translation

no code implementations ACL 2021 Ann Lee, Michael Auli, Marc'Aurelio Ranzato

Reranking models enable the integration of rich features to select a better output hypothesis within an n-best list or lattice.

Data Augmentation Machine Translation +2

Unsupervised Speech Recognition

3 code implementations NeurIPS 2021 Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe.

speech-recognition Speech Recognition +1

Large-Scale Self- and Semi-Supervised Learning for Speech Translation

no code implementations14 Apr 2021 Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau

In this paper, we improve speech translation (ST) through effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways.

Language Modelling Translation

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

2 code implementations2 Apr 2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

Reservoir Transformers

no code implementations ACL 2021 Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.

BIG-bench Machine Learning Language Modelling +2
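
The Reservoir Transformers abstract above states that some layers are randomly initialized and never updated. A minimal PyTorch sketch of that idea: freeze the parameters of selected "reservoir" layers so the optimizer only updates the rest. Which layers to freeze is an illustrative choice.

    import torch.nn as nn

    def freeze_reservoir_layers(layers: nn.ModuleList, reservoir_indices):
        # Reservoir layers keep their random initialization and are never updated.
        for i in reservoir_indices:
            for p in layers[i].parameters():
                p.requires_grad = False

    # Example: an 8-layer Transformer encoder where layers 2 and 5 are reservoirs.
    encoder_layers = nn.ModuleList(
        [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(8)]
    )
    freeze_reservoir_layers(encoder_layers, reservoir_indices=(2, 5))
    trainable = [p for p in encoder_layers.parameters() if p.requires_grad]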

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

1 code implementation WMT (EMNLP) 2020 Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks.

 Ranked #1 on Machine Translation on WMT2016 Romanian-English (using extra training data)

Machine Translation Translation

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models

no code implementations24 Oct 2020 Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.

Cross-Lingual Transfer Text Generation +2

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

no code implementations24 Oct 2020 Henry Zhou, Alexei Baevski, Michael Auli

Neural latent variable models enable the discovery of interesting structure in speech audio data.

Representation Learning

Self-training and Pre-training are Complementary for Speech Recognition

3 code implementations22 Oct 2020 Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.

 Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)

speech-recognition Speech Recognition +1

Beyond English-Centric Multilingual Machine Translation

7 code implementations21 Oct 2020 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation Translation

Unsupervised Cross-lingual Representation Learning for Speech Recognition

5 code implementations24 Jun 2020 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization Representation Learning +2

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

22 code implementations NeurIPS 2020 Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

 Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Quantization Self-Supervised Learning +1
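
One widely used way to try a fine-tuned wav2vec 2.0 checkpoint from Python is through the Hugging Face Transformers port of the model; the snippet below assumes that package is installed and that the facebook/wav2vec2-base-960h checkpoint is available, and shows greedy CTC decoding of a 16 kHz waveform.

    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    def transcribe(waveform):
        # waveform: 1-D float array of raw audio sampled at 16 kHz
        inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits        # (1, T, vocab)
        ids = torch.argmax(logits, dim=-1)
        return processor.batch_decode(ids)[0]                 # greedy CTC decode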

Robust and On-the-fly Dataset Denoising for Image Classification

no code implementations ECCV 2020 Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma

We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.

Classification counterfactual +4

Improving Conditioning in Context-Aware Sequence to Sequence Models

no code implementations21 Nov 2019 Xinyi Wang, Jason Weston, Michael Auli, Yacine Jernite

Neural sequence to sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence.

abstractive question answering Data Augmentation +2

Effectiveness of self-supervised pre-training for speech recognition

2 code implementations10 Nov 2019 Alexei Baevski, Michael Auli, Abdel-rahman Mohamed

We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization.

Language Modelling Quantization +3

Depth-Adaptive Transformer

no code implementations ICLR 2020 Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli

State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process.

Machine Translation Translation
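
The Depth-Adaptive Transformer abstract above contrasts fixed per-input computation with adaptive depth. A hedged sketch of one early-exit variant: each decoder layer has a small output classifier, and decoding stops at the first layer whose prediction confidence clears a threshold. The halting rule and names are illustrative, not the paper's exact formulation.

    import torch

    def depth_adaptive_decode(layers, exit_classifiers, state, threshold=0.9):
        # layers / exit_classifiers: aligned lists of decoder blocks and per-layer
        # output classifiers; state: (B, T, D) hidden states for the current step.
        token = None
        for layer, classifier in zip(layers, exit_classifiers):
            state = layer(state)
            probs = torch.softmax(classifier(state[:, -1]), dim=-1)   # last position
            conf, token = probs.max(dim=-1)
            if bool((conf > threshold).all()):
                break            # easy inputs exit after fewer layers
        return token, state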

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

3 code implementations ICLR 2020 Alexei Baevski, Steffen Schneider, Michael Auli

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.

Ranked #2 on Speech Recognition on TIMIT (using extra training data)

Clustering General Classification +3
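
A minimal sketch of the discretization step the vq-wav2vec abstract refers to, using a Gumbel-softmax quantizer (one of the two quantizers the paper compares; the other is k-means based). Dimensions, code-book size, and the straight-through choice are assumptions for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GumbelQuantizer(nn.Module):
        def __init__(self, dim=512, num_codes=320):
            super().__init__()
            self.to_logits = nn.Linear(dim, num_codes)
            self.codebook = nn.Embedding(num_codes, dim)

        def forward(self, z, tau=2.0):
            # z: (B, T, dim) dense features from the wav2vec encoder
            logits = self.to_logits(z)                               # (B, T, num_codes)
            if self.training:
                one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
            else:
                one_hot = F.one_hot(logits.argmax(-1), logits.size(-1)).type_as(logits)
            codes = one_hot @ self.codebook.weight                   # (B, T, dim) discrete
            return codes, one_hot.argmax(-1)                         # features, indices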

The Source-Target Domain Mismatch Problem in Machine Translation

no code implementations EACL 2021 Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in.

Machine Translation Translation

Simple and Effective Noisy Channel Modeling for Neural Machine Translation

1 code implementation IJCNLP 2019 Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli

Previous work on neural noisy channel modeling relied on latent variable models that incrementally process the source and target sentence.

Machine Translation Sentence +1
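
The noisy channel decomposition behind this line of work scores a candidate translation y for source x as a weighted combination of a direct model, a channel (target-to-source) model, and a language model: log p(y|x) + λ1 log p(x|y) + λ2 log p(y). A hedged sketch of reranking an n-best list with that rule; the model interfaces and weights are assumptions.

    def noisy_channel_rerank(candidates, source, direct, channel, lm, l1=1.0, l2=0.3):
        # candidates: list of candidate target sentences from beam search
        # direct(src, tgt)  -> log p(tgt | src)   (source-to-target model)
        # channel(tgt, src) -> log p(src | tgt)   (target-to-source model)
        # lm(tgt)           -> log p(tgt)         (target language model)
        def score(y):
            return direct(source, y) + l1 * channel(y, source) + l2 * lm(y)
        return max(candidates, key=score)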

ELI5: Long Form Question Answering

3 code implementations ACL 2019 Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli

We introduce the first large-scale corpus for long-form question answering, a task requiring elaborate and in-depth answers to open-ended questions.

Language Modelling Long Form Question Answering +2

GLOSS: Generative Latent Optimization of Sentence Representations

1 code implementation15 Jul 2019 Sidak Pal Singh, Angela Fan, Michael Auli

Both are trained to reconstruct the sentence based on a latent code and our model can be used to generate text.

Sentence Sentence Embedding

Better Generalization with On-the-fly Dataset Denoising

no code implementations ICLR 2019 Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin

Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.

Denoising Memorization

wav2vec: Unsupervised Pre-training for Speech Recognition

5 code implementations11 Apr 2019 Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Our experiments on WSJ reduce the WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data are available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

Binary Classification General Classification +2

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

6 code implementations NAACL 2019 Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

Language Modelling Text Generation +1
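
A quick way to exercise fairseq from Python is through one of its pre-trained translation models via torch.hub, following the usage documented in the project README; the model name below is an example and assumes the checkpoint is still hosted.

    import torch

    # Load a pre-trained WMT'19 English-German transformer released with fairseq.
    en2de = torch.hub.load(
        "pytorch/fairseq",
        "transformer.wmt19.en-de.single_model",
        tokenizer="moses",
        bpe="fastbpe",
    )
    en2de.eval()
    print(en2de.translate("Machine translation is fun!"))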

Cloze-driven Pretraining of Self-attention Networks

no code implementations IJCNLP 2019 Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems.

Constituency Parsing NER +2

Modeling Human Motion with Quaternion-based Neural Networks

1 code implementation21 Jan 2019 Dario Pavllo, Christoph Feichtenhofer, Michael Auli, David Grangier

Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions.

Wizard of Wikipedia: Knowledge-Powered Conversational agents

2 code implementations ICLR 2019 Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date.

Dialogue Generation

Adaptive Input Representations for Neural Language Modeling

3 code implementations ICLR 2019 Alexei Baevski, Michael Auli

We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.

Language Modelling
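
The adaptive input idea partitions the vocabulary into frequency-ordered clusters, gives frequent words higher-dimensional embeddings, and projects every cluster to a shared model dimension. A minimal sketch under assumed cluster cut-offs and dimensions; the released fairseq module differs in details.

    import torch
    import torch.nn as nn

    class AdaptiveInput(nn.Module):
        def __init__(self, vocab=50_000, d_model=512, cutoffs=(5_000, 20_000), factor=4):
            super().__init__()
            self.edges = [0, *cutoffs, vocab]
            self.embeds = nn.ModuleList()
            self.projs = nn.ModuleList()
            for i in range(len(self.edges) - 1):
                dim = d_model // (factor ** i)        # rarer clusters get smaller embeddings
                self.embeds.append(nn.Embedding(self.edges[i + 1] - self.edges[i], dim))
                self.projs.append(nn.Linear(dim, d_model, bias=False))

        def forward(self, tokens):                    # tokens: (B, T) frequency-sorted ids
            d_model = self.projs[0].out_features
            out = tokens.new_zeros(*tokens.shape, d_model, dtype=torch.float)
            for i in range(len(self.embeds)):
                lo, hi = self.edges[i], self.edges[i + 1]
                sel = (tokens >= lo) & (tokens < hi)
                if sel.any():
                    out[sel] = self.projs[i](self.embeds[i](tokens[sel] - lo))
            return out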

Understanding Back-Translation at Scale

3 code implementations EMNLP 2018 Sergey Edunov, Myle Ott, Michael Auli, David Grangier

An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences.

Ranked #2 on Machine Translation on WMT2014 English-German (using extra training data)

Machine Translation Translation
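
A hedged sketch of the back-translation loop the abstract describes: a target-to-source model translates monolingual target sentences into synthetic sources, and the synthetic pairs are concatenated with the real bitext; the paper's central finding is that sampled rather than beam back-translations work better at scale. The translate interface below is an assumption.

    def build_backtranslated_corpus(parallel_pairs, mono_target, reverse_model,
                                    sampling=True):
        # parallel_pairs: list of (src, tgt) human-translated sentence pairs
        # mono_target:    list of monolingual target-language sentences
        # reverse_model:  target-to-source translation model
        synthetic_pairs = []
        for tgt in mono_target:
            # sampling=True draws from the model distribution instead of beam search,
            # which the paper reports as the more effective choice at scale.
            src = reverse_model.translate(tgt, sampling=sampling)
            synthetic_pairs.append((src, tgt))
        return parallel_pairs + synthetic_pairs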

Scaling Neural Machine Translation

5 code implementations WS 2018 Myle Ott, Sergey Edunov, David Grangier, Michael Auli

Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine.

Machine Translation Question Answering +1

Analyzing Uncertainty in Neural Machine Translation

1 code implementation ICML 2018 Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations.

Machine Translation Sentence +2

Controllable Abstractive Summarization

no code implementations WS 2018 Angela Fan, David Grangier, Michael Auli

Current models for document summarization disregard user preferences such as the desired length, style, the entities that the user might be interested in, or how much of the document the user has already read.

Abstractive Text Summarization Document Summarization

Classical Structured Prediction Losses for Sequence to Sequence Learning

1 code implementation NAACL 2018 Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.

Abstractive Text Summarization Machine Translation +3

Iterative Refinement for Machine Translation

no code implementations20 Oct 2016 Roman Novak, Michael Auli, David Grangier

Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions.

Machine Translation Sentence +1

Vocabulary Selection Strategies for Neural Machine Translation

no code implementations1 Oct 2016 Gurvan L'Hostis, David Grangier, Michael Auli

Classical translation models constrain the space of possible outputs by selecting a subset of translation rules based on the input sentence.

Machine Translation Sentence +1

Neural Network-based Word Alignment through Score Aggregation

no code implementations WS 2016 Joel Legrand, Michael Auli, Ronan Collobert

We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.

Sentence Word Alignment

Strategies for Training Large Vocabulary Neural Language Models

2 code implementations ACL 2016 Welin Chen, David Grangier, Michael Auli

Training neural network language models over large vocabularies is still computationally very costly compared to count-based models such as Kneser-Ney.

Machine Translation speech-recognition +2

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

no code implementations IJCNLP 2015 Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan

We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs.

Sentence
