Search Results for author: Alexei Baevski

Found 32 papers, 19 papers with code

Offline Visual Representation Learning for Embodied Navigation

no code implementations 27 Apr 2022 Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.

Representation Learning Self-Supervised Learning
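
A minimal sketch of the 2-stage recipe described in this paper's abstract, assuming a torchvision ResNet-50 backbone and a generic policy head (both illustrative stand-ins, not the paper's exact models; the SSL objective itself is omitted):

    import torch
    import torchvision
    from torchvision import transforms

    # Stage 1: offline self-supervised pretraining of the visual encoder on
    # pre-rendered indoor images (the SSL loss is not shown here).
    encoder = torchvision.models.resnet50(weights=None)
    encoder.fc = torch.nn.Identity()  # keep only the 2048-d visual representation

    # Stage 2: online finetuning of encoder + policy on the navigation task,
    # with image augmentations applied to every observation.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ColorJitter(0.4, 0.4, 0.4, 0.2),
    ])
    policy = torch.nn.Sequential(
        torch.nn.Linear(2048, 512), torch.nn.ReLU(),
        torch.nn.Linear(512, 4),  # e.g. 4 discrete navigation actions (assumed)
    )
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(policy.parameters()), lr=1e-4
    )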

On-demand compute reduction with stochastic wav2vec 2.0

no code implementations 25 Apr 2022 Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Our results for models pre-trained on 960h Librispeech dataset and fine-tuned on 10h of transcribed data show that using the same stochastic model, we get a smooth trade-off between word error rate (WER) and inference time with only marginal WER degradation compared to the W2V2 and SEW models trained for a specific setting.

Towards End-to-end Unsupervised Speech Recognition

1 code implementation 5 Apr 2022 Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language.

Automatic Speech Recognition Unsupervised Speech Recognition

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training

no code implementations 1 Mar 2022 Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations.

Automatic Speech Recognition

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

4 code implementations Preprint 2022 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.

Image Classification Linguistic Acceptability +5
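
A rough sketch of a data2vec-style training step, assuming a generic PyTorch encoder: a student sees a masked input and regresses contextualized representations produced by an exponential-moving-average (EMA) teacher on the unmasked input. The encoder, masking, decay, and loss choices below are illustrative simplifications, not the released configuration (the paper averages several top layers; a single output is used here for brevity):

    import copy
    import torch
    import torch.nn.functional as F

    def ema_update(teacher, student, decay=0.999):
        # Teacher weights track the student as an exponential moving average.
        with torch.no_grad():
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(decay).add_(s, alpha=1.0 - decay)

    def data2vec_step(student, teacher, x, mask):
        # Teacher: contextualized targets from the *unmasked* input.
        with torch.no_grad():
            targets = teacher(x)                                  # (batch, time, dim)
            targets = F.instance_norm(targets.transpose(1, 2)).transpose(1, 2)
        # Student: predict those targets from a *masked* view of the same input.
        preds = student(x * (~mask).unsqueeze(-1))                # crude masking for illustration
        return F.smooth_l1_loss(preds[mask], targets[mask])

    # toy usage with a linear "encoder" standing in for a transformer
    student = torch.nn.Linear(64, 64)
    teacher = copy.deepcopy(student)
    x = torch.randn(2, 50, 64)
    mask = torch.rand(2, 50) < 0.15
    loss = data2vec_step(student, teacher, x, mask)
    loss.backward()
    ema_update(teacher, student)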

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

2 code implementations 23 Sep 2021 Qiantong Xu, Alexei Baevski, Michael Auli

Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data.

Speech Recognition Transfer Learning +1

Multilingual Speech Translation from Efficient Finetuning of Pretrained Models

no code implementations ACL 2021 Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.

Text Generation Transfer Learning +1

Unsupervised Speech Recognition

3 code implementations NeurIPS 2021 Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe.

Speech Recognition Unsupervised Speech Recognition

Large-Scale Self- and Semi-Supervised Learning for Speech Translation

no code implementations 14 Apr 2021 Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau

In this paper, we improve speech translation (ST) through effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways.

Language Modelling Translation

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

2 code implementations 2 Apr 2021 Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

Generative Spoken Language Modeling from Raw Audio

2 code implementations 1 Feb 2021 Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.

Language Modelling Resynthesis

Uncovering the impact of learning rate for global magnitude pruning

no code implementations 1 Jan 2021 Janice Lan, Rudy Chin, Alexei Baevski, Ari S. Morcos

However, prior work has implicitly assumed that the best training configuration for model performance was also the best configuration for mask discovery.
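
For context, global magnitude pruning removes the smallest-magnitude weights across the whole network rather than per layer. A minimal sketch using PyTorch's built-in pruning utilities; the model and pruning amount below are stand-ins, not the configuration studied in the paper:

    import torch
    import torch.nn.utils.prune as prune

    model = torch.nn.Sequential(
        torch.nn.Linear(128, 256), torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    )

    # Collect every weight tensor and prune the globally smallest 80% by |magnitude|.
    parameters_to_prune = [
        (module, "weight") for module in model.modules()
        if isinstance(module, torch.nn.Linear)
    ]
    prune.global_unstructured(
        parameters_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=0.8,
    )

    # Per-layer sparsity depends on each layer's share of small weights.
    for i, (module, _) in enumerate(parameters_to_prune):
        sparsity = float((module.weight == 0).sum()) / module.weight.numel()
        print(f"layer {i}: {sparsity:.1%} weights pruned")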

Reservoir Transformers

no code implementations ACL 2021 Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.

Language Modelling Machine Translation +1
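
A minimal sketch of the idea in the paper above, assuming a standard PyTorch transformer encoder stack: some layers keep their random initialization and are simply excluded from the optimizer. Which layers are frozen and the model sizes are illustrative choices, not the authors' setup:

    import torch
    import torch.nn as nn

    layers = nn.ModuleList([
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
        for _ in range(6)
    ])

    # "Reservoir" layers: randomly initialized and never updated.
    frozen_ids = {1, 3}
    for i in frozen_ids:
        for p in layers[i].parameters():
            p.requires_grad = False

    optimizer = torch.optim.Adam(
        (p for p in layers.parameters() if p.requires_grad), lr=1e-4
    )

    x = torch.randn(2, 16, 512)       # (batch, time, dim)
    for layer in layers:
        x = layer(x)                  # frozen layers still take part in the forward pass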

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

1 code implementation 23 Nov 2020 Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics.

Language Modelling Representation Learning

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models

no code implementations 24 Oct 2020 Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from a pretrained speech encoder and text decoder.

Cross-Lingual Transfer Text Generation +2

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

no code implementations 24 Oct 2020 Henry Zhou, Alexei Baevski, Michael Auli

Neural latent variable models enable the discovery of interesting structure in speech audio data.

Representation Learning

Unsupervised Cross-lingual Representation Learning for Speech Recognition

4 code implementations 24 Jun 2020 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization Representation Learning +1

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

16 code implementations NeurIPS 2020 Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

 Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Quantization Self-Supervised Learning +1
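
As a usage example (not part of the listing itself), finetuned wav2vec 2.0 checkpoints can be run for transcription through the Hugging Face Transformers port; the checkpoint name below is one of the commonly released ones and is used here as an assumption:

    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    # 16 kHz mono waveform; random noise here stands in for real speech.
    waveform = torch.randn(16000)
    inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits      # (batch, time, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)
    print(processor.batch_decode(predicted_ids))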

Effectiveness of self-supervised pre-training for speech recognition

2 code implementations 10 Nov 2019 Alexei Baevski, Michael Auli, Abdel-rahman Mohamed

We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization.

Quantization Representation Learning +1

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

3 code implementations ICLR 2020 Alexei Baevski, Steffen Schneider, Michael Auli

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.

Ranked #2 on Speech Recognition on TIMIT (using extra training data)

General Classification Self-Supervised Learning +1
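
A compact sketch of the kind of Gumbel-softmax quantizer vq-wav2vec uses to turn continuous frame features into discrete codes; the single-group codebook, dimensions, and temperature below are simplifications of the paper's setup:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GumbelQuantizer(nn.Module):
        def __init__(self, feat_dim=512, num_codes=320, code_dim=512):
            super().__init__()
            self.to_logits = nn.Linear(feat_dim, num_codes)
            self.codebook = nn.Parameter(torch.randn(num_codes, code_dim))

        def forward(self, z, tau=2.0):
            # z: (batch, time, feat_dim) continuous frame features
            logits = self.to_logits(z)
            # Hard one-hot selection with a straight-through gradient.
            onehot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
            codes = onehot.argmax(dim=-1)             # discrete indices
            quantized = onehot @ self.codebook        # (batch, time, code_dim)
            return quantized, codes

    q = GumbelQuantizer()
    z = torch.randn(2, 100, 512)
    quantized, codes = q(z)
    print(quantized.shape, codes.shape)   # (2, 100, 512) (2, 100)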

wav2vec: Unsupervised Pre-training for Speech Recognition

5 code implementations 11 Apr 2019 Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

General Classification Speech Recognition +1

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

5 code implementations NAACL 2019 Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

Language Modelling Text Generation +1
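
A brief usage sketch of the toolkit's Python interface for a trained translation model, mirroring the pattern shown in the fairseq README; all paths and preprocessing settings below are placeholders to substitute with your own:

    from fairseq.models.transformer import TransformerModel

    # Load a trained checkpoint together with its binarized data and BPE codes.
    model = TransformerModel.from_pretrained(
        "/path/to/checkpoints",
        checkpoint_file="checkpoint_best.pt",
        data_name_or_path="data-bin/wmt14_en_de",
        bpe="subword_nmt",
        bpe_codes="data-bin/wmt14_en_de/bpecodes",
    )
    model.eval()
    print(model.translate("Machine learning is fun ."))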

Pre-trained Language Model Representations for Language Generation

1 code implementation NAACL 2019 Sergey Edunov, Alexei Baevski, Michael Auli

Pre-trained language model representations have been successful in a wide range of language understanding tasks.

Abstractive Text Summarization +4

Cloze-driven Pretraining of Self-attention Networks

no code implementations IJCNLP 2019 Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems.

Constituency Parsing Named Entity Recognition +3

Adaptive Input Representations for Neural Language Modeling

2 code implementations ICLR 2019 Alexei Baevski, Michael Auli

We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.

Language Modelling
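
A minimal sketch of the variable-capacity input embedding idea: the frequency-sorted vocabulary is split into bands, rarer bands get smaller embeddings, and each band is projected up to the model dimension. The cutoffs and capacity factor below are illustrative, not the paper's exact values:

    import torch
    import torch.nn as nn

    class AdaptiveInput(nn.Module):
        def __init__(self, vocab_size, d_model, cutoffs=(20000, 60000), factor=4):
            super().__init__()
            self.cutoffs = list(cutoffs) + [vocab_size]
            self.embeddings = nn.ModuleList()
            self.projections = nn.ModuleList()
            prev = 0
            for i, cutoff in enumerate(self.cutoffs):
                dim = d_model // (factor ** i)        # less capacity for rarer bands
                self.embeddings.append(nn.Embedding(cutoff - prev, dim))
                self.projections.append(nn.Linear(dim, d_model, bias=False))
                prev = cutoff

        def forward(self, tokens):
            out = tokens.new_zeros(*tokens.shape, self.projections[0].out_features,
                                   dtype=torch.float)
            prev = 0
            for emb, proj, cutoff in zip(self.embeddings, self.projections, self.cutoffs):
                band = (tokens >= prev) & (tokens < cutoff)
                if band.any():
                    out[band] = proj(emb(tokens[band] - prev))
                prev = cutoff
            return out

    emb = AdaptiveInput(vocab_size=100000, d_model=512)
    tokens = torch.randint(0, 100000, (2, 16))        # ids assumed sorted by frequency
    print(emb(tokens).shape)                           # torch.Size([2, 16, 512])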
