Search Results for author: Alexei Baevski

Found 40 papers, 23 papers with code

Adaptive Input Representations for Neural Language Modeling

3 code implementations • ICLR 2019 • Alexei Baevski, Michael Auli

We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.

Ranked #5 on Language Modelling on One Billion Word

Language Modelling

29,192

Pay Less Attention with Lightweight and Dynamic Convolutions

4 code implementations • ICLR 2019 • Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.

Ranked #1 on Machine Translation on WMT 2017 English-Chinese

Abstractive Text Summarization Language Modelling +2

29,192

Cloze-driven Pretraining of Self-attention Networks

no code implementations • IJCNLP 2019 • Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems.

Ranked #10 on Constituency Parsing on Penn Treebank

Constituency Parsing NER +2

Pre-trained Language Model Representations for Language Generation

1 code implementation • NAACL 2019 • Sergey Edunov, Alexei Baevski, Michael Auli

Pre-trained language model representations have been successful in a wide range of language understanding tasks.

Abstractive Text Summarization Language Modelling +4

29,192

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

6 code implementations • NAACL 2019 • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

Language Modelling Text Generation +1

29,193

wav2vec: Unsupervised Pre-training for Speech Recognition

5 code implementations • 11 Apr 2019 • Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

Binary Classification General Classification +2

29,192

Facebook FAIR's WMT19 News Translation Task Submission

5 code implementations • WS 2019 • Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov

This paper describes Facebook FAIR's submission to the WMT19 shared news translation task.

Ranked #1 on Machine Translation on WMT2019 English-German

Machine Translation Translation

124,527

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

3 code implementations • ICLR 2020 • Alexei Baevski, Steffen Schneider, Michael Auli

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.

Ranked #2 on Speech Recognition on TIMIT (using extra training data)

Clustering General Classification +3

29,192

Effectiveness of self-supervised pre-training for speech recognition

2 code implementations • 10 Nov 2019 • Alexei Baevski, Michael Auli, Abdel-rahman Mohamed

We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization.

Language Modelling Quantization +3

373

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

22 code implementations • NeurIPS 2020 • Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Quantization Self-Supervised Learning +1

124,527

Unsupervised Cross-lingual Representation Learning for Speech Recognition

6 code implementations • 24 Jun 2020 • Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdel-rahman Mohamed, Michael Auli

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization Representation Learning +2

124,527

Self-training and Pre-training are Complementary for Speech Recognition

3 code implementations • 22 Oct 2020 • Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.

Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-other (using extra training data)

speech-recognition Speech Recognition +1

29,193

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models

no code implementations • 24 Oct 2020 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from pretrained speech encoder and text decoder.

Cross-Lingual Transfer Text Generation +2

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

no code implementations • 24 Oct 2020 • Henry Zhou, Alexei Baevski, Michael Auli

Neural latent variable models enable the discovery of interesting structure in speech audio data.

Representation Learning

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

2 code implementations • 23 Nov 2020 • Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics.

Clustering Language Modelling +1

Reservoir Transformers

no code implementations • ACL 2021 • Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.

BIG-bench Machine Learning Language Modelling +2

Uncovering the impact of learning rate for global magnitude pruning

no code implementations • 1 Jan 2021 • Janice Lan, Rudy Chin, Alexei Baevski, Ari S. Morcos

However, prior work has implicitly assumed that the best training configuration for model performance was also the best configuration for mask discovery.

Generative Spoken Language Modeling from Raw Audio

2 code implementations • 1 Feb 2021 • Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.

Ranked #1 on Resynthesis on LibriSpeech

Language Modelling Resynthesis

29,188

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

3 code implementations • 2 Apr 2021 • Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

29,192

Large-Scale Self- and Semi-Supervised Learning for Speech Translation

no code implementations • 14 Apr 2021 • Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau

In this paper, we improve speech translation (ST) through effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways.

Language Modelling Translation

Unsupervised Speech Recognition

4 code implementations • NeurIPS 2021 • Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe.

speech-recognition Speech Recognition +1

29,193

Improved Language Identification Through Cross-Lingual Self-Supervised Learning

no code implementations • 8 Jul 2021 • Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli

Language identification greatly impacts the success of downstream tasks such as automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Multilingual Speech Translation from Efficient Finetuning of Pretrained Models

no code implementations • ACL 2021 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.

Text Generation Transfer Learning +1

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition

2 code implementations • 23 Sep 2021 • Qiantong Xu, Alexei Baevski, Michael Auli

Recent progress in self-training, self-supervised pretraining and unsupervised learning enabled well performing speech recognition systems without any labeled data.

speech-recognition Speech Recognition +2

29,193

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.

Ranked #1 on Language Identification on VoxLingua107 (using extra training data)

Language Identification Representation Learning +3

29,192

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

9 code implementations • Preprint 2022 • Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.

Ranked #1 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)

Image Classification Linguistic Acceptability +5

124,489

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training

no code implementations • 1 Mar 2022 • Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Towards End-to-end Unsupervised Speech Recognition

1 code implementation • 5 Apr 2022 • Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

29,192

Simple and Effective Unsupervised Speech Synthesis

no code implementations • 6 Apr 2022 • Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe.

speech-recognition Speech Recognition +2

Unified Speech-Text Pre-training for Speech Translation and Recognition

no code implementations • ACL 2022 • Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Pino

Two pre-training configurations for speech translation and recognition, respectively, are presented to alleviate subtask interference.

speech-recognition Speech Recognition +1

On-demand compute reduction with stochastic wav2vec 2.0

no code implementations • 25 Apr 2022 • Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Our results for models pre-trained on 960h Librispeech dataset and fine-tuned on 10h of transcribed data show that using the same stochastic model, we get a smooth trade-off between word error rate (WER) and inference time with only marginal WER degradation compared to the W2V2 and SEW models trained for a specific setting.

Offline Visual Representation Learning for Embodied Navigation

1 code implementation • 27 Apr 2022 • Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules.

Representation Learning Self-Supervised Learning

Wav2Vec-Aug: Improved self-supervised training with limited data

no code implementations • 27 Jun 2022 • Anuroop Sriram, Michael Auli, Alexei Baevski

Self-supervised learning (SSL) of speech representations has received much attention over the last few years but most work has focused on languages and domains with an abundance of unlabeled data.

Data Augmentation Self-Supervised Learning

Masked Autoencoders that Listen

4 code implementations • 13 Jul 2022 • Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.

Ranked #2 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification Representation Learning +1

1,286

Introducing Semantics into Speech Encoders

no code implementations • 15 Nov 2022 • Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, Akshat Shrivastava, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-Yi Lee, Yizhou Sun, Wei Wang

Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +10

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

3 code implementations • 14 Dec 2022 • Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli

Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources.

Ranked #91 on Image Classification on ImageNet

Image Classification Natural Language Understanding +3

29,186

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

no code implementations • 10 Feb 2023 • Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems.

Audio-Visual Speech Recognition Self-Supervised Learning +2

OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

no code implementations • 14 Mar 2023 • Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules.

object-detection Object Detection +3

Scaling Speech Technology to 1,000+ Languages

3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Expanding the language coverage of speech technology has the potential to improve access to information for many more people.

Automatic Speech Recognition Language Identification +4

29,193

Toward Joint Language Modeling for Speech Units and Text

no code implementations • 12 Oct 2023 • Ju-chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli

However, in the field of language modeling, very little effort has been made to model them jointly.

Language Modelling Spoken Language Understanding
