Search Results for author: Ann Lee

Found 29 papers, 12 papers with code

fairseq Sˆ2: A Scalable and Integrable Speech Synthesis Toolkit

1 code implementation • EMNLP (ACL) 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

This paper presents fairseq Sˆ2, a fairseq extension for speech synthesis.

Speech Synthesis

29,192

Paper
Code

Facebook AI’s WMT20 News Translation Task Submission

no code implementations • WMT (EMNLP) 2020 • Peng-Jen Chen, Ann Lee, Changhan Wang, Naman Goyal, Angela Fan, Mary Williamson, Jiatao Gu

We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.

Data Augmentation Translation

Paper
Add Code

Seamless: Multilingual Expressive and Streaming Speech Translation

1 code implementation • 8 Dec 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia Gonzalez, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-jussà, Maha Elbayad, Hongyu Gong, Francisco Guzmán, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, Mary Williamson

In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.

Multimodal Machine Translation Translation

10,144

Paper
Code

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

2 code implementations • 22 Aug 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?

Automatic Speech Recognition Speech-to-Speech Translation +3

10,144

Paper
Code

Multilingual Speech-to-Speech Translation into Multiple Target Languages

no code implementations • 17 Jul 2023 • Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i. e., the translation from multiple source languages to one target language.

Language Identification Speech-to-Speech Translation +1

Paper
Add Code

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

no code implementations • 10 Apr 2023 • Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe

It has been known that direct speech-to-speech translation (S2ST) models usually suffer from the data scarcity issue because of the limited existing parallel materials for both source and target speech.

Speech-to-Speech Translation Speech-to-Text Translation +1

Paper
Add Code

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

no code implementations • 25 Jan 2023 • Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy.

Speech-to-Speech Translation Translation

Paper
Add Code

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

1 code implementation • 15 Dec 2022 • Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization.

Denoising Speech-to-Speech Translation +3

29,193

Paper
Code

Speech-to-Speech Translation For A Real-world Unwritten Language

no code implementations • arXiv 2022 • Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee

We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.

Speech-to-Speech Translation Translation

Paper
Add Code

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

no code implementations • arXiv 2022 • Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, Jingfei Du, Ann Lee, Vedanuj Goswani, Changhan Wang, Juan Pino, Benoît Sagot, Holger Schwenk

We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings.

Speech-to-Speech Translation Translation

Paper
Add Code

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

no code implementations • 6 Nov 2022 • Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-Yi Lee

To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

no code implementations • 30 Sep 2022 • Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi

This work focuses on improving the robustness of discrete input representations for generative spoken language modeling.

Language Modelling Speech-to-Speech Translation

Paper
Add Code

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

no code implementations • 6 Apr 2022 • Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee

Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there exists little parallel S2ST data, compared to the amount of data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Add Code

textless-lib: a Library for Textless Spoken Language Processing

1 code implementation • NAACL (ACL) 2022 • Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Textless spoken language processing research aims to extend the applicability of standard NLP toolset onto spoken language and languages with few or no textual resources.

Resynthesis

496

Paper
Code

Flashlight: Enabling Innovation in Tools for Machine Learning

2 code implementations • 29 Jan 2022 • Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks.

BIG-bench Machine Learning

5,142

Paper
Code

Textless Speech-to-Speech Translation on Real Data

no code implementations • NAACL 2022 • Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning Hsu

To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.

Speech-to-Speech Translation Translation

Paper
Add Code

Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention

no code implementations • 15 Oct 2021 • Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Phillip Koehn, Juan Pino

We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model, Furthermore, the generation of translation is independent from intermediate text representations.

Speech Synthesis Speech-to-Speech Translation +1

Paper
Add Code

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

4 code implementations • 14 Sep 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

This paper presents fairseq S^2, a fairseq extension for speech synthesis.

Speech Synthesis

29,192

Paper
Code

Text-Free Prosody-Aware Generative Spoken Language Modeling

1 code implementation • ACL 2022 • Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) \cite{Lakhotia2021} is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling

29,192

Paper
Code

Discriminative Reranking for Neural Machine Translation

no code implementations • ACL 2021 • Ann Lee, Michael Auli, Marc{'}Aurelio Ranzato

Reranking models enable the integration of rich features to select a better output hypothesis within an n-best list or lattice.

Data Augmentation Machine Translation +2

Paper
Add Code

Direct speech-to-speech translation with discrete units

1 code implementation • ACL 2022 • Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu

When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.

Speech-to-Speech Translation Text Generation +1

157

Paper
Code

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training

3 code implementations • 2 Apr 2021 • Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli

On a large-scale competitive setup, we show that pre-training on unlabeled in-domain data reduces the gap between models trained on in-domain and out-of-domain labeled data by 66%-73%.

Self-Supervised Learning

29,192

Paper
Code

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

1 code implementation • ACL 2021 • Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux

We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages.

Ranked #3 on Speech Recognition on Common Voice French (using extra training data)

Representation Learning speech-recognition +1

491

Paper
Code

Few-shot Sequence Learning with Transformers

no code implementations • 17 Dec 2020 • Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.

Few-Shot Learning

Paper
Add Code

Facebook AI's WMT20 News Translation Task Submission

no code implementations • 16 Nov 2020 • Peng-Jen Chen, Ann Lee, Changhan Wang, Naman Goyal, Angela Fan, Mary Williamson, Jiatao Gu

We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.

Data Augmentation Translation

Paper
Add Code

Semi-Supervised Speech Recognition via Local Prior Matching

1 code implementation • 24 Feb 2020 • Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability.

Ranked #45 on Speech Recognition on LibriSpeech test-other

Knowledge Distillation Language Modelling +2

6,331

Paper
Code

Self-Supervised Speech Recognition via Local Prior Matching

no code implementations • 25 Sep 2019 • Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

We propose local prior matching (LPM), a self-supervised objective for speech recognition.

Language Modelling speech-recognition +1

Paper
Add Code

Self-Training for End-to-End Speech Recognition

no code implementations • 19 Sep 2019 • Jacob Kahn, Ann Lee, Awni Hannun

We revisit self-training in the context of end-to-end speech recognition.

Language Modelling Pseudo Label +2

Paper
Add Code

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

no code implementations • 4 Apr 2019 • Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.

Language Modelling Sequence-To-Sequence Speech Recognition +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.