Search Results for author: Yossi Adi

Found 68 papers, 36 papers with code

Generative Spoken Language Modeling from Raw Audio

2 code implementations • 1 Feb 2021 • Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.

Ranked #1 on Resynthesis on LibriSpeech

Language Modelling Resynthesis

29,198

STOP: A dataset for Spoken Task Oriented Semantic Parsing

1 code implementation • 29 Jun 2022 • Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed

Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance for low-resource domain adaptation of end-to-end SLU systems.

Automatic Speech Recognition (ASR) +4

29,193

Scaling Speech Technology to 1,000+ Languages

3 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Expanding the language coverage of speech technology has the potential to improve access to information for many more people.

Automatic Speech Recognition Language Identification +4

29,193

Text-Free Prosody-Aware Generative Spoken Language Modeling

1 code implementation • ACL 2022 • Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling

29,192

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

4 code implementations • 14 Sep 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

This paper presents fairseq S^2, a fairseq extension for speech synthesis.

Speech Synthesis

29,192

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

1 code implementation • EMNLP (ACL) 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

This paper presents fairseq S^2, a fairseq extension for speech synthesis.

Speech Synthesis

29,192

AudioGen: Textually Guided Audio Generation

1 code implementation • 30 Sep 2022 • Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.

Ranked #12 on Audio Generation on AudioCaps

Audio Generation Descriptive

19,561

Simple and Controllable Music Generation

2 code implementations • NeurIPS 2023 • Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

We tackle the task of conditional music generation.

Ranked #4 on Text-to-Music Generation on MusicCaps

Language Modelling Music Generation +1

19,561

Code Llama: Open Foundation Models for Code

2 code implementations • 24 Aug 2023 • Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.

Ranked #26 on Code Generation on MBPP

16k Code Generation +1

14,330

High Fidelity Neural Audio Compression

2 code implementations • 24 Oct 2022 • Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks.

Audio Compression Vocal Bursts Intensity Prediction

3,156

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

3 code implementations • 15 Aug 2016 • Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg

The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.

Sentence Sentence Embedding +1

2,279

Real Time Speech Enhancement in the Waveform Domain

3 code implementations • 23 Jun 2020 • Alexandre Défossez, Gabriel Synnaeve, Yossi Adi

The proposed model matches state-of-the-art performance of both causal and non causal methods while working directly on the raw waveform.

Data Augmentation Speech Enhancement

1,557

Voice Separation with an Unknown Number of Multiple Speakers

4 code implementations • ICML 2020 • Eliya Nachmani, Yossi Adi, Lior Wolf

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously.

Ranked #2 on Speech Separation on WSJ0-4mix

Speech Separation

1,157

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

2 code implementations • 4 Nov 2020 • Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi

The proposed approach is composed of several separation heads optimized together with a speaker classification branch.

Classification General Classification

1,157

textless-lib: a Library for Textless Spoken Language Processing

1 code implementation • NAACL (ACL) 2022 • Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources.

Resynthesis

496

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

2 code implementations • 1 Apr 2021 • Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux

We propose using self-supervised discrete representations for the task of speech resynthesis.

Disentanglement Resynthesis +2

353

Differentiable Model Compression via Pseudo Quantization Noise

1 code implementation • 20 Apr 2021 • Alexandre Défossez, Yossi Adi, Gabriel Synnaeve

DiffQ is differentiable both with respect to the unquantized weights and the number of bits used.

Ranked #29 on Language Modelling on WikiText-103

Audio Source Separation Image Classification +3

229

AERO: Audio Super Resolution in the Spectral Domain

1 code implementation • 22 Nov 2022 • Moshe Mandel, Or Tal, Yossi Adi

We optimize the model using both time and frequency domain loss functions.

Ranked #1 on Bandwidth Extension on VCTK

Audio Super-Resolution Bandwidth Extension +1

172

Direct speech-to-speech translation with discrete units

1 code implementation • ACL 2022 • Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu

When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.

Speech-to-Speech Translation Text Generation +1

157

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

2 code implementations • 27 Jul 2020 • Felix Kreuk, Joseph Keshet, Yossi Adi

Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets.

Boundary Detection Contrastive Learning +1

135

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units

1 code implementation • 19 Dec 2022 • Gallil Maimon, Yossi Adi

We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and timbre of a recording to a target speaker in a textless manner.

Voice Conversion

115

Transformers are Multi-State RNNs

1 code implementation • 11 Jan 2024 • Matanel Oren, Michael Hassid, Yossi Adi, Roy Schwartz

We further show that pretrained transformers can be converted into finite multi-state RNNs by fixing the size of their hidden state.

103

Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring

2 code implementations • 13 Feb 2018 • Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, Joseph Keshet

Unfortunately, once the models are sold they can be easily copied and redistributed.

General Classification

Phoneme Boundary Detection using Learnable Segmental Features

1 code implementation • 11 Feb 2020 • Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi

Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc.

Boundary Detection Keyword Spotting +2

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

1 code implementation • 28 Sep 2023 • Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.

Text-to-Video Generation Video Generation

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

1 code implementation • Interspeech 2023 • Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

In this paper, we propose a novel method utilizing latent diffusion models trained for text-to-image-generation to generate images conditioned on audio recordings.

audio-visual learning Text-to-Image Generation

Continual self-training with bootstrapped remixing for speech enhancement

1 code implementation • 19 Oct 2021 • Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.

Ranked #13 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Speech Enhancement Unsupervised Domain Adaptation

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

2 code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.

Ranked #4 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Speech Enhancement Unsupervised Domain Adaptation

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, referred to as the carrier.

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

1 code implementation • 22 Jun 2022 • Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi

By conducting a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models.

Automatic Speech Recognition (ASR) +3

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

1 code implementation • 2 Jan 2023 • Amitay Sicherman, Yossi Adi

Following the findings of such an analysis, we propose practical improvements to the discrete unit for the GSLM.

Language Modelling Resynthesis

Automatic measurement of vowel duration via structured prediction

1 code implementation • 26 Oct 2016 • Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick

Manually-annotated data were used to train a model that takes as input an arbitrary length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants and outputs the duration of the vowel.

Structured Prediction

Textually Pretrained Speech Language Models

1 code implementation • NeurIPS 2023 • Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model.

Deep Audio Waveform Prior

1 code implementation • 21 Jul 2022 • Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg

A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal.

Audio inpainting Audio Source Separation +2

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies

1 code implementation • ICLR 2022 • Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan

Discrete variational auto-encoders (VAEs) are able to represent semantic latent spaces in generative learning.

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

1 code implementation • 2 Sep 2020 • Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.

Audio and Speech Processing Sound

Fooling End-to-end Speaker Verification by Adversarial Examples

no code implementations • 10 Jan 2018 • Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet

We also present two black-box attacks: one in which the adversarial examples were generated with a system trained on YOHO but the attack targets a system trained on NTIMIT, and one in which the adversarial examples were generated with a system trained on the Mel-spectrum feature set but the attack targets a system trained on MFCC.

Speaker Verification

Houdini: Fooling Deep Structured Prediction Models

no code implementations • 17 Jul 2017 • Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines.

General Classification Pose Estimation +4

Learning Similarity Functions for Pronunciation Variations

no code implementations • 28 Mar 2017 • Einat Naaman, Yossi Adi, Joseph Keshet

This task generalizes problems such as lexical access (the problem of learning the mapping between words and their possible pronunciations), and defining word neighborhoods.

Automatic Speech Recognition (ASR) +2

Automatic Measurement of Pre-aspiration

no code implementations • 5 Apr 2017 • Yaniv Sheena, Míša Hejná, Yossi Adi, Joseph Keshet

Pre-aspiration is defined as the period of glottal friction occurring in sequences of vocalic/consonantal sonorants and phonetically voiceless obstruents.

Friction Structured Prediction

Sequence Segmentation Using Joint RNN and Structured Prediction Models

no code implementations • 25 Oct 2016 • Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick

We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks.

Segmentation Structured Prediction

Out-of-Distribution Detection using Multiple Semantic Label Representations

no code implementations • NeurIPS 2018 • Gabi Shalev, Yossi Adi, Joseph Keshet

Deep Neural Networks are powerful models that attained remarkable results on a variety of tasks.

Out-of-Distribution Detection

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.

Multi-Task Learning Speaker Recognition +2

Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples

no code implementations • NeurIPS 2017 • Moustapha M. Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines.

General Classification Pose Estimation +3

On the generalization of bayesian deep nets for multi-class classification

no code implementations • 23 Feb 2020 • Yossi Adi, Yaniv Nemcovsky, Alex Schwing, Tamir Hazan

Generalization bounds which assess the difference between the true risk and the empirical risk have been studied extensively.

General Classification Generalization Bounds +1

Unsupervised Cross-Domain Singing Voice Conversion

no code implementations • 6 Aug 2020 • Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman

We present a wav-to-wav generative model for the task of singing voice conversion from any identity.

Automatic Speech Recognition (ASR) +2

Fairness in the Eyes of the Data: Certifying Machine-Learning Models

no code implementations • 3 Sep 2020 • Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet

We present a framework for certifying the fairness degree of a model based on an interactive and privacy-preserving test.

BIG-bench Machine Learning Fairness +1

High Fidelity Speech Regeneration with Application to Speech Enhancement

no code implementations • 31 Jan 2021 • Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

Speech enhancement has seen great improvement in recent years mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio.

Denoising Speaker Separation +3

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8dB for monaural inputs and 0.3dB for binaural inputs, while reaching a real-time factor of 0.65.

blind source separation Speaker Separation

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

no code implementations • 14 Nov 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.

PAC-Bayesian Neural Network Bounds

no code implementations • 25 Sep 2019 • Yossi Adi, Alex Schwing, Tamir Hazan

Bayesian neural networks, which both use the negative log-likelihood loss function and average their predictions using a learned posterior over the parameters, have been used successfully across many scientific fields, partly due to their ability to 'effortlessly' extract desired representations from many large-scale datasets.

Generalization Bounds

Textless Speech-to-Speech Translation on Real Data

no code implementations • NAACL 2022 • Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning Hsu

To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.

Speech-to-Speech Translation Translation

Textless Speech Emotion Conversion using Decomposed & Discrete Representations

no code implementations • arXiv 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.

Generative Spoken Dialogue Language Modeling

no code implementations • 30 Mar 2022 • Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues.

Language Modelling

Probing phoneme, language and speaker information in unsupervised speech representations

no code implementations • 30 Mar 2022 • Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski

Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages.

Language Modelling

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

no code implementations • 6 Apr 2022 • Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee

Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there exists little parallel S2ST data, compared to the amount of data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis.

Automatic Speech Recognition (ASR) +6

Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors

no code implementations • 2 Jul 2022 • Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein

We proposed an unsupervised method for segmenting symbolic music.

Segmentation

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling

no code implementations • 30 Sep 2022 • Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi

This work focuses on improving the robustness of discrete input representations for generative spoken language modeling.

Language Modelling Speech-to-Speech Translation

On the Importance of Gradient Norm in PAC-Bayesian Bounds

no code implementations • 12 Oct 2022 • Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan

Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively.

Generalization Bounds

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

no code implementations • 21 Dec 2022 • Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR.

Ranked #1 on Speech Recognition on EasyCom

Audio-Visual Speech Recognition Resynthesis +6

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

no code implementations • 25 Jan 2023 • Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen

Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy.

Speech-to-Speech Translation Translation

ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration

no code implementations • CVPR 2023 • Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR.

Audio-Visual Speech Recognition Resynthesis +5

Layer Collaboration in the Forward-Forward Algorithm

no code implementations • 21 May 2023 • Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan

We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between layers of the network.

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

no code implementations • 10 Aug 2023 • Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization).

Resynthesis Speech Synthesis

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

no code implementations • 29 Sep 2023 • Po-chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-Yi Lee, Abdelrahman Mohamed

Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks.

Self-Supervised Learning

Generative Spoken Language Model based on continuous word-sized audio tokens

no code implementations • 8 Oct 2023 • Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux

In NLP, text language models based on words or subwords are known to outperform their character-based counterparts.

Language Modelling

Masked Audio Generation using a Single Non-Autoregressive Transformer

no code implementations • 9 Jan 2024 • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.

Audio Generation

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

no code implementations • 31 Mar 2024 • Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi

On the other hand, in scenarios where unit-tests are unavailable, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from larger ones.

Code Generation
