1 code implementation • EMNLP (ACL) 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino
This paper presents fairseq S^2, a fairseq extension for speech synthesis.
1 code implementation • 24 Aug 2023 • Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve
We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.
no code implementations • 10 Aug 2023 • Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux
Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization).
no code implementations • 2 Aug 2023 • Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez
Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations.
no code implementations • 23 Jun 2023 • Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu
In this paper, we present Voicebox, the most versatile text-guided generative model for speech at scale.
1 code implementation • 8 Jun 2023 • Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
We tackle the task of conditional music generation.
Ranked #4 on Text-to-Music Generation on MusicCaps.
1 code implementation • Interspeech 2023 • Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz
In this paper, we propose a novel method that utilizes latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.
2 code implementations • arXiv 2023 • Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
Expanding the language coverage of speech technology has the potential to improve access to information for many more people.
no code implementations • 22 May 2023 • Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
Based on our observations, we present the largest SpeechLM to date (to the best of our knowledge), in terms of both number of parameters and training data.
no code implementations • 21 May 2023 • Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan
We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between layers of the network.
no code implementations • 25 Jan 2023 • Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen
Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy.
1 code implementation • 2 Jan 2023 • Amitay Sicherman, Yossi Adi
Following the findings of such an analysis, we propose practical improvements to the discrete unit for the GSLM.
no code implementations • CVPR 2023 • Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi
Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR.
no code implementations • 21 Dec 2022 • Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi
Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR.
Ranked #1 on Speech Recognition on EasyCom.
1 code implementation • 19 Dec 2022 • Gallil Maimon, Yossi Adi
We introduce a suite of quantitative and qualitative evaluation metrics for this setup, and empirically demonstrate the proposed approach is significantly superior to the evaluated baselines.
1 code implementation • 22 Nov 2022 • Moshe Mandel, Or Tal, Yossi Adi
We optimize the model using both time and frequency domain loss functions.
Ranked #1 on Bandwidth Extension on VCTK.
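Optimizing in both domains typically means combining a waveform loss with a loss on STFT magnitudes. A hedged sketch of such a hybrid objective (the STFT parameters and weighting below are illustrative assumptions, not the paper's exact configuration):

```python
import torch

def hybrid_loss(estimate, target, n_fft=512, alpha=0.5):
    """Combine an L1 waveform (time-domain) loss with an L1 loss on
    STFT magnitudes (frequency domain). `alpha` balances the two terms."""
    time_loss = torch.nn.functional.l1_loss(estimate, target)
    window = torch.hann_window(n_fft)
    spec_est = torch.stft(estimate, n_fft, window=window, return_complex=True).abs()
    spec_tgt = torch.stft(target, n_fft, window=window, return_complex=True).abs()
    freq_loss = torch.nn.functional.l1_loss(spec_est, spec_tgt)
    return alpha * time_loss + (1 - alpha) * freq_loss
```

The frequency term penalizes spectral artifacts that a pure waveform loss can miss, while the time term keeps the phase anchored.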
1 code implementation • 24 Oct 2022 • Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks.
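Codecs in this family commonly compress audio by quantizing encoder outputs with a residual vector quantizer, where each stage encodes what the previous stages missed. A toy numpy sketch of residual VQ (the codebooks here are arbitrary illustrations, not trained ones):

```python
import numpy as np

def residual_vq(x, codebooks):
    """Residual vector quantization: each codebook stage quantizes the
    residual left by the previous stages. The reconstruction is the sum
    of the selected codewords; the indices are what gets transmitted."""
    residual = np.asarray(x, dtype=float)
    indices, recon = [], np.zeros_like(residual)
    for cb in codebooks:
        k = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(k)            # one index per stage
        recon = recon + cb[k]
        residual = residual - cb[k]  # next stage sees what remains
    return indices, recon
```

Stacking more stages shrinks the residual, trading bits per frame for fidelity.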
no code implementations • 12 Oct 2022 • Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan
Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively.
1 code implementation • 30 Sep 2022 • Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi
Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.
Ranked #8 on Audio Generation on AudioCaps.
no code implementations • 30 Sep 2022 • Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi
This work focuses on improving the robustness of discrete input representations for generative spoken language modeling.
1 code implementation • 21 Jul 2022 • Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg
A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal.
no code implementations • 2 Jul 2022 • Shahaf Bassan, Yossi Adi, Jeffrey S. Rosenschein
We propose an unsupervised method for segmenting symbolic music.
1 code implementation • 29 Jun 2022 • Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed
In addition to the human-recorded audio, we are releasing a TTS-generated version to benchmark the performance of low-resource domain adaptation for end-to-end SLU systems.
1 code implementation • 22 Jun 2022 • Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi
By conducting a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models.
1 code implementation • ICLR 2022 • Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan
Discrete variational auto-encoders (VAEs) are able to represent semantic latent spaces in generative learning.
no code implementations • 6 Apr 2022 • Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee
Direct speech-to-speech translation (S2ST) models suffer from data scarcity issues as there exists little parallel S2ST data, compared to the amount of data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis.
no code implementations • 30 Mar 2022 • Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski
Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages.
no code implementations • 30 Mar 2022 • Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues.
1 code implementation • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar
RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.
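The teacher-student loop described above can be sketched as follows; the function and tensor shapes are hypothetical scaffolding for illustration, not the paper's code:

```python
import torch

def remixit_step(teacher, student, mixtures, optimizer):
    """One self-training update in the spirit of RemixIT: a frozen
    out-of-domain teacher infers pseudo-target signals for unlabeled
    in-domain mixtures, and the student is fit to those estimates.
    (Sketch only; the full method also remixes the estimated sources
    and periodically refreshes the teacher from the student.)"""
    with torch.no_grad():
        pseudo_targets = teacher(mixtures)  # teacher's source estimates
    loss = torch.nn.functional.l1_loss(student(mixtures), pseudo_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only unlabeled in-domain mixtures are needed, the student adapts to the target domain without any clean reference signals.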
1 code implementation • NAACL (ACL) 2022 • Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi
Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources.
no code implementations • NAACL 2022 • Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Pino, Jiatao Gu, Wei-Ning Hsu
To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.
no code implementations • arXiv 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
no code implementations • 14 Nov 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi
We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.
1 code implementation • 19 Oct 2021 • Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar
Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.
2 code implementations • 14 Sep 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino
This paper presents fairseq S^2, a fairseq extension for speech synthesis.
1 code implementation • ACL 2022 • Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu
Generative Spoken Language Modeling (GSLM; Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training. It replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.
no code implementations • ACL 2022 • Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu
When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.
no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar
Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8 dB for monaural inputs and 0.3 dB for binaural inputs, while reaching a real-time factor of 0.65.
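For context, the real-time factor quoted above is the ratio of processing time to audio duration; values below 1.0 mean the system keeps up with the input. A minimal illustration (the timing numbers are made up to reproduce the 0.65 figure):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of wall-clock processing time to the duration of the audio
    being processed; below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds

# Hypothetical timing: 6.5 s of compute for 10 s of audio gives RTF 0.65.
print(real_time_factor(6.5, 10.0))  # 0.65
```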
1 code implementation • 20 Apr 2021 • Alexandre Défossez, Yossi Adi, Gabriel Synnaeve
DiffQ is differentiable both with respect to the unquantized weights and the number of bits used.
Ranked #25 on Language Modelling on WikiText-103.
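The claim that the model is differentiable in the number of bits can be illustrated with the standard additive-noise model of quantization, where the noise scale depends on a continuous bit-width parameter (a sketch of the general trick, not DiffQ's exact formulation):

```python
import torch

def pseudo_quantize(w: torch.Tensor, bits: torch.Tensor) -> torch.Tensor:
    """Model b-bit uniform quantization of weights (assumed in [-1, 1]) as
    additive noise with the quantization step size 2 / 2**b. Because `bits`
    enters the noise scale as a continuous tensor, any loss computed on the
    output is differentiable w.r.t. both the weights and the bit-width."""
    scale = 2.0 / (2.0 ** bits)          # step size shrinks as bits grow
    noise = torch.rand_like(w) - 0.5     # uniform noise in [-0.5, 0.5)
    return w + scale * noise

w = torch.randn(8, requires_grad=True)
bits = torch.tensor(4.0, requires_grad=True)  # fractional bit-widths allowed
loss = pseudo_quantize(w, bits).pow(2).sum()
loss.backward()                          # gradients flow to w AND bits
```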
2 code implementations • 1 Apr 2021 • Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux
We propose using self-supervised discrete representations for the task of speech resynthesis.
2 code implementations • 1 Feb 2021 • Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux
We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.
Ranked #1 on Resynthesis on LJSpeech.
no code implementations • 31 Jan 2021 • Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman
Speech enhancement has seen great improvement in recent years mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio.
2 code implementations • 4 Nov 2020 • Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi
The proposed approach is composed of several separation heads optimized together with a speaker classification branch.
no code implementations • 3 Sep 2020 • Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet
We present a framework that allows certifying the fairness degree of a model based on an interactive and privacy-preserving test.
1 code implementation • 2 Sep 2020 • Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi
In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
no code implementations • 6 Aug 2020 • Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
2 code implementations • 27 Jul 2020 • Felix Kreuk, Joseph Keshet, Yossi Adi
Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets.
3 code implementations • 23 Jun 2020 • Alexandre Defossez, Gabriel Synnaeve, Yossi Adi
The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
4 code implementations • ICML 2020 • Eliya Nachmani, Yossi Adi, Lior Wolf
We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously.
Ranked #1 on Speech Separation on WSJ0-4mix.
no code implementations • 23 Feb 2020 • Yossi Adi, Yaniv Nemcovsky, Alex Schwing, Tamir Hazan
Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively.
1 code implementation • 11 Feb 2020 • Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi
Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, and keyword spotting.
no code implementations • 25 Sep 2019 • Yossi Adi, Alex Schwing, Tamir Hazan
Bayesian neural networks, which both use the negative log-likelihood loss function and average their predictions using a learned posterior over the parameters, have been used successfully across many scientific fields, partly due to their ability to 'effortlessly' extract desired representations from many large-scale datasets.
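The prediction-averaging step mentioned above can be sketched as Monte Carlo averaging over posterior samples of the parameters; the toy one-weight logistic model below is a generic illustration, not the paper's setup:

```python
import numpy as np

def posterior_predictive(x, weight_samples):
    """Average the probability sigmoid(w * x) of a toy 1-D logistic model
    over samples of w drawn from an (approximate) posterior."""
    w = np.asarray(weight_samples, dtype=float)
    return float((1.0 / (1.0 + np.exp(-w * x))).mean())

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.3, size=1000)  # assumed posterior over w
p = posterior_predictive(2.0, samples)  # averaged predictive probability
```

Averaging over parameter uncertainty in this way typically yields better-calibrated probabilities than a single point estimate.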
1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet
Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier.
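For contrast with the neural approach studied in this paper, the classical least-significant-bit scheme hides a message by overwriting the lowest bit of each carrier sample; a minimal sketch:

```python
def lsb_hide(carrier, message_bits):
    """Overwrite the least-significant bit of each carrier sample with
    one message bit (carrier must be at least as long as the message)."""
    out = list(carrier)
    for i, bit in enumerate(message_bits):
        out[i] = (out[i] & ~1) | bit
    return out

def lsb_reveal(stego, n_bits):
    """Read the hidden bits back out of the low-order bits."""
    return [s & 1 for s in stego[:n_bits]]

samples = [200, 13, 77, 42, 99, 128]   # toy integer carrier samples
stego = lsb_hide(samples, [1, 0, 1, 1])
assert lsb_reveal(stego, 4) == [1, 0, 1, 1]
```

Each sample changes by at most one quantization level, which is why the distortion is nearly imperceptible in audio carriers.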
no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve
In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.
no code implementations • NeurIPS 2018 • Gabi Shalev, Yossi Adi, Joseph Keshet
Deep Neural Networks are powerful models that attained remarkable results on a variety of tasks.
1 code implementation • 13 Feb 2018 • Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, Joseph Keshet
Unfortunately, once the models are sold they can be easily copied and redistributed.
no code implementations • 10 Jan 2018 • Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet
We also present two black-box attacks: one where the adversarial examples were generated with a system trained on YOHO but the attack targets a system trained on NTIMIT, and one where the adversarial examples were generated with a system trained on the Mel-spectrum feature set but the attack targets a system trained on MFCCs.
no code implementations • NeurIPS 2017 • Moustapha M. Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet
Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines.
no code implementations • 17 Jul 2017 • Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet
Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines.
no code implementations • 5 Apr 2017 • Yaniv Sheena, Míša Hejná, Yossi Adi, Joseph Keshet
Pre-aspiration is defined as the period of glottal friction occurring in sequences of vocalic/consonantal sonorants and phonetically voiceless obstruents.
no code implementations • 28 Mar 2017 • Einat Naaman, Yossi Adi, Joseph Keshet
This task generalizes problems such as lexical access (the problem of learning the mapping between words and their possible pronunciations), and defining word neighborhoods.
1 code implementation • 26 Oct 2016 • Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick
Manually-annotated data were used to train a model that takes as input an arbitrary length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants and outputs the duration of the vowel.
no code implementations • 25 Oct 2016 • Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick
We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks.
3 code implementations • 15 Aug 2016 • Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg
The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.