Search Results for author: Yossi Adi

Found 44 papers, 19 papers with code

Deep Audio Waveform Prior

no code implementations • 21 Jul 2022 • Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg

A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal.

Audio Inpainting Audio Source Separation +2
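
A minimal sketch of the deep-prior idea above, in the spirit of Deep Image Prior applied to audio: a randomly initialized network is fit to the corrupted waveform and stopped early, before it reproduces the noise. The architecture, learning rate, and step count here are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Randomly initialized convolutional net; its inductive bias favors
# structured signal over noise, so early outputs are cleaner.
net = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=15, padding=7), nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=15, padding=7), nn.ReLU(),
    nn.Conv1d(64, 1, kernel_size=15, padding=7),
)

noisy = torch.randn(1, 1, 16000)   # stand-in for a corrupted waveform
z = torch.randn(1, 1, 16000)       # fixed random input to the network
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):            # early stopping is the key trick
    opt.zero_grad()
    loss = ((net(z) - noisy) ** 2).mean()
    loss.backward()
    opt.step()

denoised = net(z).detach()         # output taken before full convergence
```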

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

1 code implementation • 22 Jun 2022 • Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi

By conducting a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models.

Automatic Speech Recognition Self-Supervised Learning +2

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies

1 code implementation • ICLR 2022 • Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan

Discrete variational auto-encoders (VAEs) are able to represent semantic latent spaces in generative learning.
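
For context, natural evolution strategies estimate gradients of objectives that cannot be differentiated through discrete latents by perturbing parameters with Gaussian noise; schematically (a standard form of the estimator, not necessarily the paper's exact variant):

```latex
\nabla_\theta \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
  \big[ f(\theta + \sigma \epsilon) \big]
\;\approx\;
\frac{1}{N \sigma} \sum_{i=1}^{N} f(\theta + \sigma \epsilon_i)\, \epsilon_i
```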

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation

no code implementations • 6 Apr 2022 • Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee

Direct speech-to-speech translation (S2ST) models suffer from data scarcity, since little parallel S2ST data exists compared to the amount of data available for conventional cascaded systems consisting of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis.

Automatic Speech Recognition Data Augmentation +5
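
To make the contrast concrete, here is a toy sketch of cascaded versus direct S2ST; the stub functions are placeholders for trained models, not a real API.

```python
# Each stub stands in for a separately trained model in the cascade.
def asr(audio: list[float]) -> str: return "hello"        # speech -> text
def mt(text: str) -> str: return "bonjour"                # text -> text
def tts(text: str) -> list[float]: return [0.0] * 16000   # text -> speech

def cascaded_s2st(audio: list[float]) -> list[float]:
    # Cascade: each stage can be trained on its own abundant data.
    return tts(mt(asr(audio)))

def direct_s2st(audio: list[float]) -> list[float]:
    # Direct S2ST: one model, no intermediate text, but scarce parallel data.
    return [0.0] * 16000  # stand-in for a single speech-to-speech model
```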

Probing phoneme, language and speaker information in unsupervised speech representations

no code implementations • 30 Mar 2022 • Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski

Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages.

Language Modelling

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

no code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.

Speech Enhancement Unsupervised Domain Adaptation
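
A minimal sketch of one such self-training step, assuming placeholder teacher and student models that each return (speech, noise) estimates; RemixIT's full protocol (e.g., how teacher weights are refreshed) is richer than this.

```python
import torch

def remixit_step(teacher, student, opt, mixtures):
    # Frozen teacher produces pseudo-targets for in-domain mixtures.
    with torch.no_grad():
        est_speech, est_noise = teacher(mixtures)
    # Bootstrapped remixing: pair each speech estimate with a permuted
    # noise estimate to synthesize fresh training mixtures.
    perm = torch.randperm(mixtures.shape[0])
    remixed = est_speech + est_noise[perm]
    # Train the student to recover the teacher's pseudo-targets.
    pred_speech, _ = student(remixed)
    loss = torch.nn.functional.l1_loss(pred_speech, est_speech)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```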

textless-lib: a Library for Textless Spoken Language Processing

1 code implementation • NAACL (ACL) 2022 • Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources.

Resynthesis
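
As an illustration of the discrete "pseudo-text" units such pipelines operate on, here is a self-contained sketch of quantizing dense frame features against a k-means codebook and collapsing repeats; it shows the concept only and is not the textless-lib API.

```python
import torch

def quantize(features: torch.Tensor, centroids: torch.Tensor) -> list[int]:
    # Assign each frame to its nearest codebook entry...
    units = torch.cdist(features, centroids).argmin(dim=1).tolist()
    # ...then deduplicate consecutive repeats, yielding a unit "sentence".
    return [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]

features = torch.randn(50, 768)    # e.g. frames of self-supervised features
centroids = torch.randn(100, 768)  # k-means codebook with vocab size 100
print(quantize(features, centroids))
```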

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

no code implementations • 14 Nov 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.

Continual self-training with bootstrapped remixing for speech enhancement

no code implementations • 19 Oct 2021 • Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.

Speech Enhancement Unsupervised Domain Adaptation

Text-Free Prosody-Aware Generative Spoken Language Modeling

1 code implementation • ACL 2022 • Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training; it replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling

Direct speech-to-speech translation with discrete units

no code implementations • ACL 2022 • Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu

When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.

Speech-to-Speech Translation Text Generation +1

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8 dB for monaural inputs and 0.3 dB for binaural inputs, while reaching a real-time factor of 0.65 (i.e., processing one second of audio takes 0.65 seconds).

Speaker Separation

Differentiable Model Compression via Pseudo Quantization Noise

1 code implementation • 20 Apr 2021 • Alexandre Défossez, Yossi Adi, Gabriel Synnaeve

Given a single hyper-parameter expressing the desired balance between the quantized model size and accuracy, DiffQ can optimize the number of bits used per individual weight or groups of weights, in a single training.

Audio Source Separation Image Classification +3
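
The core trick can be sketched as follows: simulate b-bit quantization of the weights with additive uniform noise whose scale matches the quantization step, which keeps the loss differentiable with respect to the bit-width. This is a simplified sketch; DiffQ's exact parametrization may differ.

```python
import torch

def pseudo_quantize(w: torch.Tensor, bits: torch.Tensor) -> torch.Tensor:
    scale = (w.max() - w.min()).detach()
    step = scale / (2.0 ** bits - 1)        # quantization step for b bits
    noise = torch.rand_like(w) - 0.5        # Uniform(-0.5, 0.5), no gradient
    return w + step * noise                 # differentiable w.r.t. `bits`

w = torch.randn(256, requires_grad=True)
bits = torch.tensor(4.0, requires_grad=True)   # learnable bit-width
loss = pseudo_quantize(w, bits).pow(2).mean()  # stand-in for a task loss
loss.backward()
print(bits.grad)                               # gradients reach the bit-width
```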

Generative Spoken Language Modeling from Raw Audio

2 code implementations • 1 Feb 2021 • Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.

Language Modelling Resynthesis
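
Schematically, the pipeline chains three trained components: a speech-to-unit encoder, a language model over units, and a unit-to-speech decoder. The stubs below are placeholders that only illustrate the data flow.

```python
# Stubs standing in for trained models (e.g. an acoustic encoder plus
# quantizer, a unit language model, and a unit-based vocoder).
def speech_to_units(waveform: list[float]) -> list[int]:
    return [7, 7, 42, 3]                  # discrete pseudo-text units

def unit_lm_continue(units: list[int], n: int) -> list[int]:
    return units + [3] * n                # stand-in for LM sampling

def units_to_speech(units: list[int]) -> list[float]:
    return [0.0] * (len(units) * 320)     # stand-in for unit vocoding

prompt = speech_to_units([0.0] * 16000)   # no text anywhere in the loop
continuation = unit_lm_continue(prompt, n=10)
audio = units_to_speech(continuation)     # generated speech continuation
```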

High Fidelity Speech Regeneration with Application to Speech Enhancement

no code implementations • 31 Jan 2021 • Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

Speech enhancement has seen great improvement in recent years mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio.

Denoising Speaker Separation +2

Fairness in the Eyes of the Data: Certifying Machine-Learning Models

no code implementations • 3 Sep 2020 • Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet

We present a framework that allows certifying the fairness degree of a model based on an interactive and privacy-preserving test.

BIG-bench Machine Learning Fairness +1

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

1 code implementation • 2 Sep 2020 • Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.

Audio and Speech Processing Sound

Unsupervised Cross-Domain Singing Voice Conversion

no code implementations • 6 Aug 2020 • Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman

We present a wav-to-wav generative model for the task of singing voice conversion from any identity.

Automatic Speech Recognition +1

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

2 code implementations • 27 Jul 2020 • Felix Kreuk, Joseph Keshet, Yossi Adi

Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets.

Boundary Detection Contrastive Learning +1
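
A hedged sketch of the inference-time idea: with contrastively learned frame representations, phoneme boundaries tend to fall where adjacent frames are dissimilar, so candidates can be read off as peaks in a frame-to-frame dissimilarity curve. The peak rule and threshold below are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def detect_boundaries(frames: torch.Tensor, threshold: float = 0.5) -> list[int]:
    # Dissimilarity between each pair of adjacent frame representations.
    dissim = 1.0 - F.cosine_similarity(frames[:-1], frames[1:], dim=1)
    # Local maxima above a threshold are boundary candidates.
    return [
        t for t in range(1, len(dissim) - 1)
        if dissim[t] > dissim[t - 1]
        and dissim[t] > dissim[t + 1]
        and dissim[t] > threshold
    ]

frames = torch.randn(100, 256)  # stand-in for learned per-frame features
print(detect_boundaries(frames))
```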

Real Time Speech Enhancement in the Waveform Domain

2 code implementations • 23 Jun 2020 • Alexandre Defossez, Gabriel Synnaeve, Yossi Adi

The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.

Data Augmentation Speech Enhancement

Voice Separation with an Unknown Number of Multiple Speakers

2 code implementations ICML 2020 Eliya Nachmani, Yossi Adi, Lior Wolf

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously.

Speech Separation

On the generalization of Bayesian deep nets for multi-class classification

no code implementations • 23 Feb 2020 • Yossi Adi, Yaniv Nemcovsky, Alex Schwing, Tamir Hazan

Generalization bounds which assess the difference between the true risk and the empirical risk have been studied extensively.

General Classification Generalization Bounds +1
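
Schematically, such a bound says that with probability at least 1 - δ over a draw of m training samples, the true risk exceeds the empirical risk by at most a complexity term (a generic shape, not the paper's specific result):

```latex
R(h) \;\le\; \widehat{R}(h) \;+\; C(\mathcal{H}, m, \delta)
```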

Phoneme Boundary Detection using Learnable Segmental Features

1 code implementation • 11 Feb 2020 • Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi

Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, and keyword spotting.

Boundary Detection Keyword Spotting +2

PAC-Bayesian Neural Network Bounds

no code implementations • 25 Sep 2019 • Yossi Adi, Alex Schwing, Tamir Hazan

Bayesian neural networks, which both use the negative log-likelihood loss function and average their predictions using a learned posterior over the parameters, have been used successfully across many scientific fields, partly due to their ability to 'effortlessly' extract desired representations from many large-scale datasets.

Generalization Bounds
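
For reference, Bayesian prediction averages over the learned posterior q, and PAC-Bayesian bounds control the averaged risk through the KL divergence to a prior p; one standard McAllester-style form, stated here only up to constants and logarithmic factors, reads:

```latex
p(y \mid x, \mathcal{D}) \;=\; \mathbb{E}_{\theta \sim q}\big[\, p(y \mid x, \theta) \,\big],
\qquad
\mathbb{E}_{\theta \sim q}\big[ R(\theta) \big] \;\le\;
\mathbb{E}_{\theta \sim q}\big[ \widehat{R}(\theta) \big]
\;+\; \sqrt{\frac{\mathrm{KL}(q \,\|\, p) \;+\; \ln\frac{m}{\delta}}{2(m-1)}}
```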

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

1 code implementation • 7 Feb 2019 • Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

Steganography is the science of hiding a secret message within an ordinary public message, referred to as the carrier.
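
A sketch of the typical deep-learning steganography objective in this spirit: an encoder hides the message spectrogram inside the carrier, a decoder recovers it, and both are trained so the stego output stays close to the carrier while the message remains decodable. The tiny networks below are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Conv2d(2, 1, kernel_size=3, padding=1)  # (carrier, message) -> stego
decoder = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # stego -> recovered message
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

carrier = torch.randn(8, 1, 64, 64)  # batch of carrier spectrograms
message = torch.randn(8, 1, 64, 64)  # batch of secret spectrograms

stego = encoder(torch.cat([carrier, message], dim=1))
loss = F.mse_loss(stego, carrier) + F.mse_loss(decoder(stego), message)
opt.zero_grad()
loss.backward()
opt.step()  # one step: stego stays near carrier, message stays recoverable
```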

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

no code implementations • 9 Dec 2018 • Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features.

Multi-Task Learning Speaker Recognition +2
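
The adversarial option in the title is usually implemented with a gradient reversal layer: identity on the forward pass, negated (scaled) gradient on the backward pass, so the shared encoder is pushed to remove speaker information. A standard PyTorch sketch follows; the scaling factor lam is a conventional knob, not a value from the paper.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)          # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # negate gradient on the way back

features = torch.randn(8, 128, requires_grad=True)  # shared encoder output
reversed_feats = GradReverse.apply(features, 1.0)   # feed to the speaker head
reversed_feats.sum().backward()
print(features.grad[0, 0])  # -1.0: the speaker loss gradient arrives negated
```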

Fooling End-to-end Speaker Verification by Adversarial Examples

no code implementations • 10 Jan 2018 • Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet

We also present two black-box attacks: in the first, the adversarial examples are generated with a system trained on YOHO, but the attack targets a system trained on NTIMIT; in the second, the adversarial examples are generated with a system trained on the Mel-spectrum feature set, but the attack targets a system trained on MFCCs.

Speaker Verification
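
For orientation, a minimal FGSM-style sketch of perturbing a waveform against a verification score; the linear scorer is a stand-in, and the paper's white-box and black-box transfer attacks are more involved than this single step.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x: torch.Tensor, label: torch.Tensor, eps: float) -> torch.Tensor:
    x = x.clone().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss on the true label,
    # keeping the perturbation small and (ideally) imperceptible.
    return (x + eps * x.grad.sign()).detach()

model = torch.nn.Linear(16000, 1)      # stand-in verification scorer
waveform = torch.randn(1, 16000)
label = torch.tensor([[1.0]])          # true decision: "same speaker"
adversarial = fgsm(model, waveform, label, eps=1e-3)
```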

Houdini: Fooling Deep Structured Prediction Models

no code implementations • 17 Jul 2017 • Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines.

General Classification Pose Estimation +4

Automatic Measurement of Pre-aspiration

no code implementations • 5 Apr 2017 • Yaniv Sheena, Míša Hejná, Yossi Adi, Joseph Keshet

Pre-aspiration is defined as the period of glottal friction occurring in sequences of vocalic/consonantal sonorants and phonetically voiceless obstruents.

Structured Prediction

Learning Similarity Functions for Pronunciation Variations

no code implementations • 28 Mar 2017 • Einat Naaman, Yossi Adi, Joseph Keshet

This task generalizes problems such as lexical access (the problem of learning the mapping between words and their possible pronunciations), and defining word neighborhoods.

Automatic Speech Recognition

Automatic measurement of vowel duration via structured prediction

1 code implementation • 26 Oct 2016 • Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick

Manually-annotated data were used to train a model that takes as input an arbitrary length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants and outputs the duration of the vowel.

Structured Prediction

Sequence Segmentation Using Joint RNN and Structured Prediction Models

no code implementations • 25 Oct 2016 • Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick

We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks.

Structured Prediction

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

3 code implementations • 15 Aug 2016 • Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg

The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.

Sentence Embedding
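
The probing methodology can be sketched compactly: train a simple classifier to predict a surface property (here, a binned sentence length) from fixed sentence embeddings, and read how much of that property the embedding encodes off the probe's accuracy. Random vectors stand in for real embeddings below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 300))  # precomputed sentence vectors
lengths = rng.integers(0, 5, size=1000)    # auxiliary label: length bin

# Fit the probe on one split, score on a held-out split.
probe = LogisticRegression(max_iter=1000).fit(embeddings[:800], lengths[:800])
print("probe accuracy:", probe.score(embeddings[800:], lengths[800:]))
```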
