Search Results for author: Nima Mesgarani

Found 16 papers, 10 papers with code

StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis

no code implementations · 30 May 2022 · Yinghao Aaron Li, Cong Han, Nima Mesgarani

Text-to-Speech (TTS) has recently seen great progress in synthesizing high-quality speech owing to the rapid development of parallel TTS systems, but producing speech with naturalistic prosodic variations, speaking styles and emotional tones remains challenging.

Data Augmentation, Self-Supervised Learning (+2 more)

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems

1 code implementation · NeurIPS 2021 · Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani

We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network.

Speech Recognition

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

1 code implementation · 21 Jul 2021 · Yinghao Aaron Li, Ali Zare, Nima Mesgarani

We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.

Voice Conversion

Group Communication with Context Codec for Lightweight Source Separation

1 code implementation · 14 Dec 2020 · Yi Luo, Cong Han, Nima Mesgarani

A context codec module, containing a context encoder and a context decoder, is designed as a learnable downsampling and upsampling module to decrease the length of a sequential feature processed by the separation module.

Speech Enhancement, Speech Separation
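The GC3 abstract above describes the context codec as a learnable downsampling/upsampling pair that shortens the sequence the separation module has to process. The sketch below illustrates only that length bookkeeping, under stated assumptions: context blocks are non-overlapping and of fixed size, mean pooling stands in for the learned context encoder, and frame repetition stands in for the learned context decoder. The function names `context_encode` and `context_decode` are hypothetical.

```python
def context_encode(frames, block):
    """Summarize each non-overlapping block of `block` frames into one vector.

    Mean pooling stands in for the learnable context encoder (an assumption).
    Frames are equal-length lists of floats; the tail is zero-padded.
    """
    dim = len(frames[0])
    pad = (-len(frames)) % block
    padded = frames + [[0.0] * dim for _ in range(pad)]
    summaries = []
    for start in range(0, len(padded), block):
        chunk = padded[start:start + block]
        summaries.append([sum(f[d] for f in chunk) / block for d in range(dim)])
    return summaries


def context_decode(summaries, block, length):
    """Expand each summary back to `block` frames and trim to `length`.

    Repetition stands in for the learnable context decoder (an assumption).
    """
    frames = [list(s) for s in summaries for _ in range(block)]
    return frames[:length]


# A 10-frame, 2-dim feature sequence shrinks to 3 frames for the separator.
x = [[float(t), float(-t)] for t in range(10)]
z = context_encode(x, block=4)
y = context_decode(z, block=4, length=len(x))
print(len(z), len(y))  # 3 10
```

With a block size of 4, the separation module would see 3 summary frames instead of 10. In a trained system both directions are learned, so the decoder can recover more detail than simple repetition.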

Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss

no code implementations · 27 Mar 2020 · Yi Luo, Nima Mesgarani

Many recent source separation systems are designed to separate a fixed number of sources out of a mixture.

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

2 code implementations · 30 Oct 2019 · Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones.

Speech Separation
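The microphone-robustness problem above calls for an operation whose output does not depend on the order or the number of input channels. This paper addresses it with a transform-average-concatenate (TAC) module. The sketch below is a simplified, hypothetical version of one such step in plain Python: fixed elementwise functions stand in for the learned transforms, and any further learned layers of the full module are omitted.

```python
def tac_step(channels):
    """One simplified transform-average-concatenate step.

    `channels` is a list of per-microphone feature vectors (any count >= 1).
    The fixed transform below stands in for learned layers (an assumption).
    """
    # Transform: apply the same function to every channel.
    transformed = [[max(0.0, 2.0 * v) for v in ch] for ch in channels]
    # Average across channels: invariant to channel order and count.
    n = len(transformed)
    dim = len(transformed[0])
    pooled = [sum(ch[d] for ch in transformed) / n for d in range(dim)]
    # Concatenate each channel's own features with the shared average.
    return [ch + pooled for ch in transformed]


mics = [[1.0, -1.0], [3.0, 2.0], [0.5, 0.0]]
out = tac_step(mics)
shuffled = tac_step([mics[2], mics[0], mics[1]])
# The pooled half is identical regardless of channel order.
print(out[0][2:] == shuffled[0][2:])  # True
```

Because averaging is symmetric in its inputs and defined for any channel count, the same step works unchanged for two, three, or more microphones.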

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

17 code implementations · 20 Sep 2018 · Yi Luo, Nima Mesgarani

Most previous methods formulate the separation problem on the time-frequency representation of the mixed signal, which has several drawbacks: the phase and magnitude of the signal are decoupled, the time-frequency representation is suboptimal for speech separation, and computing the spectrograms introduces long latency.

Multi-task Audio Source Separation, Music Source Separation (+3 more)
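The latency drawback in the Conv-TasNet abstract above comes down to simple arithmetic: a frame-based system cannot produce output for a frame until the whole analysis window has arrived, so the window length bounds the algorithmic latency from below. The window sizes below are illustrative assumptions (a 32 ms STFT window versus a 2 ms learned-encoder window at 8 kHz), not numbers quoted from the paper.

```python
def min_latency_ms(window_samples, sample_rate_hz):
    """Lower bound on algorithmic latency: one full analysis window."""
    return 1000.0 * window_samples / sample_rate_hz


SR = 8000            # 8 kHz sampling rate (illustrative assumption)
stft_window = 256    # 32 ms STFT window (illustrative)
learned_window = 16  # 2 ms learned-encoder window (illustrative)
print(min_latency_ms(stft_window, SR))     # 32.0
print(min_latency_ms(learned_window, SR))  # 2.0
```

Shrinking the analysis window by a factor of 16 shrinks this bound by the same factor, which is why short time-domain frames suit real-time use.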

Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network

1 code implementation · ISCA Interspeech 2018 · Yi Luo, Nima Mesgarani

We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in the task of real-time single-channel speech dereverberation.

Denoising, Speech Dereverberation (+1 more)

TasNet: time-domain audio separation network for real-time, single-channel speech separation

3 code implementations · 1 Nov 2017 · Yi Luo, Nima Mesgarani

We directly model the signal in the time domain using an encoder-decoder framework and perform source separation on nonnegative encoder outputs.

Speech Separation
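The TasNet abstract above describes an encoder-decoder that works directly on waveform frames and separates sources by operating on nonnegative encoder outputs. A minimal sketch of that pipeline follows, assuming a fixed basis {+e_i, -e_i} (so a ReLU encoder is exactly invertible) in place of the learned basis, non-overlapping frames, and masks that are supplied rather than estimated by a separation network. All function names are hypothetical.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def encode(frame):
    """Project a frame onto the basis {+e_i, -e_i} and rectify.

    Returns 2 * len(frame) nonnegative weights. A trained model would use a
    learned basis instead of this fixed one (assumption for illustration).
    """
    return [relu(s) for s in frame] + [relu(-s) for s in frame]


def decode(weights):
    """Map nonnegative weights back to a waveform frame (inverse of the basis)."""
    half = len(weights) // 2
    return [weights[i] - weights[half + i] for i in range(half)]


def separate(mixture, frame_len, masks):
    """Apply per-source masks in the encoder domain, frame by frame.

    `masks[k][t]` is the mask for source k at frame t (same length as the
    encoder output). Frames do not overlap, so overlap-add is concatenation.
    """
    n_frames = len(mixture) // frame_len
    outputs = [[] for _ in masks]
    for t in range(n_frames):
        w = encode(mixture[t * frame_len:(t + 1) * frame_len])
        for k, mask in enumerate(masks):
            outputs[k].extend(decode([m * x for m, x in zip(mask[t], w)]))
    return outputs


mixture = [1.0, -2.0, 0.5, 3.0]
masks = [[[0.25] * 4] * 2, [[0.75] * 4] * 2]  # two sources, two frames
src_a, src_b = separate(mixture, 2, masks)
print([a + b for a, b in zip(src_a, src_b)])  # [1.0, -2.0, 0.5, 3.0]
```

Because the two masks sum to 1 for every encoder weight and the decoder is linear, the separated signals add back up to the mixture.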

Lip2AudSpec: Speech reconstruction from silent lip movements video

1 code implementation · 26 Oct 2017 · Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.

Lip Reading

Visualizing and Understanding Multilayer Perceptron Models: A Case Study in Speech Processing

no code implementations · ICML 2017 · Tasha Nagamine, Nima Mesgarani

Despite the recent success of deep learning, the nature of the transformations that deep networks apply to their input features remains poorly understood.

Speaker-independent Speech Separation with Deep Attractor Network

no code implementations · 12 Jul 2017 · Yi Luo, Zhuo Chen, Nima Mesgarani

For each speaker, a reference point, or attractor, is created in the embedding space and defined as the centroid of that speaker's embeddings.

Speech Separation
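The attractor definition in the abstract above (each speaker's attractor is the centroid of that speaker's points in the embedding space) can be sketched in a few lines. The sketch assumes the speaker that dominates each time-frequency bin is already known, as it would be from ideal masks during training; the function name `attractors` is hypothetical.

```python
def attractors(embeddings, assignments, n_speakers):
    """Compute each speaker's attractor as the centroid of its embeddings.

    `embeddings[i]` is the embedding of time-frequency bin i and
    `assignments[i]` is the index of the speaker dominating that bin
    (assumed known here, e.g. from an ideal binary mask during training).
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_speakers)]
    counts = [0] * n_speakers
    for vec, spk in zip(embeddings, assignments):
        counts[spk] += 1
        for d in range(dim):
            sums[spk][d] += vec[d]
    # Guard against speakers with no bins to avoid division by zero.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


emb = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
spk = [0, 0, 1, 1]
print(attractors(emb, spk, 2))  # [[2.0, 0.0], [0.0, 3.0]]
```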

Deep attractor network for single-microphone speaker separation

1 code implementation · 27 Nov 2016 · Zhuo Chen, Yi Luo, Nima Mesgarani

We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals; these attractors pull together the time-frequency bins corresponding to each source.

Speaker Separation, Speech Separation
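Once attractors exist, the idea in the abstract above of attractors "pulling together" the time-frequency bins of each source can be turned into masks by measuring how close each bin's embedding is to each attractor. This sketch assumes a dot-product similarity followed by a softmax over speakers; the normalization used in the paper may differ, and the function name `attractor_masks` is hypothetical.

```python
import math


def attractor_masks(embeddings, attractors):
    """Soft mask for each bin and speaker from embedding-attractor similarity.

    Similarity is a dot product; a softmax over speakers turns it into masks
    that sum to 1 per bin (the normalization choice is an assumption here).
    """
    masks = []
    for vec in embeddings:
        scores = [sum(v * a for v, a in zip(vec, att)) for att in attractors]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        masks.append([e / total for e in exps])
    return masks


atts = [[2.0, 0.0], [0.0, 3.0]]
m = attractor_masks([[1.0, 0.0], [0.0, 1.0]], atts)
# Each bin is dominated by the attractor its embedding points toward.
print([row.index(max(row)) for row in m])  # [0, 1]
```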

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

no code implementations · 18 Nov 2016 · Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks.

Deep Clustering, Multi-Task Learning (+2 more)
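At inference time, deep clustering assigns each time-frequency bin to a source by clustering the learned embeddings, typically with k-means. The sketch below shows only that clustering step on toy 2-D embeddings, not the training objective; the deterministic initialization from the first k embeddings is an assumption made to keep the example reproducible, and the function name `kmeans_masks` is hypothetical.

```python
def kmeans_masks(embeddings, k, iters=10):
    """Cluster time-frequency bin embeddings with k-means; return binary masks.

    Centers start at the first k embeddings (an assumption chosen for
    reproducibility; practical systems usually use better seeding).
    """
    centers = [list(e) for e in embeddings[:k]]
    labels = [0] * len(embeddings)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, e in enumerate(embeddings):
            dists = [sum((x - c) ** 2 for x, c in zip(e, ctr)) for ctr in centers]
            labels[i] = dists.index(min(dists))
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Binary mask: bin i belongs to source j iff it was assigned to cluster j.
    return [[1 if lab == j else 0 for lab in labels] for j in range(k)]


bins = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(kmeans_masks(bins, k=2))  # [[1, 0, 1, 0], [0, 1, 0, 1]]
```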
