no code implementations • 1 Feb 2024 • Masahito Togami, Jean-Marc Valin, Karim Helwani, Ritwik Giri, Umut Isik, Michael M. Goodwin
The algorithm runs in real time on 10-ms frames with 40 ms of look-ahead.
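As a rough illustration of what 10-ms frames with 40 ms of look-ahead mean for a streaming implementation, the sketch below only emits an output frame once the required future samples are buffered. The sample rate and the `process_frame` hook are illustrative assumptions, not details of the paper.

```python
import numpy as np

SAMPLE_RATE = 16000                              # assumed sample rate (Hz)
FRAME_MS, LOOKAHEAD_MS = 10, 40
FRAME = SAMPLE_RATE * FRAME_MS // 1000           # 160 samples per frame
LOOKAHEAD = SAMPLE_RATE * LOOKAHEAD_MS // 1000   # 640 samples of future context

def process_frame(context):
    """Placeholder for the per-frame enhancement step (hypothetical)."""
    return context[:FRAME]  # pass-through: return the current frame unchanged

def stream(samples):
    """Emit 10-ms output frames once 40 ms of look-ahead is available."""
    out = []
    # each output frame sees itself plus LOOKAHEAD future samples
    for start in range(0, len(samples) - FRAME - LOOKAHEAD + 1, FRAME):
        context = samples[start:start + FRAME + LOOKAHEAD]
        out.append(process_frame(context))
    return np.concatenate(out) if out else np.array([])

# algorithmic latency = one frame + look-ahead = 10 + 40 = 50 ms
```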
no code implementations • 18 Jun 2022 • Zhepei Wang, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Jean-Marc Valin, Paris Smaragdis, Mike Goodwin, Arvindh Krishnaswamy
In this work, we propose Exformer, a time-domain architecture for target speaker extraction.
no code implementations • 16 Jun 2022 • Jean-Marc Valin, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Arvindh Krishnaswamy
In real life, the room effect, also known as room reverberation, and background noise degrade the quality of speech.
no code implementations • 28 Mar 2022 • Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy
Singing voice separation aims to separate music into vocals and accompaniment components.
1 code implementation • 23 Feb 2022 • Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
Neural vocoders have recently demonstrated high quality speech synthesis, but typically require high computational complexity.
2 code implementations • 22 Feb 2022 • Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
Neural speech synthesis models can synthesize high quality speech but typically require high computational complexity to do so.
no code implementations • 8 Jun 2021 • Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Umut Isik, Arvindh Krishnaswamy
The presence of multiple talkers in the surrounding environment poses a difficult challenge for real-time speech communication systems, given the constraints on network size and complexity.
no code implementations • 16 Feb 2021 • Zhepei Wang, Ritwik Giri, Umut Isik, Jean-Marc Valin, Arvindh Krishnaswamy
Given a limited set of labeled data, we present a method to leverage a large volume of unlabeled data to improve the model's performance.
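One standard way to combine a small labeled set with a large pool of unlabeled data is self-training with pseudo-labels; the sketch below shows that generic recipe only and is not claimed to be the method of this paper. The `model` argument is assumed to expose scikit-learn-style `fit`/`predict_proba` methods.

```python
import numpy as np

def self_training(model, x_labeled, y_labeled, x_unlabeled,
                  confidence=0.9, rounds=3):
    """Generic pseudo-labeling loop (illustrative, not the paper's method)."""
    x_train, y_train = x_labeled, y_labeled
    for _ in range(rounds):
        model.fit(x_train, y_train)               # train on current labels
        probs = model.predict_proba(x_unlabeled)  # score the unlabeled pool
        keep = probs.max(axis=1) >= confidence    # keep confident predictions
        if not keep.any():
            break
        pseudo_y = probs[keep].argmax(axis=1)
        x_train = np.concatenate([x_train, x_unlabeled[keep]])
        y_train = np.concatenate([y_train, pseudo_y])
        x_unlabeled = x_unlabeled[~keep]          # remove newly labeled points
    return model
```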
no code implementations • 12 Feb 2021 • Jonah Casebeer, Vinjai Vale, Umut Isik, Jean-Marc Valin, Ritwik Giri, Arvindh Krishnaswamy
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression at comparable speech quality.
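The discretization step in such codecs is typically a vector quantizer: encoder latents are snapped to the nearest entry of a learned codebook and only the integer indices are transmitted. The numpy sketch below shows that step in isolation; the codebook size and latent dimension are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 64))   # 256 learned code vectors, dim 64 (assumed sizes)

def quantize(latents):
    """Map each latent vector to the index of its nearest codebook entry."""
    # squared Euclidean distance between every latent and every code vector
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)                 # integer indices are what gets transmitted

def dequantize(indices):
    """Reconstruct approximate latents from the received indices."""
    return codebook[indices]

latents = rng.standard_normal((100, 64))    # stand-in for encoder outputs
indices = quantize(latents)
recovered = dequantize(indices)
```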
1 code implementation • 18 Aug 2020 • Wayne Chi, Prachi Kumar, Suri Yaddanapudi, Rahul Suresh, Umut Isik
We describe a novel approach for generating music using a self-correcting, non-chronological, autoregressive model.
no code implementations • 11 Aug 2020 • Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
Neural network applications generally benefit from larger models, but for current speech enhancement models, larger-scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training data.
no code implementations • 20 Feb 2020 • Jonah Casebeer, Umut Isik, Shrikant Venkataramani, Arvindh Krishnaswamy
Many neural speech enhancement and source separation systems operate in the time-frequency domain.
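A common time-frequency pipeline transforms the noisy waveform with an STFT, predicts a mask per time-frequency bin, applies it, and inverts the result. The sketch below follows that general recipe with scipy's STFT and an identity mask standing in for a trained network; the frame size and sample rate are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

FS = 16000  # assumed sample rate (Hz)

def enhance(noisy, estimate_mask):
    """Mask-based enhancement in the time-frequency domain (illustrative)."""
    f, t, spec = stft(noisy, fs=FS, nperseg=512)   # complex spectrogram
    mask = estimate_mask(np.abs(spec))             # e.g. a neural net output in [0, 1]
    _, clean = istft(spec * mask, fs=FS, nperseg=512)
    return clean

# placeholder "network": keep everything (identity mask)
noisy = np.random.randn(FS)                         # 1 s of noise as a stand-in signal
enhanced = enhance(noisy, lambda mag: np.ones_like(mag))
```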
1 code implementation • 30 Jan 2020 • Bahareh Tolooshams, Ritwik Giri, Andrew H. Song, Umut Isik, Arvindh Krishnaswamy
Supervised deep learning has recently gained significant attention for speech enhancement.
Ranked #2 on Speech Enhancement on CHiME-3
no code implementations • WS 2020 • Marcello Federico, Robert Enyedi, Roberto Barra-Chicote, Ritwik Giri, Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf
We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing.
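At a high level, a speech-to-speech translation pipeline for dubbing chains recognition, translation, and synthesis, with the synthesized speech roughly matching the source timing. The function stubs below are hypothetical placeholders showing only that data flow, not the components or enhancements described in the paper.

```python
def transcribe(audio):
    """Hypothetical ASR stage: audio -> source-language text."""
    return "source transcript"

def translate(text, target_lang):
    """Hypothetical MT stage: source text -> target-language text."""
    return f"[{target_lang}] {text}"

def synthesize(text, duration=None):
    """Hypothetical TTS stage: text -> waveform samples, optionally matching a duration."""
    return [0.0] * (duration or 16000)

def dub(audio, target_lang):
    """Recognition -> translation -> synthesis, keeping the output near the source duration."""
    target_text = translate(transcribe(audio), target_lang)
    return synthesize(target_text, duration=len(audio))

dubbed = dub([0.0] * 48000, "it")   # 3 s of silence as a stand-in source clip
```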