2 code implementations • 31 May 2024 • Jean-Marc Valin, Ahmed Mustafa, Jan Büthe
Neural vocoders are now being used in a wide range of speech processing applications.
no code implementations • 1 Feb 2024 • Masahito Togami, Jean-Marc Valin, Karim Helwani, Ritwik Giri, Umut Isik, Michael M. Goodwin
The algorithm runs in real-time on 10-ms frames with 40 ms of look-ahead.
no code implementations • 25 Sep 2023 • Jan Büthe, Ahmed Mustafa, Jean-Marc Valin, Karim Helwani, Michael M. Goodwin
Speech codec enhancement methods are designed to remove distortions added by speech codecs.
no code implementations • 25 Sep 2023 • Krishna Subramani, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Mike Goodwin
Pitch estimation is an essential step of many speech processing algorithms, including speech coding, synthesis, and enhancement.
no code implementations • 13 Jul 2023 • Jan Büthe, Jean-Marc Valin, Ahmed Mustafa
Classical speech coding uses low-complexity postfilters with zero lookahead to enhance the quality of coded speech, but their effectiveness is limited by their simplicity.
no code implementations • 23 Feb 2023 • Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis
In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement.
no code implementations • 8 Dec 2022 • Ahmed Mustafa, Jean-Marc Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin
GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural waveform generative models.
no code implementations • 8 Dec 2022 • Jean-Marc Valin, Jan Büthe, Ahmed Mustafa
Robustness to packet loss is one of the main ongoing challenges in real-time speech communication.
no code implementations • 18 Jun 2022 • Zhepei Wang, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Jean-Marc Valin, Paris Smaragdis, Mike Goodwin, Arvindh Krishnaswamy
In this work, we propose Exformer, a time-domain architecture for target speaker extraction.
no code implementations • 16 Jun 2022 • Jean-Marc Valin, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Arvindh Krishnaswamy
In real life, room effect, also known as room reverberation, and background noise degrade the quality of speech.
1 code implementation • 11 May 2022 • Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy
As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC).
no code implementations • 28 Mar 2022 • Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy
Singing voice separation aims to separate music into vocals and accompaniment components.
1 code implementation • 23 Feb 2022 • Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
Neural vocoders have recently demonstrated high quality speech synthesis, but typically require a high computational complexity.
2 code implementations • 22 Feb 2022 • Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy
Neural speech synthesis models can synthesize high quality speech but typically require a high computational complexity to do so.
no code implementations • 15 Jun 2021 • Lukas Drude, Jahn Heymann, Andreas Schwarz, Jean-Marc Valin
Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing.
no code implementations • 8 Jun 2021 • Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Umut Isik, Arvindh Krishnaswamy
The presence of multiple talkers in the surrounding environment poses a difficult challenge for real-time speech communication systems considering the constraints on network size and complexity.
no code implementations • 16 Feb 2021 • Zhepei Wang, Ritwik Giri, Umut Isik, Jean-Marc Valin, Arvindh Krishnaswamy
Given a limited set of labeled data, we present a method to leverage a large volume of unlabeled data to improve the model's performance.
no code implementations • 12 Feb 2021 • Jonah Casebeer, Vinjai Vale, Umut Isik, Jean-Marc Valin, Ritwik Giri, Arvindh Krishnaswamy
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output.
no code implementations • 11 Aug 2020 • Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training data.
2 code implementations • 28 Mar 2019 • Jean-Marc Valin, Jan Skoglund
We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.
2 code implementations • 28 Oct 2018 • Jean-Marc Valin, Jan Skoglund
We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size and that high quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS.
2 code implementations • 24 Sep 2017 • Jean-Marc Valin
Despite noise suppression being a mature area in signal processing, it remains highly dependent on fine tuning of estimator algorithms and parameters.