no code implementations • 24 May 2023 • Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe
Many existing works on voice conversion (VC) use automatic speech recognition (ASR) models to ensure linguistic consistency between source and converted samples.
Automatic Speech Recognition (ASR)
no code implementations • 21 Feb 2023 • Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe
The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another without modifying the signal's linguistic content.
no code implementations • 20 Oct 2022 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
We then propose a two-stage training method called Robustify that trains the one-shot SVC model on clean data in the first stage to ensure high-quality conversion, and introduces enhancement modules to the model's encoders in the second stage to improve feature extraction from distorted singing voices.
no code implementations • 18 Jan 2021 • Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji
Conventional singing voice conversion (SVC) methods often struggle to operate on high-resolution audio owing to the high dimensionality of the data.
no code implementations • 25 May 2020 • Mayank Kumar Singh, Sayan Banerjee, Shubhasis Chaudhuri
Scene text detection based on deep neural networks has shown promising results.
1 code implementation • 29 Nov 2019 • Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji
Despite recent advances in voice separation methods, many challenges remain in realistic scenarios, such as noisy recordings and limited available data.
Automatic Speech Recognition (ASR)