no code implementations • 13 Apr 2024 • Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling
This paper introduces a novel task, voice attribute editing with a text prompt, whose goal is to make relative modifications to voice attributes according to the actions described in the text prompt.
no code implementations • 12 Jan 2024 • Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling
Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, enhancing speech quality toward a brighter and fuller sound.
no code implementations • 5 Dec 2023 • Yang Ai, Xi Yang
For the point cloud registration task, a significant challenge arises from non-overlapping points, which consume extensive computational resources and degrade registration accuracy.
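One common way to handle non-overlapping points, sketched below with numpy as a minimal illustration (the function name, brute-force search, and distance threshold are assumptions for this example, not the paper's method), is to discard source points whose nearest target point lies beyond some radius before running registration:

```python
import numpy as np

def prune_non_overlap(src: np.ndarray, tgt: np.ndarray, radius: float) -> np.ndarray:
    """Keep only source points whose nearest target point is within `radius`.

    src: (N, 3) source cloud, tgt: (M, 3) target cloud.
    Uses a brute-force O(N*M) nearest-neighbour search for clarity.
    """
    dists = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)  # (N, M)
    keep = dists.min(axis=1) <= radius  # mask of likely-overlapping points
    return src[keep]
```

In practice a k-d tree would replace the brute-force distance matrix, but the pruning idea is the same: registration then runs only on points with plausible correspondences.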
1 code implementation • 20 Nov 2023 • Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
APNet can generate synthesized speech of quality comparable to the HiFi-GAN vocoder, but with considerably faster inference.
no code implementations • 19 Sep 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge.
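The teacher–student transfer described above follows the standard knowledge-distillation pattern: the student is trained on a weighted sum of a ground-truth loss and a teacher-matching loss. A minimal numpy sketch (the function name, MSE losses, and the `alpha` weight are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def distillation_loss(student_out: np.ndarray,
                      teacher_out: np.ndarray,
                      target: np.ndarray,
                      alpha: float = 0.5) -> float:
    """Weighted sum of a ground-truth loss and a teacher-matching loss.

    alpha balances learning from clean targets vs. mimicking the teacher's
    (here, tongue-informed) outputs; 0.5 is an arbitrary illustrative value.
    """
    task_loss = np.mean((student_out - target) ** 2)      # supervised term
    distill_loss = np.mean((student_out - teacher_out) ** 2)  # teacher term
    return float(alpha * task_loss + (1 - alpha) * distill_loss)
```

The distillation term is what lets the audio-lip student absorb tongue-related knowledge it never observes directly.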
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Sep 2023 • Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling
This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims to convert the voice characteristics of an utterance from any source speaker to a previously unseen target speaker, relying solely on a single face image of the target speaker.
1 code implementation • 17 Aug 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between magnitude and phase through explicit phase estimation, improving the perceptual quality of the enhanced speech.
no code implementations • 24 May 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech with the help of additional visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement.
Automatic Speech Recognition (ASR) +3
1 code implementation • 23 May 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel.
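The core idea of denoising magnitude and phase in parallel can be sketched as two independent branches over the same noisy spectrum whose outputs are recombined into a complex spectrum. The numpy sketch below is only conceptual: the two branch functions are trivial placeholders (a spectral floor subtraction and a phase wrap), standing in for learned networks:

```python
import numpy as np

def denoise_magnitude(noisy_mag: np.ndarray) -> np.ndarray:
    """Placeholder magnitude branch: crude spectral floor subtraction."""
    return np.clip(noisy_mag - 0.1, 0.0, None)

def denoise_phase(noisy_phase: np.ndarray) -> np.ndarray:
    """Placeholder phase branch: wrap phase into (-pi, pi]."""
    return np.angle(np.exp(1j * noisy_phase))

def reconstruct_complex_spectrum(mag: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Recombine separately estimated magnitude and phase."""
    return mag * np.exp(1j * phase)
```

The point of the parallel structure is that neither branch is forced to compensate for errors in the other, since each is estimated explicitly from the input.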
no code implementations • 9 May 2023 • Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling
In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities.
1 code implementation • 26 Apr 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework.
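An F0-based source excitation of the kind mentioned above can be sketched in a few lines: upsample a frame-level F0 contour to the sample rate, integrate it into a phase track, and emit a sinusoid for voiced frames and low-level noise for unvoiced ones. The hop size, noise scale, and function name below are illustrative assumptions, not SF-GAN's exact formulation:

```python
import numpy as np

def sine_excitation(f0: np.ndarray, sr: int = 16000, hop: int = 80) -> np.ndarray:
    """Build a sample-level excitation from a frame-level F0 contour.

    Voiced samples (f0 > 0) follow a sinusoid whose phase is the cumulative
    sum of the instantaneous frequency; unvoiced samples get weak noise.
    """
    f0_up = np.repeat(f0, hop)                    # frame rate -> sample rate
    phase = 2 * np.pi * np.cumsum(f0_up) / sr     # integrate frequency
    voiced = f0_up > 0
    rng = np.random.default_rng(0)
    noise = 0.1 * rng.standard_normal(len(f0_up))
    return np.where(voiced, np.sin(phase), noise)
```

In a source-filter vocoder this excitation is fed to the (neural) filter, which shapes it into the target waveform conditioned on the acoustic features.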
no code implementations • 12 Apr 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
This paper studies the task of speech reconstruction from ultrasound tongue images and optical lip videos recorded in a silent speaking mode, where people only activate their intra-oral and extra-oral articulators without producing sound.
Automatic Speech Recognition (ASR) +1
no code implementations • 21 Jun 2019 • Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai
This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).
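Autoregressive acoustic modeling means each acoustic frame is predicted from the frames that precede it. As a minimal numpy illustration (a fixed linear predictor standing in for the neural network, with hypothetical names), generation simply feeds each prediction back as input for the next step:

```python
import numpy as np

def ar_predict(history: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Predict the next frame as a weighted sum of the previous `order` frames.

    history: (order, dim) recent frames; weights: (order,) predictor weights.
    """
    return np.tensordot(weights, history, axes=1)

def generate(init_frames, weights: np.ndarray, n_frames: int) -> np.ndarray:
    """Autoregressive rollout: each new frame becomes input to the next step."""
    frames = list(init_frames)
    order = len(weights)
    for _ in range(n_frames):
        frames.append(ar_predict(np.stack(frames[-order:]), weights))
    return np.stack(frames)
```

In an SVS system the linear predictor would be a neural network also conditioned on the musical score, but the frame-by-frame feedback loop is the defining autoregressive property.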