no code implementations • 13 Apr 2024 • Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling
This paper introduces a novel task, voice attribute editing with a text prompt, whose goal is to make relative modifications to voice attributes according to the actions described in the text prompt.
no code implementations • 12 Jan 2024 • Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling
Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, enhancing speech quality toward a brighter and fuller sound.
no code implementations • 5 Dec 2023 • Yang Ai, Xi Yang
For the point cloud registration task, a significant challenge arises from non-overlapping points, which consume extensive computational resources and degrade registration accuracy.
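One common way to handle non-overlapping points, sketched below with numpy as a minimal illustration (the function name, brute-force search, and distance threshold are assumptions for this example, not the paper's method), is to discard source points whose nearest target point lies beyond some radius before running registration:

```python
import numpy as np

def prune_non_overlap(src: np.ndarray, tgt: np.ndarray, radius: float) -> np.ndarray:
    """Keep only source points whose nearest target point is within `radius`.

    src: (N, 3) source cloud, tgt: (M, 3) target cloud.
    Uses a brute-force O(N*M) nearest-neighbour search for clarity.
    """
    dists = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)  # (N, M)
    keep = dists.min(axis=1) <= radius  # mask of likely-overlapping points
    return src[keep]
```

In practice a k-d tree would replace the brute-force distance matrix, but the pruning idea is the same: registration then runs only on points with plausible correspondences.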
1 code implementation • 20 Nov 2023 • Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
APNet can generate synthesized speech of quality comparable to the HiFi-GAN vocoder, but with considerably faster inference.
no code implementations • 19 Sep 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge.
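The teacher–student transfer described above follows the standard knowledge-distillation pattern: the student is trained on a weighted sum of a ground-truth loss and a teacher-matching loss. A minimal numpy sketch (the function name, MSE losses, and the `alpha` weight are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def distillation_loss(student_out: np.ndarray,
                      teacher_out: np.ndarray,
                      target: np.ndarray,
                      alpha: float = 0.5) -> float:
    """Weighted sum of a ground-truth loss and a teacher-matching loss.

    alpha balances learning from clean targets vs. mimicking the teacher's
    (here, tongue-informed) outputs; 0.5 is an arbitrary illustrative value.
    """
    task_loss = np.mean((student_out - target) ** 2)      # supervised term
    distill_loss = np.mean((student_out - teacher_out) ** 2)  # teacher term
    return float(alpha * task_loss + (1 - alpha) * distill_loss)
```

The distillation term is what lets the audio-lip student absorb tongue-related knowledge it never observes directly.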
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Sep 2023 • Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling
This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims to convert the voice characteristics of an utterance from any source speaker to a previously unseen target speaker, relying solely on a single face image of the target speaker.
1 code implementation • 17 Aug 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between magnitude and phase through explicit phase estimation, improving the perceptual quality of the enhanced speech.
no code implementations • 24 May 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech with the help of additional visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement.
Automatic Speech Recognition (ASR) +3
1 code implementation • 23 May 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel.
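The core idea of denoising magnitude and phase in parallel can be sketched as two independent branches over the same noisy spectrum whose outputs are recombined into a complex spectrum. The numpy sketch below is only conceptual: the two branch functions are trivial placeholders (a spectral floor subtraction and a phase wrap), standing in for learned networks:

```python
import numpy as np

def denoise_magnitude(noisy_mag: np.ndarray) -> np.ndarray:
    """Placeholder magnitude branch: crude spectral floor subtraction."""
    return np.clip(noisy_mag - 0.1, 0.0, None)

def denoise_phase(noisy_phase: np.ndarray) -> np.ndarray:
    """Placeholder phase branch: wrap phase into (-pi, pi]."""
    return np.angle(np.exp(1j * noisy_phase))

def reconstruct_complex_spectrum(mag: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Recombine separately estimated magnitude and phase."""
    return mag * np.exp(1j * phase)
```

The point of the parallel structure is that neither branch is forced to compensate for errors in the other, since each is estimated explicitly from the input.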
no code implementations • 9 May 2023 • Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling
In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities.
1 code implementation • 26 Apr 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework.
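An F0-based source excitation of the kind mentioned above can be sketched in a few lines: upsample a frame-level F0 contour to the sample rate, integrate it into a phase track, and emit a sinusoid for voiced frames and low-level noise for unvoiced ones. The hop size, noise scale, and function name below are illustrative assumptions, not SF-GAN's exact formulation:

```python
import numpy as np

def sine_excitation(f0: np.ndarray, sr: int = 16000, hop: int = 80) -> np.ndarray:
    """Build a sample-level excitation from a frame-level F0 contour.

    Voiced samples (f0 > 0) follow a sinusoid whose phase is the cumulative
    sum of the instantaneous frequency; unvoiced samples get weak noise.
    """
    f0_up = np.repeat(f0, hop)                    # frame rate -> sample rate
    phase = 2 * np.pi * np.cumsum(f0_up) / sr     # integrate frequency
    voiced = f0_up > 0
    rng = np.random.default_rng(0)
    noise = 0.1 * rng.standard_normal(len(f0_up))
    return np.where(voiced, np.sin(phase), noise)
```

In a source-filter vocoder this excitation is fed to the (neural) filter, which shapes it into the target waveform conditioned on the acoustic features.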
no code implementations • 12 Apr 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
This paper studies the task of speech reconstruction from ultrasound tongue images and optical lip videos recorded in a silent speaking mode, where people only activate their intra-oral and extra-oral articulators without producing sound.
Automatic Speech Recognition (ASR) +1
no code implementations • 21 Jun 2019 • Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai
This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).
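Autoregressive acoustic modeling means each acoustic frame is predicted from the frames that precede it. As a minimal numpy illustration (a fixed linear predictor standing in for the neural network, with hypothetical names), generation simply feeds each prediction back as input for the next step:

```python
import numpy as np

def ar_predict(history: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Predict the next frame as a weighted sum of the previous `order` frames.

    history: (order, dim) recent frames; weights: (order,) predictor weights.
    """
    return np.tensordot(weights, history, axes=1)

def generate(init_frames, weights: np.ndarray, n_frames: int) -> np.ndarray:
    """Autoregressive rollout: each new frame becomes input to the next step."""
    frames = list(init_frames)
    order = len(weights)
    for _ in range(n_frames):
        frames.append(ar_predict(np.stack(frames[-order:]), weights))
    return np.stack(frames)
```

In an SVS system the linear predictor would be a neural network also conditioned on the musical score, but the frame-by-frame feedback loop is the defining autoregressive property.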