Search Results for author: Yang Ai

Found 13 papers, 4 papers with code

Voice Attribute Editing with Text Prompt

no code implementations • 13 Apr 2024 • Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt.

Attribute

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

no code implementations • 12 Jan 2024 • Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, enhancing speech quality so that it sounds brighter and fuller.
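
As a minimal illustration of the BWE problem setup (the sample rate and cutoff below are assumptions for the sketch, not the paper's parallel amplitude-and-phase model), a narrowband signal is one whose spectrum is empty above a cutoff, and a BWE model must predict the amplitude and phase of that missing band:

```python
import numpy as np

sr = 16000                          # wideband sample rate (assumed)
t = np.arange(sr) / sr
wideband = np.sin(2*np.pi*1000*t) + 0.5*np.sin(2*np.pi*6000*t)

spec = np.fft.rfft(wideband)
freqs = np.fft.rfftfreq(len(wideband), 1/sr)
cutoff = 4000                       # simulate an 8 kHz narrowband recording
narrow_spec = np.where(freqs <= cutoff, spec, 0)     # high band removed
narrowband = np.fft.irfft(narrow_spec, n=len(wideband))

# A BWE model's prediction targets: amplitude and phase of the zeroed band.
target_amp = np.abs(spec[freqs > cutoff])
target_phase = np.angle(spec[freqs > cutoff])

# Energy above the cutoff in the narrowband input is (numerically) zero.
high_energy = np.sum(np.abs(np.fft.rfft(narrowband))[freqs > cutoff]**2)
```

The low band (the 1000 Hz tone) survives intact, so the model only has to generate the missing high band rather than the whole waveform.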

Bandwidth Extension • Generative Adversarial Network

A Dynamic Network for Efficient Point Cloud Registration

no code implementations • 5 Dec 2023 • Yang Ai, Xi Yang

For the point cloud registration task, a significant challenge arises from non-overlapping points that consume extensive computational resources while negatively affecting registration accuracy.

Point Cloud Registration

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

1 code implementation • 20 Nov 2023 • Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

APNet can generate synthesized speech of quality comparable to the HiFi-GAN vocoder, but with considerably faster inference.

Speech Synthesis

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

no code implementations • 19 Sep 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge.

Automatic Speech Recognition (ASR) +3

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

no code implementations • 18 Sep 2023 • Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims to convert the voice characteristics of an utterance from any source speaker to an unseen target speaker, relying solely on a single face image of the target speaker.

Voice Conversion

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

1 code implementation • 17 Aug 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase by explicit phase estimation, elevating the perceptual quality of enhanced speech.
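
A minimal sketch of the magnitude/phase decomposition this line of work operates on (illustrative only; the signal and noise here are invented, and the paper's networks, not an identity transform, produce the enhanced spectra). The point is that a waveform is fully determined by its magnitude and phase together, so a model predicting both explicitly avoids reusing the noisy phase:

```python
import numpy as np

sr = 16000
t = np.arange(512) / sr
clean = np.sin(2*np.pi*440*t)
noisy = clean + 0.1*np.random.default_rng(0).normal(size=t.size)

spec = np.fft.rfft(noisy)
mag, phase = np.abs(spec), np.angle(spec)     # the two prediction targets

# An "explicit" enhancement model outputs an enhanced magnitude AND an
# enhanced phase; the waveform is rebuilt from both. With the unmodified
# spectra, the round trip is lossless:
reconstructed = np.fft.irfft(mag * np.exp(1j*phase), n=t.size)
```

Magnitude-only methods keep the noisy `phase` fixed, which forces the magnitude estimate to compensate for phase errors; predicting both removes that coupling.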

Bandwidth Extension • Denoising +1

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

no code implementations • 24 May 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement.

Automatic Speech Recognition (ASR) +3

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

1 code implementation • 23 May 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel.

Denoising • Speech Enhancement

Zero-shot personalized lip-to-speech synthesis with face image based voice control

no code implementations • 9 May 2023 • Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling

In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities.

Lip to Speech Synthesis • Representation Learning +1

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

1 code implementation • 26 Apr 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework.
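
An F0-based source excitation signal can be sketched roughly as follows, in the spirit of neural source-filter vocoders: voiced frames drive a sine at the F0 frequency, unvoiced frames fall back to noise. The frame hop, noise level, and amplitudes below are assumptions for illustration, not SF-GAN's actual configuration:

```python
import numpy as np

def f0_to_excitation(f0, sr=16000, hop=80, noise_std=0.003, seed=0):
    """Build a sine-plus-noise excitation waveform from a frame-level F0
    contour (Hz; 0 marks unvoiced frames). Hyperparameters are assumed."""
    rng = np.random.default_rng(seed)
    f0_per_sample = np.repeat(f0, hop)                 # upsample frame F0
    phase = 2*np.pi*np.cumsum(f0_per_sample) / sr      # integrate frequency
    voiced = f0_per_sample > 0
    sine = np.where(voiced, np.sin(phase), 0.0)        # harmonic source
    noise = rng.normal(0.0, noise_std, f0_per_sample.size)
    return sine + noise                                # excitation waveform

f0 = np.array([0.0, 0.0, 120.0, 120.0, 125.0])         # Hz, per frame
exc = f0_to_excitation(f0)                             # shape (400,)
```

A neural filter conditioned on acoustic features would then shape such an excitation into the output waveform.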

Speech Synthesis

Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training

no code implementations • 12 Apr 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

This paper studies the task of speech reconstruction from ultrasound tongue images and optical lip videos recorded in a silent speaking mode, where people only activate their intra-oral and extra-oral articulators without producing sound.

Automatic Speech Recognition (ASR) +1

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

no code implementations • 21 Jun 2019 • Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai

This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).
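
Autoregressive acoustic modeling can be caricatured as follows: each acoustic frame is predicted from the previously generated frame plus a conditioning (score) feature. The linear-plus-tanh "model", feature dimension, and random conditioning below are stand-ins for illustration, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4                                   # acoustic feature dim (assumed)
W_prev = 0.5 * np.eye(dim)                # weights on the previous frame
W_cond = rng.normal(size=(dim, dim))      # weights on conditioning input

def generate(cond_frames):
    """Generate frames one at a time, each conditioned on the last."""
    prev = np.zeros(dim)                  # initial frame
    out = []
    for c in cond_frames:                 # strictly sequential: step t
        prev = np.tanh(W_prev @ prev + W_cond @ c)   # depends on step t-1
        out.append(prev)
    return np.stack(out)

frames = generate(rng.normal(size=(10, dim)))   # shape (10, 4)
```

The sequential dependence on `prev` is what makes the model autoregressive: frame t cannot be computed before frame t-1.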

Singing Voice Synthesis
