Search Results for author: Ziyue Jiang

Found 16 papers, 5 papers with code

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

1 code implementation • 19 Feb 2024 • Shengpeng Ji, Minghui Fang, Ziyue Jiang, Rongjie Huang, Jialong Zuo, Shulei Wang, Zhou Zhao

Furthermore, we validate the efficiency of Language-Codec on downstream speech language models.

Audio Compression, Audio Generation, +1 more

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

no code implementations • 14 Feb 2024 • Shengpeng Ji, Ziyue Jiang, Hanting Wang, Jialong Zuo, Zhou Zhao

Moreover, to bridge the gap between text and speech, we introduce a high-level probabilistic mask that simulates the progression of information flow from less to more during speech generation.

Voice Cloning

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

no code implementations • 12 Feb 2024 • Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, YuanJun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

By revealing the limitations of existing large audio-language models (LALMs) through its evaluation results, AIR-Bench can provide insights into the direction of future research.

Automatic Speech Recognition, +4 more

Zero-shot Explainable Mental Health Analysis on Social Media by Incorporating Mental Scales

no code implementations • 9 Feb 2024 • Wenyu Li, Yinuo Zhu, Xin Lin, Ming Li, Ziyue Jiang, Ziqian Zeng

Traditional discriminative approaches in mental health analysis are known for their strong capacity but lack interpretability and demand large-scale annotated data.

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

1 code implementation • 16 Jan 2024 • Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video.

3D Reconstruction, Super-Resolution, +1 more

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

1 code implementation • 21 Sep 2023 • Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself.

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

no code implementations • 28 Aug 2023 • Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

The dataset comprises 236,220 pairs of style prompts, written as natural text descriptions covering five style factors, and corresponding speech samples.

Language Modelling

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

no code implementations • 14 Jul 2023 • Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

However, the prompting mechanisms of zero-shot TTS still face challenges in the following aspects: 1) previous zero-shot TTS works are typically trained with single-sentence prompts, which significantly restricts their performance when relatively sufficient prompt data is available at inference.

In-Context Learning, Language Modelling, +3 more

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

no code implementations • 6 Jun 2023 • Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

We are interested in a novel task, namely low-resource text-to-talking avatar.

Neural Rendering, Video Generation, +1 more

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

no code implementations • 6 Jun 2023 • Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

3) We further use a VQGAN-based acoustic model to generate the spectrogram and a latent code language model to fit the distribution of prosody, since prosody changes quickly over time in a sentence, and language models can capture both local and long-range dependencies.

Attribute, Inductive Bias, +3 more

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

no code implementations • 30 May 2023 • Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Various voice synthesis applications have been developed independently, even though they all generate "voice" as output.

Singing Voice Synthesis, Voice Conversion

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

no code implementations • 1 May 2023 • Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Recently, neural radiance fields (NeRF) have become a popular rendering technique in this field because they can achieve high-fidelity, 3D-consistent talking face generation from a training video only a few minutes long.

Motion Prediction, Talking Face Generation

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

1 code implementation • 31 Jan 2023 • Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao

Generating photo-realistic video portraits driven by arbitrary speech audio is a crucial problem in film-making and virtual reality.

Lip Reading, Talking Face Generation, +1 more

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

1 code implementation • 5 Jun 2022 • Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye

This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary (the existing prior information in the natural language).

Polyphone disambiguation

FedSpeech: Federated Text-to-Speech with Continual Learning

no code implementations • 14 Oct 2021 • Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

Federated learning enables collaborative training of machine learning models under strict privacy restrictions, and federated text-to-speech aims to synthesize natural speech for multiple users, each with only a few audio training samples stored locally on their devices.

Continual Learning, Federated Learning

PhaseFool: Phase-oriented Audio Adversarial Examples via Energy Dissipation

no code implementations • 29 Sep 2021 • Ziyue Jiang, Yi Ren, Zhou Zhao

In this work, we propose a novel phase-oriented algorithm named PhaseFool that can efficiently construct imperceptible audio adversarial examples with energy dissipation.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more
