Search Results for author: Xiaohai Tian

Found 9 papers, 0 papers with code

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code implementations22 Jan 2024 Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.

AudioCaps Audio-Visual Synchronization +4

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

no code implementations19 May 2023 Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma

Specifically, we first pre-train the model using a reconstruction loss function, by masking phones and their durations jointly on a large amount of unlabeled speech and text prompts.

Self-Supervised Learning

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

no code implementations21 Feb 2023 Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i. e., addition or concatenation of reference phone embedding and actual pronunciation of the target phone as the phone-level pronunciation quality representation.

An ASR-free Fluency Scoring Approach with Self-Supervised Learning

no code implementations20 Feb 2023 Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency with an end-to-end approach.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

TTS-Guided Training for Accent Conversion Without Parallel Data

no code implementations20 Dec 2022 Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li

Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data.

The Multi-speaker Multi-style Voice Cloning Challenge 2021

no code implementations5 Apr 2021 Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.

Benchmarking Voice Cloning

Spoofing detection under noisy conditions: a preliminary investigation and an initial database

no code implementations9 Feb 2016 Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

To simulate the real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.

Speaker Verification

A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis

no code implementations6 Oct 2015 Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong

State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling.

Speech Synthesis Vocal Bursts Intensity Prediction

Cannot find the paper you are looking for? You can Submit a new open access paper.