Search Results for author: Jing-Xuan Zhang

Found 8 papers, 1 paper with code

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

1 code implementation • 25 Jun 2019 • Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion is achieved by preserving the linguistic representations of source utterances while replacing the speaker representations with the target ones.
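The conversion step described above can be sketched minimally. The encoder and decoder callables below are hypothetical placeholders, not the authors' actual networks (the paper uses seq2seq recognition/speaker encoders and a seq2seq decoder); the sketch only illustrates the keep-linguistic/swap-speaker idea.

```python
import numpy as np

def convert(src_acoustics, tgt_acoustics, linguistic_enc, speaker_enc, decoder):
    """Keep the source utterance's linguistic content, swap in the
    target speaker's identity, then decode converted acoustics."""
    linguistic = linguistic_enc(src_acoustics)   # what is said (from source)
    speaker = speaker_enc(tgt_acoustics)         # who says it (from target)
    return decoder(linguistic, speaker)

# Toy stand-ins so the sketch runs end to end; real models are networks.
lin_enc = lambda x: x.mean(axis=0)               # fake linguistic embedding
spk_enc = lambda x: x.std(axis=0)                # fake speaker embedding
dec = lambda l, s: l + s                         # fake decoder

src = np.random.default_rng(0).random((100, 80))  # source mel-like frames
tgt = np.random.default_rng(1).random((120, 80))  # target-speaker frames
out = convert(src, tgt, lin_enc, spk_enc, dec)
```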

Audio and Speech Processing Sound

Deep Learning in Software Engineering

no code implementations • 13 May 2018 • Xiaochen Li, He Jiang, Zhilei Ren, Ge Li, Jing-Xuan Zhang

To answer these questions, we conduct a bibliography analysis on 98 research papers in SE that use deep learning techniques.

Software Engineering

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

no code implementations • 18 Jul 2018 • Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis.
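The core of forward attention is a recursion that constrains alignments to move monotonically left-to-right over encoder states. A minimal NumPy sketch of that recursion, assuming the per-step content-based attention probabilities are already computed (the real model produces them inside a seq2seq decoder):

```python
import numpy as np

def forward_attention(energies):
    """Apply the forward-attention recursion to a (T, N) array of
    per-decoder-step attention probabilities over N encoder states:
    alpha_t(n) ∝ (alpha_{t-1}(n) + alpha_{t-1}(n-1)) * y_t(n)."""
    T, N = energies.shape
    alignments = np.zeros((T, N))
    alpha = np.zeros(N)
    alpha[0] = 1.0                     # start with all mass on state 0
    for t in range(T):
        shifted = np.concatenate(([0.0], alpha[:-1]))  # alpha_{t-1}(n-1)
        alpha = (alpha + shifted) * energies[t]
        alpha = alpha / alpha.sum()    # renormalize to a distribution
        alignments[t] = alpha
    return alignments
```

Because each step only carries mass forward from states n and n-1, the resulting alignment cannot jump backward, which is what stabilizes attention in seq2seq TTS.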

Acoustic Modelling Speech Synthesis

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

no code implementations • 3 Sep 2020 • Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai

In this paper, we present an ASR-TTS method for voice conversion, which uses the iFLYTEK ASR engine to transcribe the source speech into text and a Transformer TTS model with a WaveNet vocoder to synthesize the converted speech from the decoded text.
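The cascade is a three-stage pipeline. The sketch below is schematic only: each stage is a placeholder callable standing in for the real ASR engine, TTS model, and vocoder, which are not reproduced here (and the paper's prosody transfer is omitted).

```python
def cascade_vc(source_wave, asr, tts, vocoder):
    """ASR-TTS voice conversion: transcribe the source speech, then
    re-synthesize the decoded text in the target voice."""
    text = asr(source_wave)      # source speech -> decoded text
    acoustics = tts(text)        # text -> target-voice acoustic features
    return vocoder(acoustics)    # acoustic features -> converted waveform

# Toy stand-ins so the pipeline runs; real stages are large models.
fake_asr = lambda w: "hello world"
fake_tts = lambda t: [len(tok) for tok in t.split()]
fake_vocoder = lambda feats: [f * 0.1 for f in feats]

wave = [0.0] * 16000
converted = cascade_vc(wave, fake_asr, fake_tts, fake_vocoder)
```

A design consequence of the cascade: any ASR transcription error propagates into the synthesized output, which is why the paper pairs it with prosody transfer from the source.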

Automatic Speech Recognition (ASR) +4

Is Lip Region-of-Interest Sufficient for Lipreading?

no code implementations • 28 May 2022 • Jing-Xuan Zhang, Gen-Shun Wan, Jia Pan

In this work, we propose to adopt the entire face for lipreading with self-supervised learning.

Lipreading Self-Supervised Learning +2

Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

no code implementations • 6 Dec 2022 • Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu

AV2vec has student and teacher modules; the student performs a masked latent-feature regression task using multimodal target features generated online by the teacher.
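The student-teacher scheme above can be sketched with toy linear "networks". This is an illustrative assumption, not the AV2vec architecture (the real modules are audio-visual transformers): the teacher tracks the student by exponential moving average, and the loss regresses the teacher's online targets at masked positions only.

```python
import numpy as np

rng = np.random.default_rng(0)
W_student = rng.random((8, 8))
W_teacher = W_student.copy()       # teacher initialized from the student

def ema_update(w_teacher, w_student, tau=0.999):
    """Teacher weights track the student via an exponential moving average."""
    return tau * w_teacher + (1.0 - tau) * w_student

def masked_regression_loss(features, mask, w_student, w_teacher):
    """Student sees masked input; teacher produces targets from the full
    input; loss is mean squared error at the masked positions."""
    student_out = (features * (1 - mask)[:, None]) @ w_student  # masked view
    targets = features @ w_teacher                              # online targets
    diff = (student_out - targets)[mask.astype(bool)]
    return float((diff ** 2).mean())

feats = rng.random((10, 8))
mask = np.zeros(10)
mask[[2, 5, 7]] = 1.0              # mask three frames for the student
loss = masked_regression_loss(feats, mask, W_student, W_teacher)
W_teacher = ema_update(W_teacher, W_student)
```

The teacher is never trained by gradients; only the EMA update moves it, which is what makes the targets "generated online" rather than fixed.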

Language Modelling
