no code implementations • 14 Mar 2023 • Xiangwen Deng, Yingshuang Zou, Yuanhao Cai, Chendong Zhao, Yang Liu, Zhifang Liu, Yuxiao Liu, Jiawei Zhou, Haoqian Wang
To solve this problem, we propose a novel method, namely Face-guided Dual Style Transfer (FDST).
no code implementations • 15 Oct 2022 • Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao
Unsupervised representation learning for speech audio has attained impressive performance on speech recognition tasks, particularly when annotated speech is limited.
no code implementations • 30 Sep 2022 • Chendong Zhao, Jianzong Wang, Wen qi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao
For multi-head attention in Transformer ASR, it is not easy to model monotonic alignments in different heads.
Automatic Speech Recognition (ASR)
no code implementations • 26 May 2022 • Nan Zhang, Jianzong Wang, Zhenhou Hong, Chendong Zhao, Xiaoyang Qu, Jing Xiao
Therefore, we propose an approach to derive utterance-level speaker embeddings via a Transformer architecture that uses a novel loss function named diffluence loss to integrate the feature information of different Transformer layers.
1 code implementation • 26 May 2022 • Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Chendong Zhao, Wei Tao, Jing Xiao
However, running a Quantum Neural Network (QNN) on low-qubit quantum devices is difficult, since it is based on a Variational Quantum Circuit (VQC), which requires many qubits.
no code implementations • 24 May 2022 • Chendong Zhao, Jianzong Wang, Leilai Li, Xiaoyang Qu, Jing Xiao
In this work, we propose a novel task-adaptive module that is easy to plug into any metric-based few-shot learning framework.
no code implementations • 21 Feb 2022 • Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao
In this paper, we aim to evaluate and enhance the robustness of G2P models.
Automatic Speech Recognition (ASR)
no code implementations • 9 Jul 2021 • Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Jie Liu, Chendong Zhao, Jing Xiao
Text-to-speech (TTS) is a crucial task for user interaction, but TTS model training relies on a sizable amount of high-quality original data.