1 code implementation • 17 Sep 2023 • Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng
Recent end-to-end automatic speech recognition (ASR) models have become increasingly large, making them particularly challenging to deploy on resource-constrained devices.
1 code implementation • 19 May 2021 • Weiyi Zhang, Shuning Zhao, Le Liu, Jianmin Li, Xingliang Cheng, Thomas Fang Zheng, Xiaolin Hu
In authentication scenarios, practical speaker verification systems usually require a person to read a dynamic authentication text.
no code implementations • 27 Oct 2020 • Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang
Various information factors are blended in speech signals, which constitutes the primary difficulty for most speech information processing tasks.
no code implementations • 27 Oct 2020 • Lantian Li, Yang Zhang, Jiawen Kang, Thomas Fang Zheng, Dong Wang
Domain mismatch often occurs in real applications and causes serious performance degradation in speaker verification systems.
no code implementations • 15 Sep 2020 • Linlin Zheng, Jiakang Li, Meng Sun, Xiongwei Zhang, Thomas Fang Zheng
The proposed approach generalizes well to restore the disguise with nonlinear frequency warping in VTLN, reducing its EER from 34.3% to 18.5%.
1 code implementation • 25 May 2020 • Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng
Domain generalization remains a critical problem for speaker recognition, even with the state-of-the-art architectures based on deep neural nets.
no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng
Various informative factors are mixed in speech signals, leading to great difficulty when decoding any one of them.
no code implementations • 31 Oct 2017 • Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng
Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model.
2 code implementations • 4 Oct 2017 • Aodong Li, Shiyue Zhang, Dong Wang, Thomas Fang Zheng
Neural machine translation (NMT) has recently achieved impressive results.
no code implementations • 22 Jun 2017 • Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng
This principle has recently been applied in several prototype studies on speaker verification (SV), where the features and the classifier are learned jointly with an objective function that is consistent with the evaluation metric.
no code implementations • 22 Jun 2017 • Lantian Li, Dong Wang, Askar Rozi, Thomas Fang Zheng
The experiments demonstrated that the feature-based system outperformed the i-vector system by a large margin, particularly under language mismatch between enrollment and test.
no code implementations • 27 Sep 2016 • Lantian Li, Renyu Wang, Gang Wang, Caixia Wang, Thomas Fang Zheng
In this paper, we propose a decision making approach based on multiple scores derived from a set of cohort GMMs (cohort scores).
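The abstract does not detail how the cohort scores are combined, but a closely related classical technique is cohort-based score normalization (t-norm style), where the trial score is standardized against the scores the same utterance obtains from a set of cohort GMMs. The sketch below is an illustrative simplification, not the paper's exact decision rule:

```python
import numpy as np

def cohort_normalized_score(raw_score, cohort_scores):
    """Standardize a trial score against scores from a set of cohort models.

    t-norm-style sketch (hypothetical simplification of the paper's
    multi-score decision approach): the claimed-speaker score is shifted
    and scaled by the mean and standard deviation of the cohort scores.
    """
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    mu = cohort_scores.mean()
    sigma = cohort_scores.std()
    return (raw_score - mu) / max(sigma, 1e-8)  # guard against zero spread

# Example: a trial score of 2.0 against a cohort centred near 0
print(cohort_normalized_score(2.0, [0.1, -0.2, 0.3, 0.0, -0.1]))
```

The normalized score can then be compared to a single global threshold, which is the usual motivation for cohort normalization.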
no code implementations • 31 Mar 2016 • Lantian Li, Dong Wang, Xiaodong Zhang, Thomas Fang Zheng, Panshi Jin
This paper presents a combination approach to the SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system.
no code implementations • 19 Nov 2015 • Dong Wang, Thomas Fang Zheng
Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks.
no code implementations • 20 Oct 2015 • Lantian Li, Dong Wang, Chao Xing, Kaimin Yu, Thomas Fang Zheng
The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding.
no code implementations • 20 Oct 2015 • Lantian Li, Dong Wang, Chao Xing, Thomas Fang Zheng
Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model, and has delivered state-of-the-art performance in speaker recognition.
no code implementations • 24 May 2015 • Lantian Li, Dong Wang, Zhiyong Zhang, Thomas Fang Zheng
Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification.
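The standard d-vector recipe passes frame-level features through a trained DNN and averages the activations of a hidden layer over the utterance. The toy layer below stands in for the trained network (its weights are random placeholders, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_activations(frames, weight, bias):
    """Hypothetical single hidden layer standing in for the trained DNN."""
    return np.maximum(frames @ weight + bias, 0.0)  # ReLU

def extract_d_vector(frames, weight, bias):
    """Average frame-level hidden activations into one utterance vector."""
    acts = hidden_activations(frames, weight, bias)
    return acts.mean(axis=0)

frames = rng.standard_normal((200, 40))        # 200 frames of 40-dim features
weight = rng.standard_normal((40, 64)) * 0.1   # toy layer weights
bias = np.zeros(64)
d_vec = extract_d_vector(frames, weight, bias)
print(d_vec.shape)  # (64,)
```

At test time, two such vectors are typically compared with cosine similarity to decide whether the utterances share a speaker.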
no code implementations • PACLIC 2015 • Miao Fan, Qiang Zhou, Thomas Fang Zheng
In this paper, we propose a new paradigm named distantly supervised entity linking (DSEL), in the sense that the disambiguated entities that belong to a huge knowledge repository (Freebase) are automatically aligned to the corresponding descriptive webpages (Wiki pages).
no code implementations • 10 May 2015 • Miao Fan, Qiang Zhou, Andrew Abel, Thomas Fang Zheng, Ralph Grishman
This paper contributes a novel embedding model which measures the probability of each belief $\langle h, r, t, m\rangle$ in a large-scale knowledge repository via simultaneously learning distributed representations for entities ($h$ and $t$), relations ($r$), and the words in relation mentions ($m$).
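A minimal way to see the scoring idea is a TransE-style sketch over the $\langle h, r, t\rangle$ part: entities and relations share a vector space, and a triple is plausible when $h + r$ lands near $t$. The paper's model additionally embeds the words of the relation mention $m$; that component is omitted here, and the entity/relation names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
# Toy embeddings; in a real model these are learned from the repository.
emb = {name: rng.standard_normal(dim)
       for name in ("Paris", "France", "capital_of")}

def score(h, r, t):
    """Translation-based plausibility: higher (less negative) is better."""
    return -np.linalg.norm(emb[h] + emb[r] - emb[t])

# Construct a perfectly consistent relation vector so the score is exact:
emb["capital_of"] = emb["France"] - emb["Paris"]
print(score("Paris", "capital_of", "France"))  # exactly zero by construction
```

Training then pushes observed beliefs toward high scores and corrupted ones toward low scores, typically with a margin-based ranking loss.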
no code implementations • 7 Apr 2015 • Miao Fan, Qiang Zhou, Thomas Fang Zheng, Ralph Grishman
The traditional way of storing facts in triplets ({\it head\_entity, relation, tail\_entity}), abbreviated as ({\it h, r, t}), makes knowledge intuitively displayed and easily acquired by humans, but hardly computed or reasoned over by machines.
no code implementations • 27 Mar 2015 • Miao Fan, Qiang Zhou, Thomas Fang Zheng
This paper considers, as a first attempt, the problem of knowledge inference on large-scale imperfect repositories with incomplete coverage by means of embedding entities and relations.
no code implementations • 17 Nov 2014 • Miao Fan, Deli Zhao, Qiang Zhou, Zhiyuan Liu, Thomas Fang Zheng, Edward Y. Chang
The essence of distantly supervised relation extraction is that it is an incomplete multi-label classification problem with sparse and noisy features.