no code implementations • WASSA (ACL) 2022 • Hao Lin, Pradeep Nalluri, Lantian Li, Yifan Sun, Yongjun Zhang
We introduce new datasets from Twitter related to anti-Asian hate sentiment before and during the pandemic.
no code implementations • 29 Sep 2024 • Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang
In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention.
no code implementations • 4 Jul 2024 • Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han
Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.
no code implementations • 14 Jun 2024 • Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang
The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers.
no code implementations • 5 Feb 2024 • Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang
This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations.
no code implementations • 9 Oct 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han
This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input.
no code implementations • 28 May 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech.
no code implementations • 25 May 2023 • Lantian Li, Xiaolou Li, Haoyu Jiang, Chen Chen, Ruihai Hou, Dong Wang
A comprehensive study was conducted to compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results demonstrated that CN-Celeb-AV is more in line with real-world scenarios and can be regarded as a new benchmark dataset for AVPR research.
no code implementations • 25 May 2023 • Jiaying Wang, Xianglong Wang, Namin Wang, Lantian Li, Dong Wang
Modern speaker recognition systems represent utterances by embedding vectors.
no code implementations • 18 Oct 2021 • Haoran Sun, Chen Chen, Lantian Li, Dong Wang
SpeechFlow is a powerful factorization model based on information bottleneck (IB), and its effectiveness has been reported by several studies.
1 code implementation • 18 Oct 2021 • Lantian Li, Ruiqian Nai, Dong Wang
The additive margin softmax (AM-Softmax) loss has delivered remarkable performance in speaker verification.
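As a minimal sketch of the loss named above, the standard AM-Softmax formulation subtracts a fixed margin from the target-class cosine similarity before scaling and applying softmax cross-entropy. The function name `am_softmax_loss` and the default `scale` and `margin` values are illustrative assumptions, not settings taken from this paper.

```python
import math

def am_softmax_loss(cosines, target, scale=30.0, margin=0.2):
    """AM-Softmax loss for a single sample (illustrative sketch).

    cosines: cosine similarities between the embedding and each class weight.
    target:  index of the true class.
    scale, margin: hypothetical defaults, chosen only for illustration.
    """
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    # Cross-entropy: negative log-probability of the target class.
    return log_z - logits[target]
```

With `margin=0.0` and `scale=1.0` this reduces to ordinary softmax cross-entropy over the cosine similarities; a positive margin makes the target class harder to satisfy and so increases the loss for the same input.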
no code implementations • 4 Nov 2020 • Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han
Recently, speech enhancement (SE) based on deep speech priors, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture, has attracted much attention.
1 code implementation • 30 Oct 2020 • Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel
In this paper, we argue that this problem is largely attributed to the maximum-likelihood (ML) training criterion of the DNF model, which aims to maximize the likelihood of the observations but not necessarily improve the Gaussianity of the latent codes.
no code implementations • 27 Oct 2020 • Lantian Li, Yang Zhang, Jiawen Kang, Thomas Fang Zheng, Dong Wang
Domain mismatch often occurs in real applications and causes serious performance degradation in speaker verification systems.
no code implementations • 27 Oct 2020 • Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang
Various information factors are blended in speech signals, which forms the primary difficulty for most speech information processing tasks.
1 code implementation • 25 May 2020 • Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng
Domain generalization remains a critical problem for speaker recognition, even with the state-of-the-art architectures based on deep neural nets.
1 code implementation • 7 Apr 2020 • Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel
Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks.
2 code implementations • 31 Oct 2019 • Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang
These datasets tend to deliver overoptimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.
no code implementations • 29 Oct 2019 • Haoran Sun, Yunqi Cai, Lantian Li, Dong Wang
Speech signals are complex composites of various information, including phonetic content, speaker traits, channel effect, etc.
no code implementations • 27 Aug 2019 • Xueyi Wang, Lantian Li, Dong Wang
By training the neural model to discriminate the speakers in the training set, deep speaker embeddings (called `x-vectors`) can be derived from the hidden layers.
no code implementations • 7 Apr 2019 • Yang Zhang, Lantian Li, Dong Wang
Deep speaker embedding has achieved state-of-the-art performance in speaker recognition.
no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang
This paper proposes a Gaussian-constrained training approach that (1) discards the parametric classifier, and (2) enforces the distribution of the derived speaker vectors to be Gaussian.
no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang
This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring.
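The weighting idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: each frame is assumed to have a speaker vector and a phonetic posterior vector, cosine similarity is assumed for both the phonetic weight and the speaker score, and the names `cosine` and `weighted_utterance_score` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def weighted_utterance_score(enroll_spk, enroll_phn, test_spk, test_phn):
    """Utterance-level score as a phonetically weighted average of frame-pair scores.

    enroll_spk, test_spk: per-frame speaker vectors (assumed inputs).
    enroll_phn, test_phn: per-frame phonetic posteriors (assumed inputs).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for e_spk, e_phn in zip(enroll_spk, enroll_phn):
        for t_spk, t_phn in zip(test_spk, test_phn):
            # Weight: how similar the two frames are in phonetic content.
            weight = cosine(e_phn, t_phn)
            # Frame-pair speaker similarity, scaled by its phonetic weight.
            weighted_sum += weight * cosine(e_spk, t_spk)
            weight_total += weight
    # No phonetically matching frame pairs: return a neutral score.
    if weight_total == 0.0:
        return 0.0
    return weighted_sum / weight_total
```

In this sketch, frame pairs with unrelated phonetic content receive a weight near zero, so the utterance score is dominated by pairs in which both frames carry similar phones.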
no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng
Various informative factors are mixed in speech signals, leading to great difficulty when decoding any of the factors.
no code implementations • 15 Nov 2017 • Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, Dong Wang
Trivial events are ubiquitous in human-to-human conversations, e.g., cough, laugh and sniff.
no code implementations • 31 Oct 2017 • Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng
Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model.
no code implementations • 22 Jun 2017 • Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng
This principle has recently been applied in several prototype studies on speaker verification (SV), where the features and the classifier are learned together with an objective function that is consistent with the evaluation metric.
no code implementations • 22 Jun 2017 • Lantian Li, Dong Wang, Askar Rozi, Thomas Fang Zheng
The experiments demonstrated that the feature-based system outperformed the i-vector system by a large margin, particularly with language mismatch between enrollment and test.
no code implementations • 22 Jun 2017 • Miao Zhang, Yixiang Chen, Lantian Li, Dong Wang
This paper proposes a speaker recognition (SRE) task with trivial speech events, such as cough and laugh.
no code implementations • 5 Jun 2017 • Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang
In this paper, we demonstrated that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN).
no code implementations • 10 May 2017 • Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang
Recently, deep neural networks (DNNs) have been used to learn speaker features.
no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel
Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID).
no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID).
no code implementations • 27 Sep 2016 • Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue Zhang
This paper presents a unified model to perform language and speaker recognition simultaneously and jointly.
no code implementations • 27 Sep 2016 • Lantian Li, Yixiang Chen, Dong Wang, Chenghui Zhao
PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification.
no code implementations • 27 Sep 2016 • Zhiyuan Tang, Lantian Li, Dong Wang
Research on multilingual speech recognition remains attractive yet challenging.
no code implementations • 27 Sep 2016 • Dong Wang, Lantian Li, Difei Tang, Qing Chen
We present the AP16-OL7 database, which was released as the training and test data for the oriental language recognition (OLR) challenge at APSIPA 2016.
no code implementations • 27 Sep 2016 • Lantian Li, Renyu Wang, Gang Wang, Caixia Wang, Thomas Fang Zheng
In this paper, we propose a decision-making approach based on multiple scores derived from a set of cohort GMMs (cohort scores).
no code implementations • 27 Sep 2016 • Chenghui Zhao, Lantian Li, Dong Wang, April Pu
PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification.
no code implementations • 31 Mar 2016 • Zhiyuan Tang, Lantian Li, Dong Wang
Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities.
no code implementations • 31 Mar 2016 • Lantian Li, Dong Wang, Xiaodong Zhang, Thomas Fang Zheng, Panshi Jin
This paper presents a combination approach to the SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system.
no code implementations • 20 Oct 2015 • Lantian Li, Dong Wang, Chao Xing, Thomas Fang Zheng
Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model, and has delivered state-of-the-art performance in speaker recognition.
no code implementations • 20 Oct 2015 • Lantian Li, Dong Wang, Chao Xing, Kaimin Yu, Thomas Fang Zheng
The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding.
no code implementations • 28 Jun 2015 • Lantian Li, Yiye Lin, Zhiyong Zhang, Dong Wang
A deep learning approach has been proposed recently to derive speaker identities (d-vectors) by a deep neural network (DNN).
no code implementations • 24 May 2015 • Lantian Li, Dong Wang, Zhiyong Zhang, Thomas Fang Zheng
Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification.