Search Results for author: Lantian Li

Found 45 papers, 5 papers with code

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

no code implementations • 29 Sep 2024 • Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang

In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention.

Audio-Visual Speech Recognition, Lip Reading, +3

Serialized Output Training by Learned Dominance

no code implementations • 4 Jul 2024 • Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han

Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.

Decoder, speech-recognition, +1

CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge

no code implementations • 14 Jun 2024 • Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang

The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers.

speech-recognition, Visual Speech Recognition

Adversarial Data Augmentation for Robust Speaker Verification

no code implementations • 5 Feb 2024 • Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations.

Data Augmentation, Speaker Verification

A Glance is Enough: Extract Target Sentence By Looking at A keyword

no code implementations • 9 Oct 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han

This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input.

Sentence

Spot keywords from very noisy and mixed speech

no code implementations • 28 May 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin

We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech.

Data Augmentation, Keyword Spotting

CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition

no code implementations • 25 May 2023 • Lantian Li, Xiaolou Li, Haoyu Jiang, Chen Chen, Ruihai Hou, Dong Wang

A comprehensive study was conducted to compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results demonstrated that CN-Celeb-AV is more in line with real-world scenarios and can be regarded as a new benchmark dataset for AVPR research.

Person Recognition

Ordered and Binary Speaker Embedding

no code implementations • 25 May 2023 • Jiaying Wang, Xianglong Wang, Namin Wang, Lantian Li, Dong Wang

Modern speaker recognition systems represent utterances by embedding vectors.

Clustering, Retrieval, +2

CycleFlow: Purify Information Factors by Cycle Loss

no code implementations • 18 Oct 2021 • Haoran Sun, Chen Chen, Lantian Li, Dong Wang

SpeechFlow is a powerful factorization model based on information bottleneck (IB), and its effectiveness has been reported by several studies.

Voice Conversion

Real Additive Margin Softmax for Speaker Verification

1 code implementation • 18 Oct 2021 • Lantian Li, Ruiqian Nai, Dong Wang

The additive margin softmax (AM-Softmax) loss has delivered remarkable performance in speaker verification.

Speaker Verification

Can We Trust Deep Speech Prior?

no code implementations • 4 Nov 2020 • Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

Recently, speech enhancement (SE) based on deep speech priors, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture, has attracted much attention.

Speech Enhancement

Deep Speaker Vector Normalization with Maximum Gaussianality Training

1 code implementation • 30 Oct 2020 • Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

In this paper, we argue that this problem is largely attributed to the maximum-likelihood (ML) training criterion of the DNF model, which aims to maximize the likelihood of the observations but not necessarily improve the Gaussianality of the latent codes.

Speaker Recognition

Squeezing value of cross-domain labels: a decoupled scoring approach for speaker verification

no code implementations • 27 Oct 2020 • Lantian Li, Yang Zhang, Jiawen Kang, Thomas Fang Zheng, Dong Wang

Domain mismatch often occurs in real applications and causes serious performance reduction on speaker verification systems.

Speaker Verification

Deep generative factorization for speech signal

no code implementations • 27 Oct 2020 • Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang

Various information factors are blended in speech signals, which forms the primary difficulty for most speech information processing tasks.

Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning

1 code implementation • 25 May 2020 • Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng

Domain generalization remains a critical problem for speaker recognition, even with the state-of-the-art architectures based on deep neural nets.

Audio and Speech Processing

Deep Normalization for Speaker Vectors

1 code implementation • 7 Apr 2020 • Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks.

Speaker Recognition

CN-CELEB: a challenging Chinese speaker recognition dataset

2 code implementations • 31 Oct 2019 • Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang

These datasets tend to deliver overly optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.

Speaker Recognition

On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

no code implementations • 29 Oct 2019 • Haoran Sun, Yunqi Cai, Lantian Li, Dong Wang

Speech signals are complex composites of various information, including phonetic content, speaker traits, channel effect, etc.

VAE-based Domain Adaptation for Speaker Verification

no code implementations • 27 Aug 2019 • Xueyi Wang, Lantian Li, Dong Wang

By forcing the neural model to discriminate the speakers in the training set, deep speaker embeddings (called x-vectors) can be derived from the hidden layers.

Domain Adaptation, Speaker Verification

VAE-based regularization for deep speaker embedding

no code implementations • 7 Apr 2019 • Yang Zhang, Lantian Li, Dong Wang

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition.

Speaker Recognition

Gaussian-Constrained training for speaker verification

no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

This paper proposes a Gaussian-constrained training approach that (1) discards the parametric classifier, and (2) enforces the distribution of the derived speaker vectors to be Gaussian.

Speaker Verification

Phonetic-attention scoring for deep speaker features in speaker verification

no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring.

Machine Translation, Speaker Verification, +1

Deep factorization for speech signal

no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng

Various informative factors are mixed in speech signals, leading to great difficulty when decoding any of the factors.

Emotion Recognition, Speaker Recognition

Full-info Training for Deep Speaker Feature Learning

no code implementations • 31 Oct 2017 • Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model.

Speaker Verification

Deep Speaker Verification: Do We Need End to End?

no code implementations • 22 Jun 2017 • Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng

This principle has recently been applied in several prototype studies on speaker verification (SV), where the feature extractor and classifier are learned together with an objective function that is consistent with the evaluation metric.

Speaker Verification

Cross-lingual Speaker Verification with Deep Feature Learning

no code implementations • 22 Jun 2017 • Lantian Li, Dong Wang, Askar Rozi, Thomas Fang Zheng

The experiments demonstrated that the feature-based system outperformed the i-vector system by a large margin, particularly with language mismatch between enrollment and test.

Speaker Verification

Speaker Recognition with Cough, Laugh and "Wei"

no code implementations • 22 Jun 2017 • Miao Zhang, Yixiang Chen, Lantian Li, Dong Wang

This paper proposes a speaker recognition (SRE) task with trivial speech events, such as cough and laugh.

Speaker Recognition

Deep Factorization for Speech Signal

no code implementations • 5 Jun 2017 • Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang

In this paper, we demonstrated that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN).

Emotion Recognition

Phonetic Temporal Neural Model for Language Identification

no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID).

Language Identification

Phone-aware Neural Language Identification

no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID).

Language Identification

Collaborative Learning for Language and Speaker Recognition

no code implementations • 27 Sep 2016 • Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue Zhang

This paper presents a unified model that performs language and speaker recognition simultaneously.

Speaker Recognition

Weakly Supervised PLDA Training

no code implementations • 27 Sep 2016 • Lantian Li, Yixiang Chen, Dong Wang, Chenghui Zhao

PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification.

Speaker Verification

AP16-OL7: A Multilingual Database for Oriental Languages and A Language Recognition Baseline

no code implementations • 27 Sep 2016 • Dong Wang, Lantian Li, Difei Tang, Qing Chen

We present the AP16-OL7 database which was released as the training and test data for the oriental language recognition (OLR) challenge on APSIPA 2016.

Decision Making Based on Cohort Scores for Speaker Verification

no code implementations • 27 Sep 2016 • Lantian Li, Renyu Wang, Gang Wang, Caixia Wang, Thomas Fang Zheng

In this paper, we propose a decision making approach based on multiple scores derived from a set of cohort GMMs (cohort scores).

Decision Making, Speaker Verification

Local Training for PLDA in Speaker Verification

no code implementations • 27 Sep 2016 • Chenghui Zhao, Lantian Li, Dong Wang, April Pu

PLDA is a popular normalization approach for the i-vector model, and it has delivered state-of-the-art performance in speaker verification.

Speaker Verification

Multi-task Recurrent Model for Speech and Speaker Recognition

no code implementations • 31 Mar 2016 • Zhiyuan Tang, Lantian Li, Dong Wang

Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities.

Speaker Recognition

System Combination for Short Utterance Speaker Recognition

no code implementations • 31 Mar 2016 • Lantian Li, Dong Wang, Xiaodong Zhang, Thomas Fang Zheng, Panshi Jin

This paper presents a combination approach to the SUSR tasks with two phonetic-aware systems: one is the DNN-based i-vector system and the other is our recently proposed subregion-based GMM-UBM system.

Speaker Recognition

Max-margin Metric Learning for Speaker Recognition

no code implementations • 20 Oct 2015 • Lantian Li, Dong Wang, Chao Xing, Thomas Fang Zheng

Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model, and has delivered state-of-the-art performance in speaker recognition.

Metric Learning, Speaker Recognition

Binary Speaker Embedding

no code implementations • 20 Oct 2015 • Lantian Li, Dong Wang, Chao Xing, Kaimin Yu, Thomas Fang Zheng

The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding.

Binarization, Speaker Verification

Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

no code implementations • 28 Jun 2015 • Lantian Li, Yiye Lin, Zhiyong Zhang, Dong Wang

A deep learning approach has been proposed recently to derive speaker identities (d-vectors) with a deep neural network (DNN).

Dynamic Time Warping, Speaker Recognition

Deep Speaker Vectors for Semi Text-independent Speaker Verification

no code implementations • 24 May 2015 • Lantian Li, Dong Wang, Zhiyong Zhang, Thomas Fang Zheng

Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification.

Speaker Recognition, Text-Dependent Speaker Verification, +2
