Search Results for author: Kong Aik Lee

Found 45 papers, 13 papers with code

A Randomized Link Transformer for Diverse Open-Domain Dialogue Generation

no code implementations NLP4ConvAI (ACL) 2022 Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

A major issue in open-domain dialogue generation is the agent’s tendency to generate repetitive and generic responses.

Dialogue Generation

Cosine Scoring with Uncertainty for Neural Speaker Embedding

no code implementations 11 Mar 2024 Qiongqiong Wang, Kong Aik Lee

Uncertainty modeling in speaker representation aims to learn the variability present in speech utterances.

Speaker Recognition

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

no code implementations 20 Jan 2024 Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase.

Domain Adaptation Speaker Verification

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

1 code implementation 6 Dec 2023 Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

We represent the stride space on a trellis diagram, and conduct a systematic study on the impact of temporal and frequency resolutions on the performance and further identify two optimal points, namely Golden Gemini, which serves as a guiding principle for designing 2D ResNet-based speaker verification models.

Speaker Verification

Partially Randomizing Transformer Weights for Dialogue Response Diversity

no code implementations 18 Nov 2023 Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists.

An Empirical Bayes Framework for Open-Domain Dialogue Generation

no code implementations 18 Nov 2023 Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue.

Dialogue Generation

t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators

1 code implementation 21 Sep 2023 Tomi Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, Andreas Nautsch

The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation 30 May 2023 Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification

Generalized domain adaptation framework for parametric back-end in speaker recognition

no code implementations 24 May 2023 Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

The efficacy of the proposed techniques has been experimentally validated on NIST 2016, 2018, and 2019 Speaker Recognition Evaluation (SRE'16, SRE'18, and SRE'19) datasets.

Speaker Recognition Unsupervised Domain Adaptation

Incorporating Uncertainty from Speaker Embedding Estimation to Speaker Verification

no code implementations 23 Feb 2023 Qiongqiong Wang, Kong Aik Lee, Tianchi Liu

We propose a log-likelihood ratio function for the PLDA scoring with the uncertainty propagation.

Speaker Verification

Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

1 code implementation 22 Feb 2023 Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang

Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production.

Text-Independent Speaker Verification

Probabilistic Back-ends for Online Speaker Recognition and Clustering

1 code implementation 19 Feb 2023 Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng

This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario.

Clustering Online Clustering +1

Speaker recognition with two-step multi-modal deep cleansing

1 code implementation 28 Oct 2022 Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li

However, noisy samples (i.e., with wrong labels) in the training set induce confusion and cause the network to learn the incorrect representation.

Representation Learning Speaker Recognition +1

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

no code implementations 27 Oct 2022 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels.

Contrastive Learning Self-Supervised Learning +1

Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?

no code implementations 8 Apr 2022 Qiongqiong Wang, Kong Aik Lee, Tianchi Liu

The emergence of large-margin softmax cross-entropy losses in training deep speaker embedding neural networks has triggered a gradual shift from parametric back-ends to a simpler cosine similarity measure for speaker verification.

Speaker Verification

Improving Contextual Coherence in Variational Personalized and Empathetic Dialogue Agents

no code implementations 12 Feb 2022 Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

Empirical results show that our framework significantly improves the contextual coherence of the generated response.

Dialogue Generation Response Generation

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

no code implementations 3 Feb 2022 Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

The time delay neural network (TDNN) represents one of the state-of-the-art neural solutions to text-independent speaker verification.

Text-Independent Speaker Verification

PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction

1 code implementation 3 Oct 2021 Yi Ma, Kong Aik Lee, Ville Hautamaki, Haizhou Li

Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise.

Speaker Identification Speaker Verification +1

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

1 code implementation 1 Sep 2021 Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.

Face Swapping Speaker Verification

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

no code implementations 1 Sep 2021 Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.

Face Swapping Speaker Verification

Task-aware Warping Factors in Mask-based Speech Enhancement

no code implementations 27 Aug 2021 Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka, Koji Okabe, Hitoshi Yamamoto

It is easy to apply the proposed dual-warping factors approach to any mask-based SE method, and it allows a single SE system to handle multiple tasks without task-dependent training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Xi-Vector Embedding for Speaker Recognition

no code implementations 12 Aug 2021 Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

We present a Bayesian formulation for deep speaker embedding, wherein the xi-vector is the Bayesian counterpart of the x-vector, taking into account the uncertainty estimate.

Speaker Recognition

Generating Personalized Dialogue via Multi-Task Meta-Learning

no code implementations 7 Aug 2021 Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

To address these practical limitations, we propose a novel multi-task meta-learning approach which involves training a model to adapt to new personas without relying on a large corpus, or on any predefined persona information.

Dialogue Generation Meta-Learning

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding

no code implementations 14 Jul 2021 Hongning Zhu, Kong Aik Lee, Haizhou Li

Instead of utilizing multi-head attention in parallel, the proposed serialized multi-layer multi-head attention is designed to aggregate and propagate attentive statistics from one layer to the next in a serialized manner.

Text-Independent Speaker Verification

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

1 code implementation 11 Jun 2021 Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.

Speaker Verification Voice Anti-spoofing

Exploring Deep Learning for Joint Audio-Visual Lip Biometrics

1 code implementation 17 Apr 2021 Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.

Speaker Recognition

Extrapolating false alarm rates in automatic speaker verification

no code implementations 8 Aug 2020 Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers.

Speaker Verification

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

no code implementations 12 Jul 2020 Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.

Speaker Verification

Neural i-vectors

no code implementations 3 Apr 2020 Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen

To combine the benefits of high performance and generative interpretation, we investigate the use of deep embedding extractor and i-vector extractor in succession.

Speaker Recognition Speaker Verification

Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores

no code implementations 4 Nov 2019 Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

We put forward a novel performance assessment framework to address both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by addressing ASV security against closest impostors on arbitrarily large datasets.

Speaker Verification

Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

no code implementations 20 Jun 2019 Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

In particular, we achieve an acceleration of 3000 times in frame posterior computation compared to real time and 25 times in training the i-vector extractor compared to the CPU baseline from Kaldi toolkit.

Speaker Verification

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations 5 Feb 2016 Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by the Institute for Infocomm Research (I²R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technological University (NTU) and the University of Eastern Finland (UEF) for the 2015 NIST Language Recognition Evaluation (LRE).

regression
