Search Results for author: Kong Aik Lee

Found 45 papers, 13 papers with code

A Randomized Link Transformer for Diverse Open-Domain Dialogue Generation

no code implementations • NLP4ConvAI (ACL) 2022 • Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

A major issue in open-domain dialogue generation is the agent’s tendency to generate repetitive and generic responses.

Dialogue Generation


Cosine Scoring with Uncertainty for Neural Speaker Embedding

no code implementations • 11 Mar 2024 • Qiongqiong Wang, Kong Aik Lee

Uncertainty modeling in speaker representation aims to learn the variability present in speech utterances.

Speaker Recognition


VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

no code implementations • 1 Mar 2024 • Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

This forces the model to learn a speaker distribution disentangled from the semantic content.

Speech Synthesis


Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

no code implementations • 20 Jan 2024 • Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase.

Domain Adaptation Speaker Verification


Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

1 code implementation • 6 Dec 2023 • Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

We represent the stride space on a trellis diagram and systematically study the impact of temporal and frequency resolutions on performance. We further identify two optimal points, termed Golden Gemini, which serve as a guiding principle for designing 2D ResNet-based speaker verification models.

Speaker Verification



Partially Randomizing Transformer Weights for Dialogue Response Diversity

no code implementations • 18 Nov 2023 • Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists.


An Empirical Bayes Framework for Open-Domain Dialogue Generation

no code implementations • 18 Nov 2023 • Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue.

Dialogue Generation


t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators

1 code implementation • 21 Sep 2023 • Tomi Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, Andreas Nautsch

The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.


Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification


Generalized domain adaptation framework for parametric back-end in speaker recognition

no code implementations • 24 May 2023 • Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Takafumi Koshinaka

The efficacy of the proposed techniques has been experimentally validated on NIST 2016, 2018, and 2019 Speaker Recognition Evaluation (SRE'16, SRE'18, and SRE'19) datasets.

Speaker Recognition Unsupervised Domain Adaptation


Incorporating Uncertainty from Speaker Embedding Estimation to Speaker Verification

no code implementations • 23 Feb 2023 • Qiongqiong Wang, Kong Aik Lee, Tianchi Liu

We propose a log-likelihood ratio function for PLDA scoring with uncertainty propagation.

Speaker Verification


Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

1 code implementation • 22 Feb 2023 • Meng Liu, Kong Aik Lee, Longbiao Wang, Hanyi Zhang, Chang Zeng, Jianwu Dang

Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production.

Text-Independent Speaker Verification


Probabilistic Back-ends for Online Speaker Recognition and Clustering

1 code implementation • 19 Feb 2023 • Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng

This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario.

Clustering Online Clustering +1


I4U System Description for NIST SRE'20 CTS Challenge

no code implementations • 2 Nov 2022 • Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera

This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge.

Speaker Recognition


Speaker recognition with two-step multi-modal deep cleansing

1 code implementation • 28 Oct 2022 • Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li

However, noisy samples (i.e., those with wrong labels) in the training set induce confusion and cause the network to learn an incorrect representation.

Representation Learning Speaker Recognition +1


Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

no code implementations • 27 Oct 2022 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

We study a novel neural architecture and training strategies for a speaker encoder for speaker recognition without using any identity labels.

Contrastive Learning Self-Supervised Learning +1


Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

no code implementations • 11 Oct 2022 • Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Chang Zeng, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang

The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech.

Data Augmentation Domain Adaptation +1


The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

1 code implementation • 17 Aug 2022 • Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.

Machine Translation speaker-diarization +1


Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?

no code implementations • 8 Apr 2022 • Qiongqiong Wang, Kong Aik Lee, Tianchi Liu

The emergence of large-margin softmax cross-entropy losses in training deep speaker embedding neural networks has triggered a gradual shift from parametric back-ends to a simpler cosine similarity measure for speaker verification.

Speaker Verification


Improving Contextual Coherence in Variational Personalized and Empathetic Dialogue Agents

no code implementations • 12 Feb 2022 • Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

Empirical results show that our framework significantly improves the contextual coherence of the generated response.

Dialogue Generation Response Generation


MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

no code implementations • 3 Feb 2022 • Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification.

Text-Independent Speaker Verification


DLVGen: A Dual Latent Variable Approach to Personalized Dialogue Generation

no code implementations • 22 Nov 2021 • Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

The generation of personalized dialogue is vital to natural and human-like conversation.

Dialogue Generation


Self-supervised Speaker Recognition with Loss-gated Learning

1 code implementation • 8 Oct 2021 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

In self-supervised learning for speaker recognition, pseudo labels are useful as supervision signals.

Self-Supervised Learning Speaker Recognition


PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction

1 code implementation • 3 Oct 2021 • Yi Ma, Kong Aik Lee, Ville Hautamaki, Haizhou Li

Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise.

Speaker Identification Speaker Verification +1


ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

1 code implementation • 1 Sep 2021 • Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.

Face Swapping Speaker Verification



ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

no code implementations • 1 Sep 2021 • Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.

Face Swapping Speaker Verification


Task-aware Warping Factors in Mask-based Speech Enhancement

no code implementations • 27 Aug 2021 • Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka, Koji Okabe, Hitoshi Yamamoto

It is easy to apply the proposed dual-warping factors approach to any mask-based SE method, and it allows a single SE system to handle multiple tasks without task-dependent training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


Xi-Vector Embedding for Speaker Recognition

no code implementations • 12 Aug 2021 • Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

We present a Bayesian formulation for deep speaker embedding, wherein the xi-vector is the Bayesian counterpart of the x-vector, taking into account the uncertainty estimate.

Speaker Recognition


Generating Personalized Dialogue via Multi-Task Meta-Learning

no code implementations • 7 Aug 2021 • Jing Yang Lee, Kong Aik Lee, Woon Seng Gan

To address these practical limitations, we propose a novel multi-task meta-learning approach which involves training a model to adapt to new personas without relying on a large corpus, or on any predefined persona information.

Dialogue Generation Meta-Learning


Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding

no code implementations • 14 Jul 2021 • Hongning Zhu, Kong Aik Lee, Haizhou Li

Instead of utilizing multi-head attention in parallel, the proposed serialized multi-layer multi-head attention is designed to aggregate and propagate attentive statistics from one layer to the next in a serialized manner.

Text-Independent Speaker Verification


Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

1 code implementation • 11 Jun 2021 • Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.

Speaker Verification Voice Anti-spoofing


Exploring Deep Learning for Joint Audio-Visual Lip Biometrics

1 code implementation • 17 Apr 2021 • Meng Liu, Longbiao Wang, Kong Aik Lee, Hanyi Zhang, Chang Zeng, Jianwu Dang

Audio-visual (AV) lip biometrics is a promising authentication technique that leverages the benefits of both the audio and visual modalities in speech communication.

Speaker Recognition


ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech

no code implementations • 11 Feb 2021 • Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee

The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV).

Speaker Verification Speech Synthesis +2


Extrapolating false alarm rates in automatic speaker verification

no code implementations • 8 Aug 2020 • Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers.

Speaker Verification


Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

no code implementations • 12 Jul 2020 • Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.

Speaker Verification


Neural i-vectors

no code implementations • 3 Apr 2020 • Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen

To combine the benefits of high performance and generative interpretation, we investigate the use of a deep embedding extractor and an i-vector extractor in succession.

Speaker Recognition Speaker Verification


Short-duration Speaker Verification (SdSV) Challenge 2021: the Challenge Evaluation Plan

no code implementations • 13 Dec 2019 • Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukas Burget

This document describes the Short-duration Speaker Verification (SdSV) Challenge 2021.

Speaker Recognition Text-Dependent Speaker Verification +1


Speaker detection in the wild: Lessons learned from JSALT 2019

1 code implementation • 2 Dec 2019 • Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios.

Audio and Speech Processing Sound


ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling

Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.

Person Recognition Speaker Verification +2


Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores

no code implementations • 4 Nov 2019 • Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

We put forward a novel performance assessment framework to address both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by addressing ASV security against closest impostors on arbitrarily large datasets.

Speaker Verification


Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

no code implementations • 20 Jun 2019 • Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

In particular, we achieve an acceleration of 3000 times in frame posterior computation compared to real time and 25 times in training the i-vector extractor compared to the CPU baseline from the Kaldi toolkit.

Speaker Verification


I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).

Domain Adaptation Speaker Recognition


The CORAL+ Algorithm for Unsupervised Domain Adaptation of PLDA

no code implementations • 26 Dec 2018 • Kong Aik Lee, Qiongqiong Wang, Takafumi Koshinaka

We refer to the model-based adaptation technique proposed in this paper as CORAL+.

Speaker Recognition Unsupervised Domain Adaptation


t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

no code implementations • 25 Apr 2018 • Tomi Kinnunen, Kong Aik Lee, Hector Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric.

Speaker Verification


Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations • 5 Feb 2016 • Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by the Institute for Infocomm Research (I²R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technological University (NTU) and the University of Eastern Finland (UEF) for the 2015 NIST Language Recognition Evaluation (LRE).

regression

