no code implementations • 16 Aug 2024 • Xin Wang, Hector Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi
ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions.
no code implementations • 25 Jun 2024 • Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen
Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.
1 code implementation • 16 Jun 2024 • Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi
The outcomes of these findings, namely, the score calibration before fusion, improved linear fusion, and better non-linear fusion, were found to be effective on the SASV challenge database.
no code implementations • 14 Jun 2024 • Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md. Sahidullah, Tomi Kinnunen
While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic.
1 code implementation • 3 Mar 2024 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot
Spoofing detection is today a mainstream research topic.
1 code implementation • 23 Feb 2024 • Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen
One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment.
no code implementations • 20 Jan 2024 • Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen
To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase.
1 code implementation • 21 Sep 2023 • Tomi Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, Andreas Nautsch
The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.
no code implementations • 13 Jun 2023 • Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen
The first dataset, used for addressing short-term ageing (a time difference of up to 10 years between enrollment and test) under uncontrolled conditions, is VoxCeleb.
1 code implementation • 31 May 2023 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen
Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks.
no code implementations • 31 May 2023 • Hye-jin Shim, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen
Shortcut learning, or the "Clever Hans effect", refers to situations where a learning agent (e.g., a deep neural network) learns spurious correlations present in data, resulting in biased models.
1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung
Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.
no code implementations • 2 Mar 2023 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments.
no code implementations • 20 Feb 2023 • Mark Anderson, Tomi Kinnunen, Naomi Harte
We show that although performance is overall improved, the filterbanks exhibit strong sensitivity to their initialisation strategy.
no code implementations • 2 Nov 2022 • Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera
This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge.
1 code implementation • 14 May 2022 • Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
Playing games with cheaters is not fun, and in a multi-billion-dollar video game industry with hundreds of millions of players, game developers aim to improve the security and, consequently, the user experience of their games by preventing cheating.
1 code implementation • 30 Apr 2022 • Alexey Sholokhov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Speaker recognition on household devices, such as smart speakers, features several challenges: (i) robustness across a vast number of heterogeneous domains (households), (ii) short utterances, (iii) possibly absent speaker labels of the enrollment data (passive enrollment), and (iv) presence of unknown persons (guests).
1 code implementation • 31 Mar 2022 • Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki
Due to a constantly increasing amount of speech data that is stored in different types of databases, voice privacy has become a major concern.
no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen
Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.
no code implementations • 21 Mar 2022 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
In this paper, we initiate the concern of enhancing the spoofing robustness of the automatic speaker verification (ASV) system, without the primary presence of a separate countermeasure module.
no code implementations • 10 Feb 2022 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner.
no code implementations • 24 Jan 2022 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security.
no code implementations • 21 Oct 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs).
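As a rough illustration of the idea in the abstract above, the sketch below replaces a single Hamming-windowed periodogram with the average of several periodograms, each computed with a different orthogonal taper. It is a minimal sketch and not the paper's method: the choice of DPSS (Slepian) tapers, the uniform averaging weights, the function name `multitaper_power_spectrum`, and all parameter values are assumptions made for this example.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_power_spectrum(frame, n_tapers=4, nw=2.5, n_fft=512):
    """Average the periodograms obtained with several orthogonal DPSS tapers.

    Hypothetical helper: taper family, taper count and uniform weights are
    illustrative assumptions, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    # dpss returns an array of shape (n_tapers, len(frame)) when Kmax is given.
    tapers = dpss(len(frame), NW=nw, Kmax=n_tapers)
    # One periodogram per taper, computed along the time axis.
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    # Uniform average across tapers reduces the variance of the estimate.
    return spectra.mean(axis=0)


def hamming_power_spectrum(frame, n_fft=512):
    """Conventional single-taper (Hamming-windowed) periodogram for comparison."""
    frame = np.asarray(frame, dtype=float)
    return np.abs(np.fft.rfft(np.hamming(len(frame)) * frame, n=n_fft)) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # stand-in for one 25 ms frame at 16 kHz
    print(multitaper_power_spectrum(frame).shape)
    print(hamming_power_spectrum(frame).shape)
```

In an MFCC pipeline, either spectrum would then be passed through the mel filterbank, log compression, and the discrete cosine transform in the usual way; only the spectrum estimation step changes.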
1 code implementation • 28 Sep 2021 • Khaled Hechmi, Trung Ngo Trong, Ville Hautamaki, Tomi Kinnunen
VoxCeleb datasets are widely used in speaker recognition studies.
no code implementations • 24 Sep 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
We address far-field speaker verification with a deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g., room reverberation) and noise.
no code implementations • 24 Sep 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted to other tasks, including speaker verification.
no code implementations • 1 Sep 2021 • Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado
In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.
1 code implementation • 1 Sep 2021 • Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi
The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.
1 code implementation • 11 Jun 2021 • Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee
Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.
no code implementations • 26 Mar 2021 • Bhusan Chettri, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen
Voice anti-spoofing aims at classifying a given utterance either as a bonafide human sample, or a spoofing attack (e.g., synthetic or replayed sample).
no code implementations • 20 Feb 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
We propose a learnable mel-frequency cepstral coefficient (MFCC) frontend architecture for deep neural network (DNN) based automatic speaker verification.
no code implementations • 11 Feb 2021 • Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee
The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV).
1 code implementation • 2 Dec 2020 • Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki
Behavioural characterizations (BCs) of decision-making agents, or their policies, are used to study outcomes of training algorithms and as part of the algorithms themselves to encourage unique policies, match expert policy or restrict changes to policy per update.
no code implementations • 8 Aug 2020 • Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee
Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers.
no code implementations • 30 Jul 2020 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features.
no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent
Our primary submission to the challenge is the fusion of seven subsystems which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.
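For readers unfamiliar with the EER quoted above, it is the operating point at which the miss rate (targets rejected) equals the false alarm rate (non-targets accepted). The following is a minimal sketch of one simple way to estimate it from two score lists; the function name `equal_error_rate` and the threshold sweep over observed scores are assumptions for this example, and evaluation toolkits typically use interpolated or convex-hull variants instead.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper: a trial is accepted when its score is >= threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials falling below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False alarm rate: fraction of non-target trials at or above the threshold.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0


if __name__ == "__main__":
    # Perfectly separated scores give an EER of zero.
    print(equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))
```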
no code implementations • 12 Jul 2020 • Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds
Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.
1 code implementation • 6 Feb 2020 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation of each other.
no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.
no code implementations • 4 Nov 2019 • Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee
We put forward a novel performance assessment framework to address both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by addressing ASV security against closest impostors on arbitrarily large datasets.
no code implementations • 3 Jun 2019 • Ville Vestman, Tomi Kinnunen, Rosa González Hautamäki, Md Sahidullah
Our goal is to gain insights into how well similarity rankings transfer from the attacker's ASV system to the attacked ASV system, whether the attackers are able to improve their attacks by mimicking, and how the properties of the voices of attackers change due to mimicking.
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
no code implementations • 4 Jan 2019 • Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee
Over the past few years significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker recognition (ASV).
no code implementations • 9 Nov 2018 • Tomi Kinnunen, Rosa González Hautamäki, Ville Vestman, Md Sahidullah
We consider technology-assisted mimicry attacks in the context of automatic speaker verification (ASV).
1 code implementation • 8 Nov 2018 • Ville Vestman, Bilal Soomro, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen
The popularization of science can often be disregarded by scientists as it may be challenging to put highly sophisticated research into words that the general public can understand.
no code implementations • 2 Jul 2018 • Akihiro Kato, Tomi Kinnunen
The fundamental frequency (F0) represents pitch in speech that determines prosodic characteristics of speech and is needed in various tasks for speech analysis and synthesis.
no code implementations • 8 May 2018 • Akihiro Kato, Tomi Kinnunen
The latest prior research addresses this problem first as a frame-by-frame-classification problem followed by sequence tracking using deep neural network hidden Markov model (DNN-HMM) hybrid architecture.
no code implementations • 3 May 2018 • Ville Vestman, Tomi Kinnunen
The results suggest that, in terms of ASV accuracy, the supervector compression approaches are on a par with FEFA.
1 code implementation • 25 Apr 2018 • Tomi Kinnunen, Kong Aik Lee, Hector Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds
The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric.
no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling
As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts.
no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling
We present the Voice Conversion Challenge 2018, designed as a follow-up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.
no code implementations • 2 Mar 2018 • Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen
Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database.