Search Results for author: Tomi Kinnunen

Found 53 papers, 16 papers with code

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

no code implementations • 25 Jun 2024 • Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen

Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

1 code implementation • 16 Jun 2024 • Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi

The outcomes of these findings, namely, the score calibration before fusion, improved linear fusion, and better non-linear fusion, were found to be effective on the SASV challenge database.

Speaker Verification

ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

1 code implementation • 23 Feb 2024 • Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment.

Data Augmentation, Speaker Verification

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

no code implementations • 20 Jan 2024 • Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase.

Domain Adaptation, Speaker Verification

t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators

1 code implementation • 21 Sep 2023 • Tomi Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, Andreas Nautsch

The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.

Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech

no code implementations • 13 Jun 2023 • Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

The first dataset, used for addressing short-term ageing (up to 10 years time difference between enrollment and test) under uncontrolled conditions, is VoxCeleb.

Speaker Verification

Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing

1 code implementation • 31 May 2023 • Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen

Audio anti-spoofing for automatic speaker verification aims to safeguard users' identities from spoofing attacks.

Speaker Verification

How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning

no code implementations • 31 May 2023 • Hye-jin Shim, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

Shortcut learning, or the 'Clever Hans effect', refers to situations where a learning agent (e.g., a deep neural network) learns spurious correlations present in data, resulting in biased models.

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification

Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

no code implementations • 2 Mar 2023 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments.

Knowledge Distillation, Speaker Verification

Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation

no code implementations • 20 Feb 2023 • Mark Anderson, Tomi Kinnunen, Naomi Harte

We show that although performance is overall improved, the filterbanks exhibit strong sensitivity to their initialisation strategy.

Action Detection, Activity Detection

GAN-Aimbots: Using Machine Learning for Cheating in First Person Shooters

1 code implementation • 14 May 2022 • Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

Playing games with cheaters is not fun, and in a multi-billion-dollar video game industry with hundreds of millions of players, game developers aim to improve the security and, consequently, the user experience of their games by preventing cheating.

BIG-bench Machine Learning

Baselines and Protocols for Household Speaker Recognition

1 code implementation • 30 Apr 2022 • Alexey Sholokhov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Speaker recognition on household devices, such as smart speakers, features several challenges: (i) robustness across a vast number of heterogeneous domains (households), (ii) short utterances, (iii) possibly absent speaker labels of the enrollment data (passive enrollment), and (iv) presence of unknown persons (guests).

Speaker Recognition

Improving speaker de-identification with functional data analysis of f0 trajectories

1 code implementation • 31 Mar 2022 • Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki

Due to a constantly increasing amount of speech data that is stored in different types of databases, voice privacy has become a major concern.

De-identification

SASV 2022: The First Spoofing-Aware Speaker Verification Challenge

no code implementations • 28 Mar 2022 • Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen

Pre-trained spoofing detection and speaker verification models are provided as open source and are used in two baseline SASV solutions.

Speaker Verification

Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

no code implementations • 21 Mar 2022 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

In this paper, we initiate the concern of enhancing the spoofing robustness of the automatic speaker verification (ASV) system, without the primary presence of a separate countermeasure module.

Speaker Verification, Unsupervised Domain Adaptation

Learnable Nonlinear Compression for Robust Speaker Verification

no code implementations • 10 Feb 2022 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner.

Speaker Verification

Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

no code implementations • 24 Jan 2022 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security.

Speaker Verification

Optimizing Multi-Taper Features for Deep Speaker Verification

no code implementations • 21 Oct 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs).

Open-Ended Question Answering, Speaker Verification

Parameterized Channel Normalization for Far-field Deep Speaker Verification

no code implementations • 24 Sep 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

We address far-field speaker verification with a deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g. room reverberation) and noise.

Speaker Verification

Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

no code implementations • 24 Sep 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted to other tasks, including speaker verification.

Robust Speech Recognition, Speaker Verification

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

no code implementations • 1 Sep 2021 • Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.

Face Swapping, Speaker Verification

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

1 code implementation • 1 Sep 2021 • Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.

Face Swapping, Speaker Verification

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

1 code implementation • 11 Jun 2021 • Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.

Speaker Verification, Voice Anti-spoofing

Data Quality as Predictor of Voice Anti-Spoofing Generalization

no code implementations • 26 Mar 2021 • Bhusan Chettri, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

Voice anti-spoofing aims at classifying a given utterance either as a bonafide human sample, or a spoofing attack (e.g. synthetic or replayed sample).

Voice Anti-spoofing

Learnable MFCCs for Speaker Verification

no code implementations • 20 Feb 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

We propose a learnable mel-frequency cepstral coefficient (MFCC) frontend architecture for deep neural network (DNN) based automatic speaker verification.

Speaker Verification

General Characterization of Agents by States they Visit

1 code implementation • 2 Dec 2020 • Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

Behavioural characterizations (BCs) of decision-making agents, or their policies, are used to study outcomes of training algorithms and as part of the algorithms themselves to encourage unique policies, match expert policy or restrict changes to policy per update.

Imitation Learning

Extrapolating false alarm rates in automatic speaker verification

no code implementations • 8 Aug 2020 • Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

Automatic speaker verification (ASV) vendors and corpus providers would both benefit from tools to reliably extrapolate performance metrics for large speaker populations without collecting new speakers.

Speaker Verification

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

no code implementations • 30 Jul 2020 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features.

Speaker Verification

UIAI System for Short-Duration Speaker Verification Challenge 2020

no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

Our primary submission to the challenge is the fusion of seven subsystems which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.

Text-Dependent Speaker Verification

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

no code implementations • 12 Jul 2020 • Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.

Speaker Verification

Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores

no code implementations • 4 Nov 2019 • Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

We put forward a novel performance assessment framework to address both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by addressing ASV security against closest impostors on arbitrarily large datasets.

Speaker Verification

Voice Mimicry Attacks Assisted by Automatic Speaker Verification

no code implementations • 3 Jun 2019 • Ville Vestman, Tomi Kinnunen, Rosa González Hautamäki, Md Sahidullah

Our goal is to gain insight into how well similarity rankings transfer from the attacker's ASV system to the attacked ASV system, whether the attackers are able to improve their attacks by mimicking, and how the properties of the voices of attackers change due to mimicking.

Speaker Verification

Introduction to Voice Presentation Attack Detection and Recent Advances

no code implementations • 4 Jan 2019 • Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee

Over the past few years significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker recognition (ASV).

Benchmarking, Speaker Recognition

Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search

1 code implementation • 8 Nov 2018 • Ville Vestman, Bilal Soomro, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen

The popularization of science can often be disregarded by scientists as it may be challenging to put highly sophisticated research into words that the general public can understand.

Audio and Speech Processing, Sound

Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks

no code implementations • 2 Jul 2018 • Akihiro Kato, Tomi Kinnunen

The fundamental frequency (F0) represents pitch in speech that determines prosodic characteristics of speech and is needed in various tasks for speech analysis and synthesis.

A Regression Model of Recurrent Deep Neural Networks for Noise Robust Estimation of the Fundamental Frequency Contour of Speech

no code implementations • 8 May 2018 • Akihiro Kato, Tomi Kinnunen

The latest prior research addresses this problem first as a frame-by-frame classification problem followed by sequence tracking using a deep neural network hidden Markov model (DNN-HMM) hybrid architecture.

Language Identification, Speech Synthesis

Supervector Compression Strategies to Speed up I-Vector System Development

no code implementations • 3 May 2018 • Ville Vestman, Tomi Kinnunen

The results suggest that, in terms of ASV accuracy, the supervector compression approaches are on a par with FEFA.

Speaker Verification

t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

1 code implementation • 25 Apr 2018 • Tomi Kinnunen, Kong Aik Lee, Hector Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric.

Speaker Verification

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling

As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts.

Benchmarking, Speaker Verification

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling

We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.

Voice Conversion

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

no code implementations • 2 Mar 2018 • Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database.

Generative Adversarial Network, Speech Enhancement
