Search Results for author: Md Sahidullah

Found 58 papers, 7 papers with code

Self-Tuning Spectral Clustering for Speaker Diarization

no code implementations • 16 Sep 2024 • Nikhil Raghav, Avisek Gupta, Md Sahidullah, Swagatam Das

Spectral clustering has proven effective in grouping speech representations for speaker diarization tasks, although post-processing the affinity matrix remains difficult due to the need for careful tuning before constructing the Laplacian.

TCG CREST System Description for the Second DISPLACE Challenge

no code implementations • 16 Sep 2024 • Nikhil Raghav, Subhajit Saha, Md Sahidullah, Swagatam Das

In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024.

Graph Neural Networks for Parkinsons Disease Detection

no code implementations • 12 Sep 2024 • Shakeel A. Sheikh, Yacouba Kaloga, Md Sahidullah, Ina Kodrasi

Additionally, not all speech segments from PD patients exhibit clear dysarthric symptoms, introducing label noise that can negatively affect the performance and generalizability of current approaches.

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing

no code implementations • 25 Jun 2024 • Hye-jin Shim, Md Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen

Our investigations highlight the significant differences in training dynamics between the two classes, emphasizing the need for future research to focus on robust modeling of the bonafide class.

Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization

no code implementations • 21 Mar 2024 • Nikhil Raghav, Md Sahidullah

Clustering speaker embeddings is crucial in speaker diarization but hasn't received as much focus as other components.

Clustering speaker-diarization +1

Exploring Green AI for Audio Deepfake Detection

1 code implementation • 21 Mar 2024 • Subhajit Saha, Md Sahidullah, Swagatam Das

In contrast to existing methods that fine-tune SSL models and employ additional deep neural networks for downstream tasks, we exploit classical machine learning algorithms such as logistic regression and shallow neural networks using the SSL embeddings extracted using the pre-trained model.

Audio Deepfake Detection DeepFake Detection +3

ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification

1 code implementation • 23 Feb 2024 • Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment.

Data Augmentation Speaker Verification

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

no code implementations • 20 Jan 2024 • Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase.

Domain Adaptation Speaker Verification

Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech

no code implementations • 13 Jun 2023 • Vishwanath Pratap Singh, Md Sahidullah, Tomi Kinnunen

The first dataset, used for addressing short-term ageing (up to 10 years time difference between enrollment and test) under uncontrolled conditions, is VoxCeleb.

Speaker Verification

Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

no code implementations • 1 Jun 2023 • Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets.

How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning

no code implementations • 31 May 2023 • Hye-jin Shim, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

Shortcut learning, or the "Clever Hans effect", refers to situations where a learning agent (e.g., deep neural networks) learns spurious correlations present in data, resulting in biased models.

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification

Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

no code implementations • 2 Mar 2023 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments.

Knowledge Distillation Speaker Verification

Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning

no code implementations • 21 Feb 2023 • Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

In addition, we propose a multi-contextual (MC) StutterNet, which exploits different contexts of the stuttered speech, resulting in an overall improvement of 4.48% in F1 over the single context based MB StutterNet.

Data Augmentation

Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization

no code implementations • 10 Feb 2023 • Spandan Dey, Md Sahidullah, Goutam Saha

Our experiments demonstrate that the proposed domain diversification is more promising than commonly used simple augmentation methods.

Data Augmentation Domain Generalization +2

Modulation spectral features for speech emotion recognition using deep neural networks

no code implementations • 14 Jan 2023 • Premjeet Singh, Md Sahidullah, Goutam Saha

This work explores the use of constant-Q transform based modulation spectral features (CQT-MSF) for speech emotion recognition (SER).

Speech Emotion Recognition

An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

no code implementations • 30 Nov 2022 • Spandan Dey, Md Sahidullah, Goutam Saha

In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field.

Language Identification Spoken language identification

Analysis of constant-Q filterbank based representations for speech emotion recognition

no code implementations • 29 Nov 2022 • Premjeet Singh, Shefali Waldekar, Md Sahidullah, Goutam Saha

This work analyzes the constant-Q filterbank-based time-frequency representations for speech emotion recognition (SER).

Speech Emotion Recognition

End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

no code implementations • 20 Jul 2022 • Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

In this paper, we present end-to-end and speech embedding based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge.

Self-Supervised Learning

Baselines and Protocols for Household Speaker Recognition

1 code implementation • 30 Apr 2022 • Alexey Sholokhov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Speaker recognition on household devices, such as smart speakers, features several challenges: (i) robustness across a vast number of heterogeneous domains (households), (ii) short utterances, (iii) possibly absent speaker labels of the enrollment data (passive enrollment), and (iv) presence of unknown persons (guests).

Speaker Recognition

Robust Stuttering Detection via Multi-task and Adversarial Learning

no code implementations • 4 Apr 2022 • Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

By automatic detection and identification of stuttering, speech pathologists can track the progression of disfluencies of persons who stutter (PWS).

Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

no code implementations • 4 Apr 2022 • Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

The adoption of advanced deep learning (DL) architecture in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets.

Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

no code implementations • 21 Mar 2022 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

In this paper, we initiate the concern of enhancing the spoofing robustness of the automatic speaker verification (ASV) system, without the primary presence of a separate countermeasure module.

Speaker Verification Unsupervised Domain Adaptation

Learnable Nonlinear Compression for Robust Speaker Verification

no code implementations • 10 Feb 2022 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner.

Speaker Verification

Optimizing Multi-Taper Features for Deep Speaker Verification

no code implementations • 21 Oct 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs).

Open-Ended Question Answering Speaker Verification

Parameterized Channel Normalization for Far-field Deep Speaker Verification

no code implementations • 24 Sep 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

We address far-field speaker verification with a deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g., room reverberation) and noise.

Speaker Verification

Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

no code implementations • 24 Sep 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted to other tasks, including speaker verification.

Robust Speech Recognition Speaker Verification +1

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

no code implementations • 1 Sep 2021 • Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.

Face Swapping Speaker Verification

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

1 code implementation • 1 Sep 2021 • Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.

Face Swapping Speaker Verification

Machine Learning for Stuttering Identification: Review, Challenges and Future Directions

no code implementations • 8 Jul 2021 • Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds.

BIG-bench Machine Learning

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

1 code implementation • 11 Jun 2021 • Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.

Speaker Verification Voice Anti-spoofing

Deep scattering network for speech emotion recognition

no code implementations • 11 May 2021 • Premjeet Singh, Goutam Saha, Md Sahidullah

We also investigate layer-wise scattering coefficients to analyse the importance of time shift and deformation stable scalogram and modulation spectrum coefficients for SER.

Speech Emotion Recognition

Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages

no code implementations • 10 May 2021 • Spandan Dey, Goutam Saha, Md Sahidullah

In this paper, we conduct one of the very first studies for cross-corpora performance evaluation in the spoken language identification (LID) problem.

Language Identification Spoken language identification

Data Quality as Predictor of Voice Anti-Spoofing Generalization

no code implementations • 26 Mar 2021 • Bhusan Chettri, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

Voice anti-spoofing aims at classifying a given utterance as either a bonafide human sample or a spoofing attack (e.g., a synthetic or replayed sample).

Voice Anti-spoofing

Learnable MFCCs for Speaker Verification

no code implementations • 20 Feb 2021 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

We propose a learnable mel-frequency cepstral coefficient (MFCC) frontend architecture for deep neural network (DNN) based automatic speaker verification.

Speaker Verification

Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

no code implementations • 3 Feb 2021 • Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan

In this paper, we propose a novel method that trains pass-phrase specific deep neural network (PP-DNN) based auto-encoders for creating augmented data for text-dependent speaker verification (TD-SV).

Decision Making Text-Dependent Speaker Verification +1

Domain-Dependent Speaker Diarization for the Third DIHARD Challenge

no code implementations • 25 Jan 2021 • A Kishore Kumar, Shefali Waldekar, Goutam Saha, Md Sahidullah

This report presents the system developed by the ABSP Laboratory team for the third DIHARD speech diarization challenge.

Clustering Dimensionality Reduction +2

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

no code implementations • 30 Jul 2020 • Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features.

Speaker Verification

UIAI System for Short-Duration Speaker Verification Challenge 2020

no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

Our primary submission to the challenge is the fusion of seven subsystems which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.

Text-Dependent Speaker Verification

Optimization of data-driven filterbank for automatic speaker verification

no code implementations • 21 Jul 2020 • Susanta Sarangi, Md Sahidullah, Goutam Saha

Then, we propose a new method for computing the filter frequency responses by using principal component analysis (PCA).

Speaker Verification

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

no code implementations • 12 Jul 2020 • Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.

Speaker Verification

Voice Mimicry Attacks Assisted by Automatic Speaker Verification

no code implementations • 3 Jun 2019 • Ville Vestman, Tomi Kinnunen, Rosa González Hautamäki, Md Sahidullah

Our goal is to gain insights into how well similarity rankings transfer from the attacker's ASV system to the attacked ASV system, whether the attackers are able to improve their attacks by mimicking, and how the properties of the voices of attackers change due to mimicking.

Speaker Verification

Quality Measures for Speaker Verification with Short Utterances

no code implementations • 29 Jan 2019 • Arnab Poddar, Md Sahidullah, Goutam Saha

We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model-universal background model (GMM-UBM) and i-vector.

Speaker Recognition Speaker Verification

Introduction to Voice Presentation Attack Detection and Recent Advances

no code implementations • 4 Jan 2019 • Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee

Over the past few years significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker recognition (ASV).

Benchmarking Speaker Recognition

Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors

no code implementations • 3 Dec 2018 • Arnab Poddar, Md Sahidullah, Goutam Saha

In experiments with the NIST SRE 2008 corpus, we have shown that inclusion of the proposed quality metric yields considerable improvement in speaker verification performance.

Speaker Verification

t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

1 code implementation • 25 Apr 2018 • Tomi Kinnunen, Kong Aik Lee, Hector Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric.

Speaker Verification

Robustness of Voice Conversion Techniques Under Mismatched Conditions

no code implementations • 22 Dec 2016 • Monisankha Pal, Dipjyoti Paul, Md Sahidullah, Goutam Saha

Most of the existing studies on voice conversion (VC) are conducted in acoustically matched conditions between source and target signal.

Speech Enhancement Voice Conversion
