Text-Independent Speaker Verification
17 papers with code • 0 benchmarks • 0 datasets
Latest papers
DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification
To effectively leverage the long-term dependencies of audio signals and constrain model complexity, we introduce a novel module called Global-aware Filter layer (GF layer) in this work, which employs a set of learnable transform-domain filters between a 1D discrete Fourier transform and its inverse transform to capture global context.
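The core idea of the GF layer can be sketched in a few lines of NumPy: an elementwise learnable filter applied in the frequency domain between a 1D real FFT and its inverse (the function name and tensor shapes below are illustrative, not taken from the paper's code):

```python
import numpy as np

def global_filter_layer(x, filt):
    """Minimal sketch of a global-aware filter (GF) layer: a learnable
    frequency-domain filter applied between a 1D rFFT and its inverse.
    x: real signal (batch, time, channels); filt: complex (freq, channels)."""
    X = np.fft.rfft(x, axis=1)                     # to the frequency domain
    Y = X * filt[None, :, :]                       # learnable global filtering
    return np.fft.irfft(Y, n=x.shape[1], axis=1)   # back to the time domain

# usage: an all-ones (identity) filter reconstructs the input
x = np.random.randn(2, 64, 4)
ident = np.ones((64 // 2 + 1, 4), dtype=complex)
y = global_filter_layer(x, ident)
```

Because the filter multiplies every frequency bin, each output sample depends on the entire input sequence, which is how the layer captures global context at low cost.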
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Visual speech (i.e., lip motion) is highly related to auditory speech due to the co-occurrence and synchronization in speech production.
The effect of speech pathology on automatic speaker verification -- a large-scale study
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data.
Decomposed Temporal Dynamic CNN: Efficient Time-Adaptive Network for Text-Independent Speaker Verification Explained with Speaker Activation Map
To extract accurate speaker information for text-independent speaker verification, temporal dynamic CNNs (TDY-CNNs), which adapt kernels to each time bin, were proposed.
FilterAugment: An Acoustic Environmental Data Augmentation Method
Thus, training acoustic models for audio and speech tasks requires regularization on various acoustic environments in order to achieve robust performance in real life applications.
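One way to regularize across acoustic environments, in the spirit of FilterAugment, is to split the frequency axis of a log-mel spectrogram into random bands and perturb each band's gain; the sketch below is an assumption-laden illustration (function name, band counts, and dB range are all illustrative, not the paper's values):

```python
import numpy as np

def filter_augment(log_mel, n_bands=(3, 6), db_range=(-6.0, 6.0), rng=None):
    """Hypothetical sketch of a FilterAugment-style transform: partition the
    mel-frequency axis into random bands and add a random dB offset to each,
    mimicking varied acoustic environments. log_mel: (time, mel)."""
    rng = rng if rng is not None else np.random.default_rng()
    n_mels = log_mel.shape[1]
    k = int(rng.integers(n_bands[0], n_bands[1] + 1))
    # random interior band boundaries along the mel axis
    cuts = np.sort(rng.choice(np.arange(1, n_mels), size=k - 1, replace=False))
    edges = np.concatenate(([0], cuts, [n_mels]))
    gains = rng.uniform(db_range[0], db_range[1], size=k)
    out = log_mel.copy()
    for i in range(k):                      # apply one gain per band
        out[:, edges[i]:edges[i + 1]] += gains[i]
    return out

# usage: augment a dummy (time, mel) log-spectrogram
x = np.zeros((10, 40))
y = filter_augment(x, rng=np.random.default_rng(0))
```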
Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemic Analysis
The temporal dynamic model adapts itself to phonemes without explicitly given phoneme information during training, and results show the necessity to consider phoneme variation within utterances for more accurate and robust text-independent speaker verification.
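A toy version of such a time-adaptive convolution can be written as follows: each time bin gets its own kernel, formed as an attention-weighted sum of K basis kernels. This is a 1-D, single-channel simplification for illustration, not the paper's implementation:

```python
import numpy as np

def tdy_conv1d(x, basis, attn):
    """Toy sketch of a temporal dynamic convolution.
    x: (T,) signal; basis: (K, ksize) basis kernels;
    attn: (T, K) per-time-bin attention weights over the basis kernels."""
    K, ksize = basis.shape
    pad = ksize // 2
    xp = np.pad(x, pad)                 # zero-pad for 'same' output length
    out = np.empty_like(x)
    for t in range(len(x)):
        k_t = attn[t] @ basis           # time-adaptive kernel for bin t
        out[t] = xp[t:t + ksize] @ k_t  # correlate the window with it
    return out

# usage: with a single delta basis kernel the layer acts as the identity
x = np.arange(8, dtype=float)
basis = np.array([[0.0, 1.0, 0.0]])     # delta kernel
attn = np.ones((8, 1))                  # trivial attention over 1 kernel
y = tdy_conv1d(x, basis, attn)
```

Because `attn` varies with `t`, the effective kernel can differ for every time bin, which is what lets the model adapt to phoneme variation within an utterance.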
FDN: Finite Difference Network with Hierarchical Convolutional Features for Text-independent Speaker Verification
For example, RawNet and RawNet2 extract speaker embeddings directly from raw waveforms for voice recognition, which vastly reduces front-end computation while achieving state-of-the-art performance.
Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning
First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.
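The queue mechanism mentioned above can be sketched as a fixed-size FIFO of momentum-encoder embeddings that serve as negatives; the class and method names here are assumptions for illustration, not the paper's code:

```python
import numpy as np
from collections import deque

class NegativeQueue:
    """Illustrative sketch of a MoCo-style negative queue: a fixed-size
    FIFO holding momentum-encoder embeddings used as negative examples."""
    def __init__(self, dim, max_size):
        self.dim = dim
        self.buf = deque(maxlen=max_size)   # oldest embeddings drop out

    def enqueue(self, embeddings):
        for e in np.atleast_2d(embeddings):
            self.buf.append(np.asarray(e, dtype=float))

    def negatives(self):
        if not self.buf:
            return np.empty((0, self.dim))
        return np.stack(self.buf)           # (queue_len, dim)

# usage: pushing 6 embeddings into a queue of size 4 keeps the newest 4
q = NegativeQueue(dim=8, max_size=4)
q.enqueue(np.random.randn(6, 8))
negs = q.negatives()
```

Decoupling the number of negatives from the batch size in this way is what lets MoCo-style training use a much larger negative set than SimCLR at the same memory cost.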
Masked Proxy Loss For Text-Independent Speaker Verification
We further propose Multinomial Masked Proxy (MMP) loss to leverage the hardness of speaker pairs.
Y-Vector: Multiscale Waveform Encoder for Speaker Embedding
State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features.
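For context, the log filter-bank energies mentioned above are commonly computed roughly as below; all parameter values (sample rate, FFT size, hop, number of mel bands) are typical defaults, not tied to any particular system:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fbank_energies(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Sketch of a log filter-bank energy front end: frame the waveform,
    take the power spectrum, and pool it through triangular mel filters."""
    # frame the signal with a Hann window and take the power spectrum
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.stack(frames), axis=1)) ** 2
    # build triangular filters evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)   # rising slope
        if r > c:
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)   # falling slope
    return np.log(power @ fb.T + 1e-10)   # (n_frames, n_mels)

# usage: one second of 16 kHz audio yields a (frames, 40) feature matrix
sig = np.random.randn(16000)
feats = fbank_energies(sig)
```

Waveform-based encoders such as the Y-vector model learn this front end instead of fixing it by hand.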