Speaker Identification

61 papers with code • 4 benchmarks • 4 datasets

Most implemented papers

Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

bepierre/SpeechVGG 22 Oct 2019

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.

Generative Pre-Training for Speech with Autoregressive Predictive Coding

iamyuanchung/Autoregressive-Predictive-Coding 23 Oct 2019

Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.

Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models

Edresson/Speech2Phone 25 Feb 2020

We compare the three best architectures trained with our method; the best of them is the one with a shallow architecture.

Contrastive Learning of General-Purpose Audio Representations

google-research/google-research 21 Oct 2020

We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio.
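As a rough illustration of what a contrastive objective of this kind computes, the sketch below implements a generic InfoNCE-style loss in NumPy. It is not taken from the COLA paper or its code: the function name `contrastive_loss`, the cosine similarity, the temperature value, and the random embeddings are all assumptions made for illustration.

```python
import numpy as np

def contrastive_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative, not COLA's exact objective).

    Row i of `anchors` is treated as matching row i of `positives`;
    every other row in the batch acts as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                 # pairwise cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # matching pairs sit on the diagonal

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))                       # hypothetical batch of 4 embeddings

# Positives that are near-copies of their anchors give a low loss ...
matched = contrastive_loss(emb, emb + 0.01 * rng.normal(size=emb.shape))
# ... while pairing each anchor with a different item gives a high one.
shuffled = contrastive_loss(emb, emb[::-1])
```

In a self-supervised audio setting, the anchor and positive embeddings would typically come from an encoder applied to related audio segments; here they are random vectors so that the example stays self-contained.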

FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances

fsept11/FoolHD 17 Nov 2020

Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification.
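To make the notion of an adversarial perturbation concrete, here is a minimal sketch of a single signed-gradient step against a toy linear softmax classifier. It is a generic textbook-style illustration, not the FoolHD method: the classifier `W`, the feature vector `x`, the step size `epsilon`, and all function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, x, label):
    """Loss of the linear softmax classifier W on input x for class `label`."""
    return float(-np.log(softmax(W @ x)[label]))

def sign_gradient_perturbation(W, x, label, epsilon):
    """One signed-gradient step that increases the loss for `label`."""
    probs = softmax(W @ x)
    one_hot = np.zeros_like(probs)
    one_hot[label] = 1.0
    grad_x = W.T @ (probs - one_hot)     # gradient of cross-entropy w.r.t. the input
    return x + epsilon * np.sign(grad_x) # each feature moves by at most epsilon

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 32))             # hypothetical classifier: 5 speakers, 32 features
x = rng.normal(size=32)                  # hypothetical input features
label = int(np.argmax(W @ x))            # class the model currently predicts

x_adv = sign_gradient_perturbation(W, x, label, epsilon=0.05)
loss_before = cross_entropy(W, x, label)
loss_after = cross_entropy(W, x_adv, label)
```

The perturbation is bounded per feature by `epsilon`, yet it raises the loss on the originally predicted speaker; attacks on real speaker identification models operate on audio signals and use considerably more elaborate constraints to keep the change imperceptible.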

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

facebookresearch/speech-resynthesis 1 Apr 2021

We propose using self-supervised discrete representations for the task of speech resynthesis.

SSAST: Self-Supervised Audio Spectrogram Transformer

YuanGongND/ssast 19 Oct 2021

Pure Transformer models tend to require more training data than CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, which limits the practical usage of AST.

A Generative Product-of-Filters Model of Audio

dawenl/pof 20 Dec 2013

We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain.
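The name "product of filters" follows from the log-spectral formulation: a linear combination of filters in the log domain corresponds to an element-wise product of exponentiated filters in the linear domain. The sketch below checks that identity numerically. It covers only this deterministic relationship, not the generative model or its inference; the sizes, the matrix `U`, and the activation vector `a` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_freq, n_filters = 8, 3                    # hypothetical sizes
U = rng.normal(size=(n_freq, n_filters))    # filters, defined in the log-spectral domain
a = np.array([1.5, 0.0, 0.7])               # sparse non-negative activations (one is zero)

# Linear combination of filters in the log-spectral domain.
log_spectrum = U @ a

# Equivalent view in the linear domain: an element-wise product of
# exponentiated filters, each raised to the power of its activation.
spectrum_as_product = np.prod(np.exp(U) ** a, axis=1)

# A filter with zero activation contributes a factor of 1, i.e. nothing.
spectrum_without_inactive = np.prod(np.exp(U[:, [0, 2]]) ** a[[0, 2]], axis=1)
```

Because inactive filters drop out of the product, sparsity in the activations means each spectrum is explained by only a few filters.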

A domain-agnostic approach for opinion prediction on speech

UKPLab/coling-peoples2016-opinion-prediction WS 2016

We explore a domain-agnostic approach for analyzing speech with the goal of opinion prediction.