Target Speaker Extraction

9 papers with code • 0 benchmarks • 0 datasets

Extract the speech of a specified target speaker from a multi-speaker recording.
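To make the task definition concrete, here is a minimal toy sketch, not taken from any of the listed papers. It assumes two synthetic sine tones stand in for the target speaker and an interfering speaker, and it scores a hypothetical extraction output with SI-SDR (scale-invariant signal-to-distortion ratio). SI-SDR is an assumption here: this page does not name an evaluation metric. The names `si_sdr`, `target`, `interferer`, and `estimate` are illustrative only.

```python
import math


def si_sdr(estimate, target):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Project the estimate onto the target to find the optimally scaled target part.
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    scale = dot / target_energy
    projected = [scale * t for t in target]
    # Everything left over after the projection counts as distortion.
    residual = [e - p for e, p in zip(estimate, projected)]
    signal_energy = sum(p * p for p in projected)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10(signal_energy / residual_energy)


# One second of toy "speech" at 8 kHz: two tones stand in for two talkers (assumption).
n = 8000
target = [math.sin(2 * math.pi * 220 * i / n) for i in range(n)]
interferer = [math.sin(2 * math.pi * 330 * i / n) for i in range(n)]

# The observed mixture is the sum of both talkers.
mixture = [t + v for t, v in zip(target, interferer)]

# Hypothetical extractor output: the target plus 10% leaked interferer.
estimate = [t + 0.1 * v for t, v in zip(target, interferer)]

mixture_score = si_sdr(mixture, target)
estimate_score = si_sdr(estimate, target)
print(f"mixture:  {mixture_score:.2f} dB")
print(f"estimate: {estimate_score:.2f} dB")
```

Because the two tones have equal energy and are orthogonal over the one-second window, the unprocessed mixture scores about 0 dB, and the estimate with 10% leakage scores about 20 dB. A real system would replace the hand-built `estimate` with a model conditioned on a cue for the target speaker, such as an enrollment utterance or a video track.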

Benchmarks

No evaluation results have been submitted yet.

Most implemented papers

GPU-accelerated Guided Source Separation for Meeting Transcription

desh2608/gss • 10 Dec 2022

In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide a 300x speed-up over CPU-based inference.

MuSE: Multi-modal target speaker extraction with visual cues

lin9x/av-sepformer • 15 Oct 2020

A speaker extraction algorithm relies on a speech sample from the target speaker as the reference point to focus its attention.

Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

xuchenglin28/speaker_extraction • 30 Mar 2021

Inspired by studies on target speaker extraction, e.g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech that can pay selective auditory attention to the target speaker.

Selective Listening by Synchronizing Speech with Lips

zexupan/reentry • 14 Jun 2021

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance or an accompanying video track.

L-SpEx: Localized Target Speaker Extraction

gemengtju/l-spex • 21 Feb 2022

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

zexupan/avse_hybrid_loss • 31 Mar 2022

We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to address the over-suppression problem.

ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting

zexupan/imaginenet • 31 Oct 2022

In this paper, we study audio-visual speaker extraction algorithms with intermittent visual cues.

RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation

spkgyk/RTFS-Net • 29 Sep 2023

RTFS-Net is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

haoxiangsnr/llm-tse • 11 Oct 2023

The effectiveness of existing target speaker extraction models is hindered in real-world scenarios because pre-registered cues are often unreliable or absent altogether.
