Search Results for author: Zexu Pan

Found 17 papers, 10 papers with code

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

1 code implementation • 27 Feb 2024 • Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Existing NF-based methods focus on estimating the magnitude of the HRTF for a given sound source direction, and the magnitude is then converted to a finite impulse response (FIR) filter.

Spatial Interpolation
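As a generic illustration of the FIR conversion mentioned in the abstract (not the paper's actual method — the function name, tap count, and window choice below are hypothetical), a sampled magnitude response can be mirrored into a zero-phase Hermitian spectrum, inverse-transformed into an impulse response, and then centered, truncated, and windowed into filter taps:

```python
import math

def magnitude_to_fir(mag, n_taps):
    """Build a linear-phase FIR filter from a one-sided magnitude response.

    mag: n_fft//2 + 1 magnitude samples (DC to Nyquist); phase assumed zero.
    Illustrative sketch only; real HRTF pipelines typically use minimum-phase
    reconstruction plus an interaural delay.
    """
    n_fft = 2 * (len(mag) - 1)
    # Mirror into a full Hermitian-symmetric (real, zero-phase) spectrum.
    full = list(mag) + list(reversed(mag[1:-1]))
    # Real inverse DFT: the spectrum is real and even, so it reduces
    # to a cosine sum.
    h = [sum(full[k] * math.cos(2 * math.pi * k * n / n_fft)
             for k in range(n_fft)) / n_fft
         for n in range(n_fft)]
    # Rotate so the main lobe sits mid-filter, truncate to n_taps,
    # and apply a Hamming window to reduce truncation ripple.
    h = h[-(n_taps // 2):] + h[:n_taps - n_taps // 2]
    return [h[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1)))
            for n in range(n_taps)]

# Flat magnitude -> a windowed delta centered in the filter.
fir = magnitude_to_fir([1.0] * 129, 64)
```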

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

no code implementations • 12 Dec 2023 • Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity.

EEG

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

no code implementations • 30 Oct 2023 • Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers.

Speaker Separation Speech Enhancement +1

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

no code implementations • 16 Oct 2023 • Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without associating the output with speaker identity.

Generation or Replication: Auscultating Audio Latent Diffusion Models

no code implementations • 16 Oct 2023 • Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

The introduction of audio latent diffusion models, which can generate realistic sound clips on demand from a text description, has the potential to revolutionize how we work with audio.

AudioCaps Memorization +1

NeuroHeed: Neuro-Steered Speaker Extraction using EEG Signals

no code implementations • 26 Jul 2023 • Zexu Pan, Marvin Borsdorf, Siqi Cai, Tanja Schultz, Haizhou Li

We propose both an offline and an online NeuroHeed, with the latter designed for real-time inference.

EEG

Target Active Speaker Detection with Audio-visual Cues

1 code implementation • 22 May 2023 • Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.

Audio-Visual Synchronization

Late Audio-Visual Fusion for In-The-Wild Speaker Diarization

no code implementations • 2 Nov 2022 • Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux

Speaker diarization is well studied for constrained audio but little explored for challenging in-the-wild videos, which have more speakers, shorter utterances, and inconsistent on-screen speakers.

Speaker Diarization +1

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

no code implementations • 9 Oct 2022 • Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

In the first stage, we pre-extract the target speech using visual cues and estimate the underlying phonetic sequence.

Lip Reading

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

1 code implementation • 31 Mar 2022 • Zexu Pan, Meng Ge, Haizhou Li

We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to settle the over-suppression problem.

Automatic Speech Recognition (ASR) +2

Speaker Extraction with Co-Speech Gestures Cue

1 code implementation • 31 Mar 2022 • Zexu Pan, Xinyuan Qian, Haizhou Li

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech.

Speech Separation

USEV: Universal Speaker Extraction with Visual Cue

1 code implementation • 30 Sep 2021 • Zexu Pan, Meng Ge, Haizhou Li

The speaker extraction algorithm requires an auxiliary reference, such as a video recording or a pre-recorded speech, to form top-down auditory attention on the target speaker.

Selective Listening by Synchronizing Speech with Lips

1 code implementation • 14 Jun 2021 • Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance, or an accompanying video track.

Lip Reading Target Speaker Extraction

Muse: Multi-modal target speaker extraction with visual cues

1 code implementation • 15 Oct 2020 • Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

A speaker extraction algorithm relies on a speech sample from the target speaker as the reference point to focus its attention.

Target Speaker Extraction
