Search Results for author: Zexu Pan

Found 17 papers, 10 papers with code

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

1 code implementation • 27 Feb 2024 • Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Existing NF-based methods focus on estimating the magnitude of the HRTF for a given sound source direction, and the magnitude is then converted to a finite impulse response (FIR) filter.

Spatial Interpolation
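As a generic illustration of the FIR conversion mentioned in the abstract (not the paper's actual method — the function name, tap count, and window choice below are hypothetical), a sampled magnitude response can be mirrored into a zero-phase Hermitian spectrum, inverse-transformed into an impulse response, and then centered, truncated, and windowed into filter taps:

```python
import math

def magnitude_to_fir(mag, n_taps):
    """Build a linear-phase FIR filter from a one-sided magnitude response.

    mag: n_fft//2 + 1 magnitude samples (DC to Nyquist); phase assumed zero.
    Illustrative sketch only; real HRTF pipelines typically use minimum-phase
    reconstruction plus an interaural delay.
    """
    n_fft = 2 * (len(mag) - 1)
    # Mirror into a full Hermitian-symmetric (real, zero-phase) spectrum.
    full = list(mag) + list(reversed(mag[1:-1]))
    # Real inverse DFT: the spectrum is real and even, so it reduces
    # to a cosine sum.
    h = [sum(full[k] * math.cos(2 * math.pi * k * n / n_fft)
             for k in range(n_fft)) / n_fft
         for n in range(n_fft)]
    # Rotate so the main lobe sits mid-filter, truncate to n_taps,
    # and apply a Hamming window to reduce truncation ripple.
    h = h[-(n_taps // 2):] + h[:n_taps - n_taps // 2]
    return [h[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1)))
            for n in range(n_taps)]

# Flat magnitude -> a windowed delta centered in the filter.
fir = magnitude_to_fir([1.0] * 129, 64)
```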

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

no code implementations • 12 Dec 2023 • Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity.

EEG

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

no code implementations • 30 Oct 2023 • Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers.

Speaker Separation Speech Enhancement +1

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

no code implementations • 16 Oct 2023 • Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without associating the output with speaker identity.

Generation or Replication: Auscultating Audio Latent Diffusion Models

no code implementations • 16 Oct 2023 • Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

The introduction of audio latent diffusion models, which can generate realistic sound clips on demand from a text description, has the potential to revolutionize how we work with audio.

AudioCaps Memorization +1

NeuroHeed: Neuro-Steered Speaker Extraction using EEG Signals

no code implementations • 26 Jul 2023 • Zexu Pan, Marvin Borsdorf, Siqi Cai, Tanja Schultz, Haizhou Li

We propose both an offline and an online NeuroHeed, with the latter designed for real-time inference.

EEG

Target Active Speaker Detection with Audio-visual Cues

1 code implementation • 22 May 2023 • Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.

Audio-Visual Synchronization

Late Audio-Visual Fusion for In-The-Wild Speaker Diarization

no code implementations • 2 Nov 2022 • Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux

Speaker diarization is well studied for constrained audio but little explored for challenging in-the-wild videos, which have more speakers, shorter utterances, and inconsistent on-screen speakers.

Speaker Diarization +1

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

no code implementations • 9 Oct 2022 • Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang

In the first stage, we pre-extract the target speech using visual cues and estimate the underlying phonetic sequence.

Lip Reading

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

1 code implementation • 31 Mar 2022 • Zexu Pan, Meng Ge, Haizhou Li

We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to settle the over-suppression problem.

Automatic Speech Recognition (ASR) +2

Speaker Extraction with Co-Speech Gestures Cue

1 code implementation • 31 Mar 2022 • Zexu Pan, Xinyuan Qian, Haizhou Li

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech.

Speech Separation

USEV: Universal Speaker Extraction with Visual Cue

1 code implementation • 30 Sep 2021 • Zexu Pan, Meng Ge, Haizhou Li

The speaker extraction algorithm requires an auxiliary reference, such as a video recording or a pre-recorded speech, to form top-down auditory attention on the target speaker.

Selective Listening by Synchronizing Speech with Lips

1 code implementation • 14 Jun 2021 • Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance, or an accompanying video track.

Lip Reading Target Speaker Extraction

Muse: Multi-modal target speaker extraction with visual cues

1 code implementation • 15 Oct 2020 • Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

A speaker extraction algorithm relies on a speech sample from the target speaker as the reference point to focus its attention.

Target Speaker Extraction
