Search Results for author: Gordon Wichern

Found 31 papers, 8 papers with code

Why does music source separation benefit from cacophony?

no code implementations • 28 Feb 2024 • Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux

In music source separation, a standard training data augmentation procedure is to create new training samples by randomly combining instrument stems from different songs.

Data Augmentation • Music Source Separation
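The remixing augmentation described in the snippet can be sketched as follows; the data layout (a `stems_by_song` dictionary mapping song names to instrument stems) is an illustrative assumption, not from the paper:

```python
import numpy as np

def remix(stems_by_song, rng):
    """Build a new training mixture by drawing each instrument's stem
    from a randomly chosen (and possibly different) song."""
    songs = list(stems_by_song)
    instruments = list(stems_by_song[songs[0]])
    # Pick each stem independently, so stems from different songs get combined.
    sources = {inst: stems_by_song[rng.choice(songs)][inst]
               for inst in instruments}
    mixture = np.sum(list(sources.values()), axis=0)
    return mixture, sources
```

The paper asks *why* this helps; the sketch only shows the standard procedure itself.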

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

1 code implementation • 27 Feb 2024 • Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Existing NF-based methods have focused on estimating the magnitude of the HRTF from a given sound source direction, and the magnitude is then converted to a finite impulse response (FIR) filter.

Spatial Interpolation

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

no code implementations • 12 Dec 2023 • Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity.


Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

no code implementations • 30 Oct 2023 • Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers.

Speaker Separation • Speech Enhancement +1

Generation or Replication: Auscultating Audio Latent Diffusion Models

no code implementations • 16 Oct 2023 • Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

The introduction of audio latent diffusion models possessing the ability to generate realistic sound clips on demand from a text description has the potential to revolutionize how we work with audio.

AudioCaps • Memorization +1

Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT

no code implementations • 4 Apr 2023 • Ke Chen, Gordon Wichern, François G. Germain, Jonathan Le Roux

In this paper, we propose a self-supervised learning framework for music source separation inspired by the HuBERT speech representation model.

Clustering • Music Source Separation +1

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

no code implementations • 7 Mar 2023 • Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly.

Ranked #1 on Speech Recognition on LibriCSS (using extra training data)

Action Detection • Activity Detection +1

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

no code implementations • 14 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into three broad categories: speech, music, and sound effects (SFX, understood to include ambient noise and natural sound events).

Action Detection • Activity Detection +4

Hyperbolic Audio Source Separation

no code implementations • 9 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux

We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features.

Audio Source Separation
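The snippet does not give the model details, but the geometry it relies on can be illustrated with the standard geodesic distance on the Poincaré ball, commonly used for hyperbolic embeddings:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincaré ball.
    Distances blow up near the boundary, which is what lets hyperbolic
    embeddings compactly encode hierarchies."""
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    x = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return float(np.arccosh(x))
```

In such a space, parent concepts (e.g., "music") can sit near the origin while their children (individual instruments) sit closer to the boundary.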

Latent Iterative Refinement for Modular Source Separation

1 code implementation • 22 Nov 2022 • Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

During inference, we can dynamically adjust how many processing blocks and iterations of a specific block an input signal needs using a gating module.
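A toy sketch of such gated iterative processing; the `block` and `gate` callables are hypothetical stand-ins, not the paper's actual modules:

```python
def iterative_refine(x, block, gate, max_iters=8):
    """Apply a refinement block repeatedly; a gating function inspects the
    current estimate and decides whether further iterations are needed."""
    for i in range(max_iters):
        x = block(x)
        if not gate(x):  # gate says the estimate is good enough
            return x, i + 1
    return x, max_iters
```

Easy inputs exit after few iterations while hard inputs use the full budget, which is the compute/quality trade-off the snippet describes.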

Reverberation as Supervision for Speech Separation

no code implementations • 15 Nov 2022 • Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation.

Speech Separation

Meta-Learning of Neural State-Space Models Using Data From Similar Systems

no code implementations • 14 Nov 2022 • Ankush Chakrabarty, Gordon Wichern, Christopher R. Laughman

Deep neural state-space models (SSMs) provide a powerful tool for modeling dynamical systems solely using operational data.

Meta-Learning • Transfer Learning
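As a minimal illustration of what a state-space model computes, here is a generic discrete-time SSM rollout; in the neural setting the transition `f` and observation `g` would be learned networks (the linear maps in the usage below are only a toy stand-in):

```python
import numpy as np

def rollout(f, g, x0, inputs):
    """Simulate a discrete-time state-space model:
        x[t+1] = f(x[t], u[t]),   y[t] = g(x[t])
    returning the output sequence and the final state."""
    x, ys = x0, []
    for u in inputs:
        ys.append(g(x))   # emit observation from the current state
        x = f(x, u)       # advance the state with the current input
    return np.array(ys), x
```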

Optimal Condition Training for Target Source Separation

1 code implementation • 11 Nov 2022 • Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries.

Cold Diffusion for Speech Enhancement

no code implementations • 4 Nov 2022 • Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux

Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals.

Speech Enhancement

Late Audio-Visual Fusion for In-The-Wild Speaker Diarization

no code implementations • 2 Nov 2022 • Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux

Speaker diarization is well studied for constrained audios but little explored for challenging in-the-wild videos, which have more speakers, shorter utterances, and inconsistent on-screen speakers.

speaker-diarization • Speaker Diarization +1

Heterogeneous Target Speech Separation

no code implementations • 7 Apr 2022 • Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.).

Speech Separation

Locate This, Not That: Class-Conditioned Sound Event DOA Estimation

no code implementations • 8 Mar 2022 • Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

Existing systems for sound event localization and detection (SELD) typically operate by estimating a source location for all classes at every time instant.

Sound Event Localization and Detection

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

3 code implementations • 19 Oct 2021 • Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research.

Audio Source Separation

Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision

no code implementations • 22 Oct 2020 • Yun-Ning Hung, Gordon Wichern, Jonathan Le Roux

Most music source separation systems require large collections of isolated sources for training, which can be difficult to obtain.

Music Source Separation

AutoClip: Adaptive Gradient Clipping for Source Separation Networks

1 code implementation • 25 Jul 2020 • Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux

Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter.

Audio Source Separation
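AutoClip's core idea is to replace the hand-picked threshold with a percentile of the gradient-norm history observed so far during training. A minimal NumPy sketch (the class name and API here are illustrative; the released implementation differs):

```python
import numpy as np

class AutoClip:
    """Adaptively clip gradients to the p-th percentile of all gradient
    norms seen so far, avoiding a hand-tuned absolute threshold."""
    def __init__(self, percentile=10.0):
        self.percentile = percentile
        self.history = []

    def __call__(self, grad):
        norm = float(np.linalg.norm(grad))
        self.history.append(norm)
        clip_value = np.percentile(self.history, self.percentile)
        if norm > clip_value:
            grad = grad * (clip_value / norm)  # rescale to the threshold
        return grad
```

The percentile is still a hyperparameter, but it adapts to each model's gradient-norm scale rather than requiring per-model tuning of an absolute value.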

Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision

no code implementations • 6 Nov 2019 • Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux

In this scenario, weak labels are defined in contrast with strong time-frequency (TF) labels such as those obtained from isolated sources. They refer either to frame-level weak labels, where one only has access to the time periods when different sources are active in an audio mixture, or to clip-level weak labels, which only indicate the presence or absence of sounds in an entire audio clip.

Audio Source Separation
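The three label granularities can be made concrete with array shapes (sizes and activity patterns below are purely illustrative):

```python
import numpy as np

# Strong TF labels: per time-frequency-bin activity for each source
# (3 sources, 100 frames, 257 frequency bins in this toy example).
strong_tf = np.zeros((3, 100, 257), dtype=bool)
strong_tf[0, 10:20, :] = True        # source 0 active in frames 10-19
strong_tf[2, 50:60, 30:80] = True    # source 2 active in one TF region

# Frame-level weak labels: is each source active in each time frame?
frame_weak = strong_tf.any(axis=2)   # shape (3, 100)

# Clip-level weak labels: does each source appear anywhere in the clip?
clip_weak = frame_weak.any(axis=1)   # shape (3,)
```

Each coarsening discards information, which is exactly what makes training with weak labels harder than with strong TF labels.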

Bootstrapping deep music separation from primitive auditory grouping principles

no code implementations • 23 Oct 2019 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known.

Music Source Separation

WHAMR!: Noisy and Reverberant Single-Channel Speech Separation

no code implementations • 22 Oct 2019 • Matthew Maciejewski, Gordon Wichern, Emmett McQuinn, Jonathan Le Roux

While significant advances have been made with respect to the separation of overlapping speech signals, studies have been largely constrained to mixtures of clean, near anechoic speech, not representative of many real-world scenarios.

Sound • Audio and Speech Processing

WHAM!: Extending Speech Separation to Noisy Environments

1 code implementation • 2 Jul 2019 • Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux

Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem.

Speech Separation

Class-conditional embeddings for music source separation

no code implementations • 7 Nov 2018 • Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux

Isolating individual instruments in a musical mixture has a myriad of potential applications, and seems imminently achievable given the levels of performance reached by recent deep learning methods.

Clustering • Deep Clustering +1

Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

no code implementations • 6 Nov 2018 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

These estimates, together with a weighting scheme in the time-frequency domain, based on confidence in the separation quality, are used to train a deep learning model that can be used for single-channel separation, where no source direction information is available.

Clustering • Image Segmentation +2
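A confidence-weighted time-frequency loss of the kind described can be sketched as follows; the exact weighting scheme used in the paper is not given in this snippet, so the normalization here is an assumption:

```python
import numpy as np

def weighted_tf_loss(estimate, target, confidence):
    """Squared error over time-frequency bins, weighted by a per-bin
    confidence in the quality of the (spatially derived) targets, so
    unreliable bins contribute little to training."""
    w = confidence / (confidence.sum() + 1e-12)  # normalize the weights
    return float(np.sum(w * (estimate - target) ** 2))
```

Bins where the spatial clustering is ambiguous get low confidence and therefore barely influence the single-channel model being trained.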

Phasebook and Friends: Leveraging Discrete Representations for Source Separation

no code implementations • 2 Oct 2018 • Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey

Here, we propose "magbook", "phasebook", and "combook", three new types of layers based on discrete representations that can be used to estimate complex time-frequency masks.

Speaker Separation Speech Enhancement
