Search Results for author: Gordon Wichern

Found 31 papers, 8 papers with code

Why does music source separation benefit from cacophony?

no code implementations • 28 Feb 2024 • Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux

In music source separation, a standard training data augmentation procedure is to create new training samples by randomly combining instrument stems from different songs.

Data Augmentation • Music Source Separation
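The remixing augmentation described in the snippet can be sketched as follows; the data layout (a `stems_by_song` dictionary mapping song names to instrument stems) is an illustrative assumption, not from the paper:

```python
import numpy as np

def remix(stems_by_song, rng):
    """Build a new training mixture by drawing each instrument's stem
    from a randomly chosen (and possibly different) song."""
    songs = list(stems_by_song)
    instruments = list(stems_by_song[songs[0]])
    # Pick each stem independently, so stems from different songs get combined.
    sources = {inst: stems_by_song[rng.choice(songs)][inst]
               for inst in instruments}
    mixture = np.sum(list(sources.values()), axis=0)
    return mixture, sources
```

The paper asks *why* this helps; the sketch only shows the standard procedure itself.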

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

1 code implementation • 27 Feb 2024 • Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Existing NF-based methods have focused on estimating the magnitude of the HRTF from a given sound source direction, and the magnitude is then converted to a finite impulse response (FIR) filter.

Spatial Interpolation

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

no code implementations • 12 Dec 2023 • Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity.


Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

no code implementations • 30 Oct 2023 • Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers.

Speaker Separation • Speech Enhancement +1

Generation or Replication: Auscultating Audio Latent Diffusion Models

no code implementations • 16 Oct 2023 • Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux

The introduction of audio latent diffusion models possessing the ability to generate realistic sound clips on demand from a text description has the potential to revolutionize how we work with audio.

AudioCaps • Memorization +1

Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT

no code implementations • 4 Apr 2023 • Ke Chen, Gordon Wichern, François G. Germain, Jonathan Le Roux

In this paper, we propose a self-supervised learning framework for music source separation inspired by the HuBERT speech representation model.

Clustering • Music Source Separation +1

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

no code implementations • 7 Mar 2023 • Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly.

Ranked #1 on Speech Recognition on LibriCSS (using extra training data)

Action Detection • Activity Detection +1

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks

no code implementations • 14 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux

In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into three broad categories: speech, music, and sound effects (SFX, understood to include ambient noise and natural sound events).

Action Detection • Activity Detection +4

Hyperbolic Audio Source Separation

no code implementations • 9 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux

We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features.

Audio Source Separation
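The snippet does not give the model details, but the geometry it relies on can be illustrated with the standard geodesic distance on the Poincaré ball, commonly used for hyperbolic embeddings:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincaré ball.
    Distances blow up near the boundary, which is what lets hyperbolic
    embeddings compactly encode hierarchies."""
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    x = 1.0 + 2.0 * duv / max((1.0 - uu) * (1.0 - vv), eps)
    return float(np.arccosh(x))
```

In such a space, parent concepts (e.g., "music") can sit near the origin while their children (individual instruments) sit closer to the boundary.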

Latent Iterative Refinement for Modular Source Separation

1 code implementation • 22 Nov 2022 • Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

During inference, we can dynamically adjust how many processing blocks and iterations of a specific block an input signal needs using a gating module.
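A toy sketch of such gated iterative processing; the `block` and `gate` callables are hypothetical stand-ins, not the paper's actual modules:

```python
def iterative_refine(x, block, gate, max_iters=8):
    """Apply a refinement block repeatedly; a gating function inspects the
    current estimate and decides whether further iterations are needed."""
    for i in range(max_iters):
        x = block(x)
        if not gate(x):  # gate says the estimate is good enough
            return x, i + 1
    return x, max_iters
```

Easy inputs exit after few iterations while hard inputs use the full budget, which is the compute/quality trade-off the snippet describes.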

Reverberation as Supervision for Speech Separation

no code implementations • 15 Nov 2022 • Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation.

Speech Separation

Meta-Learning of Neural State-Space Models Using Data From Similar Systems

no code implementations • 14 Nov 2022 • Ankush Chakrabarty, Gordon Wichern, Christopher R. Laughman

Deep neural state-space models (SSMs) provide a powerful tool for modeling dynamical systems solely using operational data.

Meta-Learning • Transfer Learning
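As a minimal illustration of what a state-space model computes, here is a generic discrete-time SSM rollout; in the neural setting the transition `f` and observation `g` would be learned networks (the linear maps in the usage below are only a toy stand-in):

```python
import numpy as np

def rollout(f, g, x0, inputs):
    """Simulate a discrete-time state-space model:
        x[t+1] = f(x[t], u[t]),   y[t] = g(x[t])
    returning the output sequence and the final state."""
    x, ys = x0, []
    for u in inputs:
        ys.append(g(x))   # emit observation from the current state
        x = f(x, u)       # advance the state with the current input
    return np.array(ys), x
```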

Optimal Condition Training for Target Source Separation

1 code implementation • 11 Nov 2022 • Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries.

Cold Diffusion for Speech Enhancement

no code implementations • 4 Nov 2022 • Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux

Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals.

Speech Enhancement

Late Audio-Visual Fusion for In-The-Wild Speaker Diarization

no code implementations • 2 Nov 2022 • Zexu Pan, Gordon Wichern, François G. Germain, Aswin Subramanian, Jonathan Le Roux

Speaker diarization is well studied for constrained audios but little explored for challenging in-the-wild videos, which have more speakers, shorter utterances, and inconsistent on-screen speakers.

speaker-diarization • Speaker Diarization +1

Heterogeneous Target Speech Separation

no code implementations • 7 Apr 2022 • Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.).

Speech Separation

Locate This, Not That: Class-Conditioned Sound Event DOA Estimation

no code implementations • 8 Mar 2022 • Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

Existing systems for sound event localization and detection (SELD) typically operate by estimating a source location for all classes at every time instant.

Sound Event Localization and Detection

The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

3 code implementations • 19 Oct 2021 • Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research.

Audio Source Separation

Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision

no code implementations • 22 Oct 2020 • Yun-Ning Hung, Gordon Wichern, Jonathan Le Roux

Most music source separation systems require large collections of isolated sources for training, which can be difficult to obtain.

Music Source Separation

AutoClip: Adaptive Gradient Clipping for Source Separation Networks

1 code implementation • 25 Jul 2020 • Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux

Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter.

Audio Source Separation
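AutoClip's core idea is to replace the hand-picked threshold with a percentile of the gradient-norm history observed so far during training. A minimal NumPy sketch (the class name and API here are illustrative; the released implementation differs):

```python
import numpy as np

class AutoClip:
    """Adaptively clip gradients to the p-th percentile of all gradient
    norms seen so far, avoiding a hand-tuned absolute threshold."""
    def __init__(self, percentile=10.0):
        self.percentile = percentile
        self.history = []

    def __call__(self, grad):
        norm = float(np.linalg.norm(grad))
        self.history.append(norm)
        clip_value = np.percentile(self.history, self.percentile)
        if norm > clip_value:
            grad = grad * (clip_value / norm)  # rescale to the threshold
        return grad
```

The percentile is still a hyperparameter, but it adapts to each model's gradient-norm scale rather than requiring per-model tuning of an absolute value.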

Finding Strength in Weakness: Learning to Separate Sounds with Weak Supervision

no code implementations • 6 Nov 2019 • Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux

In this scenario, weak labels are defined in contrast with strong time-frequency (TF) labels such as those obtained from isolated sources. They refer either to frame-level weak labels, where one only has access to the time periods when different sources are active in an audio mixture, or to clip-level weak labels, which only indicate the presence or absence of sounds in an entire audio clip.

Audio Source Separation
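The three label granularities can be made concrete with array shapes (sizes and activity patterns below are purely illustrative):

```python
import numpy as np

# Strong TF labels: per time-frequency-bin activity for each source
# (3 sources, 100 frames, 257 frequency bins in this toy example).
strong_tf = np.zeros((3, 100, 257), dtype=bool)
strong_tf[0, 10:20, :] = True        # source 0 active in frames 10-19
strong_tf[2, 50:60, 30:80] = True    # source 2 active in one TF region

# Frame-level weak labels: is each source active in each time frame?
frame_weak = strong_tf.any(axis=2)   # shape (3, 100)

# Clip-level weak labels: does each source appear anywhere in the clip?
clip_weak = frame_weak.any(axis=1)   # shape (3,)
```

Each coarsening discards information, which is exactly what makes training with weak labels harder than with strong TF labels.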

Bootstrapping deep music separation from primitive auditory grouping principles

no code implementations • 23 Oct 2019 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known.

Music Source Separation

WHAMR!: Noisy and Reverberant Single-Channel Speech Separation

no code implementations • 22 Oct 2019 • Matthew Maciejewski, Gordon Wichern, Emmett McQuinn, Jonathan Le Roux

While significant advances have been made with respect to the separation of overlapping speech signals, studies have been largely constrained to mixtures of clean, near anechoic speech, not representative of many real-world scenarios.

Sound • Audio and Speech Processing

WHAM!: Extending Speech Separation to Noisy Environments

1 code implementation • 2 Jul 2019 • Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux

Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem.

Speech Separation

Class-conditional embeddings for music source separation

no code implementations • 7 Nov 2018 • Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux

Isolating individual instruments in a musical mixture has a myriad of potential applications, and seems imminently achievable given the levels of performance reached by recent deep learning methods.

Clustering • Deep Clustering +1

Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

no code implementations • 6 Nov 2018 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

These estimates, together with a weighting scheme in the time-frequency domain, based on confidence in the separation quality, are used to train a deep learning model that can be used for single-channel separation, where no source direction information is available.

Clustering • Image Segmentation +2
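A confidence-weighted time-frequency loss of the kind described can be sketched as follows; the exact weighting scheme used in the paper is not given in this snippet, so the normalization here is an assumption:

```python
import numpy as np

def weighted_tf_loss(estimate, target, confidence):
    """Squared error over time-frequency bins, weighted by a per-bin
    confidence in the quality of the (spatially derived) targets, so
    unreliable bins contribute little to training."""
    w = confidence / (confidence.sum() + 1e-12)  # normalize the weights
    return float(np.sum(w * (estimate - target) ** 2))
```

Bins where the spatial clustering is ambiguous get low confidence and therefore barely influence the single-channel model being trained.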

Phasebook and Friends: Leveraging Discrete Representations for Source Separation

no code implementations • 2 Oct 2018 • Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey

Here, we propose "magbook", "phasebook", and "combook", three new types of layers based on discrete representations that can be used to estimate complex time-frequency masks.

Speaker Separation Speech Enhancement
