no code implementations • 21 Mar 2024 • Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam
First, we explore the utility of context by providing the TSE model with oracle information about what sound classes make up the input mixture, where the objective of the model is to extract one or more sources of interest indicated by the user.
1 code implementation • 18 Mar 2024 • Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic
We introduce a new loss term to enhance Foley sound generation in AudioLDM without post-filtering.
no code implementations • 15 Sep 2023 • Rayan Daod Nathoo, Mikolaj Kegler, Marko Stamenovic
Tiny, causal models are crucial for embedded audio machine learning applications.
1 code implementation • 4 Nov 2022 • Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang
Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction.
1 code implementation • 24 Jun 2022 • Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak
Our results indicate that the hybrid model with a convolutional transformer as the encoder yields superior performance in most HEAR challenge tasks.
Ranked #1 on Self-Supervised Learning on CREMA-D
1 code implementation • 30 Mar 2022 • Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To that end, we introduce a set of five datasets for task load detection in speech.
2 code implementations • 7 Oct 2021 • Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To facilitate the process, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER.
2 code implementations • 22 Oct 2019 • Pierre Beckmann, Mikolaj Kegler, Milos Cernak
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.
2 code implementations • 20 Oct 2019 • Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of the time-frequency representation of speech.