Search Results for author: Mikolaj Kegler

Found 9 papers, 7 papers with code

CATSE: A Context-Aware Framework for Causal Target Sound Extraction

no code implementations • 21 Mar 2024 • Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

First, we explore the utility of context by providing the TSE model with oracle information about what sound classes make up the input mixture, where the objective of the model is to extract one or more sources of interest indicated by the user.

Target Sound Extraction

Latent CLAP Loss for Better Foley Sound Synthesis

1 code implementation • 18 Mar 2024 • Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

We introduce a new loss term to enhance Foley sound generation in AudioLDM without post-filtering.

FAD

Two-Step Knowledge Distillation for Tiny Speech Enhancement

no code implementations • 15 Sep 2023 • Rayan Daod Nathoo, Mikolaj Kegler, Marko Stamenovic

Tiny, causal models are crucial for embedded audio machine learning applications.

Knowledge Distillation, Model Compression +1

Self-Supervised Learning for Speech Enhancement through Synthesis

1 code implementation • 4 Nov 2022 • Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang

Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction.

Denoising, Self-Supervised Learning +2

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

1 code implementation • 24 Jun 2022 • Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak

Our results indicate that the hybrid model with a convolutional transformer as the encoder yields superior performance in most HEAR challenge tasks.

Ranked #1 on Self-Supervised Learning on CREMA-D

Scene Classification, Self-Supervised Learning

Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load

1 code implementation • 30 Mar 2022 • Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

To that end, we introduce a set of five datasets for task load detection in speech.

Representation Learning

SERAB: A multi-lingual benchmark for speech emotion recognition

2 code implementations • 7 Oct 2021 • Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

To facilitate the process, here, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER.

Benchmarking, Speech Emotion Recognition

Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

2 code implementations • 22 Oct 2019 • Pierre Beckmann, Mikolaj Kegler, Milos Cernak

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +7

Deep speech inpainting of time-frequency masks

2 code implementations • 20 Oct 2019 • Mikolaj Kegler, Pierre Beckmann, Milos Cernak

To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of time-frequency representation of speech.

Retrieval
