1 code implementation • 24 Oct 2023 • Florian Schmid, Khaled Koutini, Gerhard Widmer
Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.
Ranked #1 on Instrument Recognition on OpenMIC-2018 (using extra training data)
1 code implementation • 8 Aug 2023 • Paul Primus, Khaled Koutini, Gerhard Widmer
This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers.
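For orientation, here is a minimal sketch of how a dual-encoder text-to-audio retrieval system of this kind typically scores candidates; the shared embedding space and cosine-similarity ranking are common-practice assumptions, not details taken from the paper:

```python
import numpy as np

def cosine_retrieve(query_emb, audio_embs):
    """Rank audio clips by cosine similarity to a text-query embedding.

    query_emb: (d,) caption embedding; audio_embs: (n, d) clip embeddings.
    Both are assumed to come from separately pre-trained text and
    spectrogram encoders projected into a shared d-dimensional space.
    """
    q = query_emb / np.linalg.norm(query_emb)
    a = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    scores = a @ q                       # cosine similarity per clip
    return np.argsort(-scores), scores   # best-matching clips first

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
ranking, _ = cosine_retrieve(rng.normal(size=128), rng.normal(size=(10, 128)))
print(ranking[:3])
```

In practice the two encoders are usually trained jointly with a contrastive objective so that matching caption/clip pairs receive the highest scores.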
no code implementations • 13 Jun 2023 • Shahed Masoudian, Khaled Koutini, Markus Schedl, Gerhard Widmer, Navid Rekabsaz
In the Acoustic Scene Classification (ASC) task, domain shift is mainly caused by different recording devices.
1 code implementation • 12 May 2023 • Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer
However, we also show that DIR (device impulse response) augmentation and Freq-MixStyle are complementary, achieving a new state-of-the-art performance on signals recorded by devices unseen during training.
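For illustration, a minimal sketch of Freq-MixStyle as it is commonly described: per-frequency feature statistics are mixed between random pairs of samples in a batch, so the model cannot latch onto device-specific frequency responses. The hyperparameters (alpha and the application probability p) are assumed values, not the paper's settings:

```python
import torch

def freq_mixstyle(x, alpha=0.3, p=0.5):
    """Freq-MixStyle sketch. x: (B, C, F, T) batch of log-mel spectrograms;
    alpha and p are assumed hyperparameters."""
    if torch.rand(1).item() > p:
        return x
    B = x.size(0)
    # Per-sample statistics for each frequency bin, over channels and time.
    mu = x.mean(dim=(1, 3), keepdim=True)           # (B, 1, F, 1)
    sig = x.std(dim=(1, 3), keepdim=True) + 1e-6    # (B, 1, F, 1)
    x_norm = (x - mu) / sig
    # Blend each sample's statistics with those of a random batch partner.
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    perm = torch.randperm(B)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix
```

DIR augmentation, by contrast, operates on the waveform (convolving recordings with measured device impulse responses), which is plausibly why the two combine well.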
1 code implementation • 25 Nov 2022 • Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, Gerhard Widmer
Furthermore, we show that transformers trained on AudioSet can be extremely effective representation extractors for a wide range of downstream tasks.
2 code implementations • 9 Nov 2022 • Florian Schmid, Khaled Koutini, Gerhard Widmer
We provide models at different complexity levels, scaling from low-complexity variants up to a new state-of-the-art performance of 0.483 mAP on AudioSet.
Ranked #2 on Audio Tagging on AudioSet (using extra training data)
2 code implementations • 11 Oct 2021 • Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer
However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is their computational complexity (see the sketch after this entry).
Ranked #3 on Audio Classification on FSD50K (using extra training data)
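A back-of-the-envelope sketch of the complexity concern raised above; the patch size and spectrogram shape are illustrative assumptions:

```python
def attention_cost(freq_bins=128, time_frames=1000, patch=16):
    """Token count and attention-matrix size for a patchified spectrogram.
    Self-attention builds an n x n score matrix per head and layer, so
    cost grows quadratically with the number of tokens."""
    n_tokens = (freq_bins // patch) * (time_frames // patch)
    return n_tokens, n_tokens ** 2

for frames in (500, 1000, 2000):
    n, cost = attention_cost(time_frames=frames)
    print(f"{frames} frames -> {n} tokens, {cost:,} attention entries")
# Doubling the input length roughly quadruples the attention cost,
# whereas a CNN's cost grows only linearly with input length.
```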
no code implementations • 19 Jul 2021 • Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer
Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing.
1 code implementation • 26 May 2021 • Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer
As state-of-the-art CNN architectures (in computer vision and other domains) tend to go deeper in terms of number of layers, their RF size increases, and their performance on several audio classification and tagging tasks therefore degrades.
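To make the RF growth concrete, here is a small sketch using the standard receptive-field recurrence; the VGG-like layer stack is an illustrative assumption:

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride) layers, via the
    standard recurrence: rf += (k - 1) * jump; jump *= stride."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# A VGG-like block of two 3x3 convs followed by 2x2 pooling: each extra
# block enlarges the RF, which this line of work links to degraded audio
# tagging performance once the RF grows past a task-dependent sweet spot.
block = [(3, 1), (3, 1), (2, 2)]
for depth in (2, 4, 6):
    print(depth, "blocks ->", receptive_field(block * depth), "input cells")
```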
1 code implementation • 5 Nov 2020 • Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh, Gerhard Widmer
Deep Neural Networks are known to be very demanding in terms of computing and memory requirements.
1 code implementation • 27 Jul 2020 • Khaled Koutini, Hamid Eghbal-zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer
However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets.
no code implementations • 6 Jul 2020 • Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer
Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models.
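As one concrete example of such a technique, a minimal sketch of mixup, chosen purely for illustration (the entry does not say which augmentations the paper analyses):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup sketch: convexly blend two training examples and their
    one-hot labels; alpha controls how aggressive the blending is."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```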
1 code implementation • 28 Oct 2019 • Khaled Koutini, Shreyan Chowdhury, Verena Haunschmid, Hamid Eghbal-zadeh, Gerhard Widmer
We present the CP-JKU submission to MediaEval 2019: a Receptive-Field-(RF-)regularized and Frequency-Aware CNN approach for tagging music with emotion/mood labels.
2 code implementations • 5 Sep 2019 • Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer
One side effect of restricting the RF of CNNs is that more frequency information is lost.
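One frequency-aware remedy discussed in this line of work is to give otherwise translation-invariant convolutions explicit frequency-position information. The sketch below appends a normalized frequency-coordinate channel to the input, a CoordConv-style variant offered as an assumption rather than the paper's exact mechanism:

```python
import torch

def add_freq_coords(x):
    """Append a channel encoding each bin's position on the frequency
    axis, letting subsequent convolutions learn frequency-dependent
    filters. x: (B, C, F, T) batch of spectrograms."""
    B, _, F, T = x.shape
    coords = torch.linspace(-1.0, 1.0, F, device=x.device)  # one value per bin
    coords = coords.view(1, 1, F, 1).expand(B, 1, F, T)
    return torch.cat([x, coords], dim=1)                    # (B, C+1, F, T)

print(add_freq_coords(torch.randn(4, 1, 128, 256)).shape)   # (4, 2, 128, 256)
```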
1 code implementation • 4 Sep 2019 • Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, Gerhard Widmer
Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning.
3 code implementations • 3 Jul 2019 • Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer
To this end, we analyse the receptive field (RF) of these CNNs and demonstrate the importance of the RF to the generalization capability of the models.
1 code implementation • 22 Jun 2018 • Hamid Eghbal-zadeh, Lukas Fischer, Niko Popitsch, Florian Kromp, Sabine Taschner-Mandl, Khaled Koutini, Teresa Gerber, Eva Bozsaky, Peter F. Ambros, Inge M. Ambros, Gerhard Widmer, Bernhard A. Moser
We show that DeepSNP is capable of successfully predicting the presence or absence of a breakpoint in large genomic windows and outperforms state-of-the-art neural network models.