Search Results for author: Khaled Koutini

Found 17 papers, 14 papers with code

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

1 code implementation · 24 Oct 2023 · Florian Schmid, Khaled Koutini, Gerhard Widmer

Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.

Ranked #1 on Instrument Recognition on OpenMIC-2018 (using extra training data)

Audio Classification · Audio Tagging +2

Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

1 code implementation · 12 May 2023 · Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer

However, we also show that DIR augmentation and Freq-MixStyle are complementary, achieving a new state-of-the-art performance on signals recorded by devices unseen during training.

Acoustic Scene Classification · Audio Classification +1
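Device impulse response (DIR) augmentation simulates recordings from unseen devices by convolving training waveforms with recorded device impulse responses. A minimal sketch of the idea (function names and the toy signals are illustrative, not the paper's code):

```python
import numpy as np

def dir_augment(waveform: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Simulate re-recording a signal through another device by convolving
    it with that device's impulse response, then trimming to length."""
    convolved = np.convolve(waveform, impulse_response)[: len(waveform)]
    # Rescale so the augmented signal keeps roughly the original peak level.
    convolved *= np.max(np.abs(waveform)) / (np.max(np.abs(convolved)) + 1e-9)
    return convolved

# Toy example: 1 s of noise at 16 kHz and a decaying impulse response.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)
ir = np.exp(-np.linspace(0.0, 8.0, 256))
augmented = dir_augment(wave, ir)
```

In practice the impulse responses come from measured devices, and the augmented signals are mixed into training alongside techniques such as Freq-MixStyle.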

Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers

1 code implementation · 25 Nov 2022 · Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, Gerhard Widmer

Furthermore, we show that transformers trained on AudioSet can be extremely effective representation extractors for a wide range of downstream tasks.

Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation

2 code implementations · 9 Nov 2022 · Florian Schmid, Khaled Koutini, Gerhard Widmer

We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet.

Ranked #2 on Audio Tagging on AudioSet (using extra training data)

Audio Classification · Audio Tagging +2
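Transformer-to-CNN knowledge distillation trains a compact CNN student to match a large transformer teacher's predictions. A generic sketch of a temperature-scaled distillation loss (the classic softmax-KL formulation, not necessarily the paper's exact objective for multi-label AudioSet tagging):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """KL divergence between temperature-softened teacher and student
    class distributions -- the standard knowledge-distillation objective."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    return float(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12))))

teacher = np.array([2.0, 0.5, -1.0])   # hypothetical transformer logits
matched = distillation_loss(teacher, teacher)        # student agrees
diverged = distillation_loss(np.array([-1.0, 0.5, 2.0]), teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the predictions diverge; the temperature softens the targets so the student also learns from the teacher's non-maximal class scores.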

Efficient Training of Audio Transformers with Patchout

2 code implementations · 11 Oct 2021 · Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer

However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is their computational complexity.

Ranked #3 on Audio Classification on FSD50K (using extra training data)

Acoustic Scene Classification · Audio Classification +2
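Patchout addresses that complexity by randomly dropping a fraction of the input patches during training, so the transformer attends over a much shorter sequence. A minimal sketch of the core idea (shapes and the drop ratio are illustrative, not the PaSST implementation):

```python
import numpy as np

def patchout(patches: np.ndarray, drop_ratio: float,
             rng: np.random.Generator) -> np.ndarray:
    """Randomly keep a subset of patch embeddings along the sequence axis.
    Self-attention cost is quadratic in sequence length, so dropping
    patches cuts training compute substantially."""
    n = patches.shape[0]
    keep = max(1, int(round(n * (1.0 - drop_ratio))))
    idx = np.sort(rng.choice(n, size=keep, replace=False))
    return patches[idx]

rng = np.random.default_rng(42)
tokens = rng.standard_normal((100, 64))           # 100 patch embeddings, dim 64
kept = patchout(tokens, drop_ratio=0.4, rng=rng)  # 60 patches survive
```

At inference time no patches are dropped; patchout acts only as a training-time regularizer and speed-up.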

Over-Parameterization and Generalization in Audio Classification

no code implementations · 19 Jul 2021 · Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer

Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing.

Acoustic Scene Classification · Audio Classification +1

Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks

1 code implementation · 26 May 2021 · Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer

As state-of-the-art CNN architectures (in computer vision and other domains) tend to go deeper in terms of number of layers, their RF size increases, and they therefore degrade in performance on several audio classification and tagging tasks.

Acoustic Scene Classification · Audio Classification +2

Receptive-Field Regularized CNNs for Music Classification and Tagging

1 code implementation · 27 Jul 2020 · Khaled Koutini, Hamid Eghbal-zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer

However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets.

Classification · General Classification +4

On Data Augmentation and Adversarial Risk: An Empirical Analysis

no code implementations · 6 Jul 2020 · Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer

Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation ability of models.

Adversarial Attack · Data Augmentation

Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs

1 code implementation · 28 Oct 2019 · Khaled Koutini, Shreyan Chowdhury, Verena Haunschmid, Hamid Eghbal-zadeh, Gerhard Widmer

We present the CP-JKU submission to MediaEval 2019: a Receptive-Field-(RF-)regularized and Frequency-Aware CNN approach for tagging music with emotion/mood labels.

Acoustic Scene Classification · Scene Classification

The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

3 code implementations · 3 Jul 2019 · Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer

To this end, we analyse the receptive field (RF) of these CNNs and demonstrate the importance of the RF to the generalization capability of the models.

Acoustic Scene Classification · General Classification +1
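The receptive field of a stacked CNN can be computed with a standard recursion: each layer adds (kernel − 1) × jump, and each stride multiplies the jump. A small helper illustrating how quickly deeper, strided nets grow their RF (standard formula, not the authors' code):

```python
def receptive_field(layers):
    """Compute the 1-D receptive field of a stack of conv/pool layers.
    Each layer is (kernel_size, stride); rf grows by (kernel - 1) * jump,
    where jump is the cumulative stride of all preceding layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Three 3x3 convolutions with stride 2 each (one spatial dimension):
shallow = receptive_field([(3, 2)] * 3)   # -> 15
deep = receptive_field([(3, 2)] * 6)      # -> 127
```

This is the quantity the paper analyses: restricting it (e.g. via smaller kernels or fewer strided layers) acts as a regularizer for spectrogram-based CNNs.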

Deep SNP: An End-to-end Deep Neural Network with Attention-based Localization for Break-point Detection in SNP Array Genomic data

1 code implementation · 22 Jun 2018 · Hamid Eghbal-zadeh, Lukas Fischer, Niko Popitsch, Florian Kromp, Sabine Taschner-Mandl, Khaled Koutini, Teresa Gerber, Eva Bozsaky, Peter F. Ambros, Inge M. Ambros, Gerhard Widmer, Bernhard A. Moser

We show that Deep SNP is capable of successfully predicting the presence or absence of a breakpoint in large genomic windows and outperforms state-of-the-art neural network models.
