1 code implementation • 24 Oct 2023 • Florian Schmid, Khaled Koutini, Gerhard Widmer
Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.
Ranked #1 on Instrument Recognition on OpenMIC-2018 (using extra training data)
1 code implementation • 8 Aug 2023 • Paul Primus, Khaled Koutini, Gerhard Widmer
This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers.
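For orientation, here is a minimal sketch of how a dual-encoder text-to-audio retrieval system of this kind typically scores candidates; the shared embedding space and cosine-similarity ranking are common-practice assumptions, not details taken from the paper:

```python
import numpy as np

def cosine_retrieve(query_emb, audio_embs):
    """Rank audio clips by cosine similarity to a text-query embedding.

    query_emb: (d,) caption embedding; audio_embs: (n, d) clip embeddings.
    Both are assumed to come from separately pre-trained text and
    spectrogram encoders projected into a shared d-dimensional space.
    """
    q = query_emb / np.linalg.norm(query_emb)
    a = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    scores = a @ q                       # cosine similarity per clip
    return np.argsort(-scores), scores   # best-matching clips first

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
ranking, _ = cosine_retrieve(rng.normal(size=128), rng.normal(size=(10, 128)))
print(ranking[:3])
```

In practice the two encoders are usually trained jointly with a contrastive objective so that matching caption/clip pairs receive the highest scores.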
no code implementations • 13 Jun 2023 • Shahed Masoudian, Khaled Koutini, Markus Schedl, Gerhard Widmer, Navid Rekabsaz
In the Acoustic Scene Classification (ASC) task, domain shift is mainly caused by different recording devices.
1 code implementation • 12 May 2023 • Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer
However, we also show that DIR (device impulse response) augmentation and Freq-MixStyle are complementary, achieving a new state-of-the-art performance on signals recorded by devices unseen during training.
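For illustration, a minimal sketch of Freq-MixStyle as it is commonly described: per-frequency feature statistics are mixed between random pairs of samples in a batch, so the model cannot latch onto device-specific frequency responses. The hyperparameters (alpha and the application probability p) are assumed values, not the paper's settings:

```python
import torch

def freq_mixstyle(x, alpha=0.3, p=0.5):
    """Freq-MixStyle sketch. x: (B, C, F, T) batch of log-mel spectrograms;
    alpha and p are assumed hyperparameters."""
    if torch.rand(1).item() > p:
        return x
    B = x.size(0)
    # Per-sample statistics for each frequency bin, over channels and time.
    mu = x.mean(dim=(1, 3), keepdim=True)           # (B, 1, F, 1)
    sig = x.std(dim=(1, 3), keepdim=True) + 1e-6    # (B, 1, F, 1)
    x_norm = (x - mu) / sig
    # Blend each sample's statistics with those of a random batch partner.
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    perm = torch.randperm(B)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix
```

DIR augmentation, by contrast, operates on the waveform (convolving recordings with measured device impulse responses), which is plausibly why the two combine well.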
1 code implementation • 25 Nov 2022 • Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, Gerhard Widmer
Furthermore, we show that transformers trained on AudioSet can be extremely effective representation extractors for a wide range of downstream tasks.
2 code implementations • 9 Nov 2022 • Florian Schmid, Khaled Koutini, Gerhard Widmer
We provide models at different complexity levels, scaling from low-complexity variants up to a new state-of-the-art performance of 0.483 mAP on AudioSet.
Ranked #2 on Audio Tagging on AudioSet (using extra training data)
2 code implementations • 11 Oct 2021 • Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer
However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is their computational complexity (see the sketch after this entry).
Ranked #3 on Audio Classification on FSD50K (using extra training data)
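A back-of-the-envelope sketch of the complexity concern raised above; the patch size and spectrogram shape are illustrative assumptions:

```python
def attention_cost(freq_bins=128, time_frames=1000, patch=16):
    """Token count and attention-matrix size for a patchified spectrogram.
    Self-attention builds an n x n score matrix per head and layer, so
    cost grows quadratically with the number of tokens."""
    n_tokens = (freq_bins // patch) * (time_frames // patch)
    return n_tokens, n_tokens ** 2

for frames in (500, 1000, 2000):
    n, cost = attention_cost(time_frames=frames)
    print(f"{frames} frames -> {n} tokens, {cost:,} attention entries")
# Doubling the input length roughly quadruples the attention cost,
# whereas a CNN's cost grows only linearly with input length.
```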
no code implementations • 19 Jul 2021 • Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer
Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing.
1 code implementation • 26 May 2021 • Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer
As state-of-the-art CNN architectures (in computer vision and other domains) tend to go deeper in terms of number of layers, their RF size increases, and their performance on several audio classification and tagging tasks therefore degrades.
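To make the RF growth concrete, here is a small sketch using the standard receptive-field recurrence; the VGG-like layer stack is an illustrative assumption:

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride) layers, via the
    standard recurrence: rf += (k - 1) * jump; jump *= stride."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# A VGG-like block of two 3x3 convs followed by 2x2 pooling: each extra
# block enlarges the RF, which this line of work links to degraded audio
# tagging performance once the RF grows past a task-dependent sweet spot.
block = [(3, 1), (3, 1), (2, 2)]
for depth in (2, 4, 6):
    print(depth, "blocks ->", receptive_field(block * depth), "input cells")
```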
1 code implementation • 5 Nov 2020 • Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh, Gerhard Widmer
Deep Neural Networks are known to be very demanding in terms of computing and memory requirements.
1 code implementation • 27 Jul 2020 • Khaled Koutini, Hamid Eghbal-zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer
However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets.
no code implementations • 6 Jul 2020 • Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer
Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models.
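As one concrete example of such a technique, a minimal sketch of mixup, chosen purely for illustration (the entry does not say which augmentations the paper analyses):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup sketch: convexly blend two training examples and their
    one-hot labels; alpha controls how aggressive the blending is."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```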
1 code implementation • 28 Oct 2019 • Khaled Koutini, Shreyan Chowdhury, Verena Haunschmid, Hamid Eghbal-zadeh, Gerhard Widmer
We present the CP-JKU submission to MediaEval 2019: a Receptive-Field-(RF-)regularized and Frequency-Aware CNN approach for tagging music with emotion/mood labels.
2 code implementations • 5 Sep 2019 • Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer
One side effect of restricting the RF of CNNs is that more frequency information is lost.
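One frequency-aware remedy discussed in this line of work is to give otherwise translation-invariant convolutions explicit frequency-position information. The sketch below appends a normalized frequency-coordinate channel to the input, a CoordConv-style variant offered as an assumption rather than the paper's exact mechanism:

```python
import torch

def add_freq_coords(x):
    """Append a channel encoding each bin's position on the frequency
    axis, letting subsequent convolutions learn frequency-dependent
    filters. x: (B, C, F, T) batch of spectrograms."""
    B, _, F, T = x.shape
    coords = torch.linspace(-1.0, 1.0, F, device=x.device)  # one value per bin
    coords = coords.view(1, 1, F, 1).expand(B, 1, F, T)
    return torch.cat([x, coords], dim=1)                    # (B, C+1, F, T)

print(add_freq_coords(torch.randn(4, 1, 128, 256)).shape)   # (4, 2, 128, 256)
```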
1 code implementation • 4 Sep 2019 • Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, Gerhard Widmer
Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning.
3 code implementations • 3 Jul 2019 • Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer
To this end, we analyse the receptive field (RF) of these CNNs and demonstrate the importance of the RF to the generalization capability of the models.
1 code implementation • 22 Jun 2018 • Hamid Eghbal-zadeh, Lukas Fischer, Niko Popitsch, Florian Kromp, Sabine Taschner-Mandl, Khaled Koutini, Teresa Gerber, Eva Bozsaky, Peter F. Ambros, Inge M. Ambros, Gerhard Widmer, Bernhard A. Moser
We show that DeepSNP is capable of successfully predicting the presence or absence of a breakpoint in large genomic windows and outperforms state-of-the-art neural network models.