Search Results for author: Konstantinos Drossos

Found 33 papers, 19 papers with code

Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning

no code implementations • 9 Aug 2023 • Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas Virtanen

Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment.

Privacy Preserving • Representation Learning

Adversarial Representation Learning for Robust Privacy Preservation in Audio

1 code implementation • 29 Apr 2023 • Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen

In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.

Event Detection • Representation Learning +1

Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

1 code implementation • 4 Aug 2022 • Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen

Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost for nursing the elderly people.

Clustering

Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering

no code implementations • 20 Apr 2022 • Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen

Audio question answering (AQA) is a multimodal translation task where a system analyzes an audio signal and a natural language question, to generate a desirable natural language answer.

Audio Question Answering • Question Answering

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra

Having attracted attention only recently, very few works on AAC study the performance of existing pre-trained audio and natural language processing resources.

Audio captioning • Word Embeddings

Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases

1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen

We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.

Event Detection • Retrieval +1

Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations

no code implementations • 4 Oct 2021 • Andreas Triantafyllopoulos, Manuel Milling, Konstantinos Drossos, Björn W. Schuller

Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics taking into account all different strata of a particular dataset.

Acoustic Scene Classification • Fairness +1

Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach

1 code implementation • 16 Jul 2021 • Jan Berg, Konstantinos Drossos

In our scenario, a pre-optimized AAC method is used for some unseen general audio signals and can update its parameters in order to adapt to the new information, given a new reference caption.

Audio captioning • Continual Learning

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

no code implementations • 14 Jun 2021 • Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes.

Active Learning • Binary Classification +3

Enriched Music Representations with Multiple Cross-modal Contrastive Learning

1 code implementation • 1 Apr 2021 • Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov

Modeling various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information.

Contrastive Learning • Genre classification

Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags

1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.

Representation Learning • TAG +1

WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

1 code implementation • 21 Oct 2020 • An Tran, Konstantinos Drossos, Tuomas Virtanen

Automated audio captioning (AAC) is a novel task, where a method takes as an input an audio sample and outputs a textual description (i.e., a caption) of its contents.

Audio captioning • Image Captioning +2

Conditioned Time-Dilated Convolutions for Sound Event Detection

no code implementations • 10 Jul 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Tuomas Virtanen

Sound event detection (SED) is the task of identifying sound events along with their onset and offset times.

Event Detection • Language Modelling +1

Multi-task Regularization Based on Infrequent Classes for Audio Captioning

1 code implementation • 9 Jul 2020 • Emre Çakır, Konstantinos Drossos, Tuomas Virtanen

Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio.

Audio captioning

Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances

1 code implementation • 6 Jul 2020 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, Gerald Schuller

In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation.

Sound • Audio and Speech Processing

Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation

no code implementations • 6 Jul 2020 • Pyry Pyykkönen, Stylianos I. Mimilakis, Konstantinos Drossos, Tuomas Virtanen

We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with DWS convolutions (DWS-CNNs).

Music Source Separation

Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning

1 code implementation • 6 Jul 2020 • Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen

In this work we present an approach that focuses on explicitly taking advantage of this difference of lengths between sequences, by applying a temporal sub-sampling to the audio input sequence.

Audio captioning

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.

Representation Learning

Unsupervised Interpretable Representation Learning for Singing Voice Separation

1 code implementation • 3 Mar 2020 • Stylianos I. Mimilakis, Konstantinos Drossos, Gerald Schuller

In this work, we present a method for learning interpretable music signal representations directly from waveform signals.

Denoising • Music Source Separation +1

Sound Event Detection with Depthwise Separable and Dilated Convolutions

1 code implementation • 2 Feb 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen

The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total number of parameters of the SED method, which amounts to a couple of million.

Event Detection • Sound Event Detection

Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters

no code implementations • 1 Nov 2019 • Niccoló Nicodemo, Gaurav Naithani, Konstantinos Drossos, Tuomas Virtanen, Roberto Saletti

The application of the low-bit quantization allows a 50% reduction of the DNN memory footprint while the STOI performance drops only by 2.7%.

Quantization • Speech Enhancement

Clotho: An Audio Captioning Dataset

7 code implementations • 21 Oct 2019 • Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen

Audio captioning is the novel task of general audio content description using free text.

Audio captioning • Translation

Crowdsourcing a Dataset of Audio Captions

1 code implementation • 22 Jul 2019 • Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen

In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practices followed for the creation of widely used image captioning and machine translation datasets.

Sound • Audio and Speech Processing

Language Modelling for Sound Event Detection with Teacher Forcing and Scheduled Sampling

1 code implementation • Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 • Konstantinos Drossos, Shayan Gharib, Paul Magron, Tuomas Virtanen

On the contrary, with our method there is a decrease of 4% at F1 score and an increase of 7% at ER for the TUT-SED Synthetic 2016 dataset.

Event Detection • Language Modelling +2

Unsupervised Adversarial Domain Adaptation Based On The Wasserstein Distance For Acoustic Scene Classification

1 code implementation • 24 Apr 2019 • Konstantinos Drossos, Paul Magron, Tuomas Virtanen

A challenging problem in deep learning-based machine listening field is the degradation of the performance when using data from unseen conditions.

Acoustic Scene Classification • Classification +3

Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation

no code implementations • 12 Apr 2019 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller

We examine the mapping functions of neural networks based on the denoising autoencoder (DAE) model that are conditioned on the mixture magnitude spectra.

Denoising • Knowledge Distillation +1

Unsupervised adversarial domain adaptation for acoustic scene classification

1 code implementation • 17 Aug 2018 • Shayan Gharib, Konstantinos Drossos, Emre Çakır, Dmitriy Serdyuk, Tuomas Virtanen

A general problem in acoustic scene classification task is the mismatched conditions between training and testing data, which significantly reduces the performance of the developed methods on classification accuracy.

Acoustic Scene Classification • Classification +3

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

Current state of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods.

Sound • Audio and Speech Processing

Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

no code implementations • 4 Nov 2017 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, João F. Santos, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

Singing voice separation based on deep learning relies on the usage of time-frequency masking.

Sound • Audio and Speech Processing

Automated Audio Captioning with Recurrent Neural Networks

no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen

The encoder is a multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a multi-layered GRU with a classification layer connected to the last GRU of the decoder.

Audio captioning • General Classification +3

Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen

This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.

Bird Audio Detection • Data Augmentation +1

Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition

no code implementations • 7 Jun 2017 • Miroslav Malik, Sharath Adavanne, Konstantinos Drossos, Tuomas Virtanen, Dasa Ticha, Roman Jarina

This paper studies the emotion recognition from musical tracks in the 2-dimensional valence-arousal (V-A) emotional space.

Emotion Recognition • Music Emotion Recognition

Convolutional Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen

Bird sounds possess distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.

Bird Audio Detection
