Search Results for author: Konstantinos Drossos

Found 33 papers, 19 papers with code

Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning

no code implementations • 9 Aug 2023 • Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas Virtanen

Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment.

Privacy Preserving Representation Learning

Adversarial Representation Learning for Robust Privacy Preservation in Audio

1 code implementation • 29 Apr 2023 • Shayan Gharib, Minh Tran, Diep Luong, Konstantinos Drossos, Tuomas Virtanen

In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings.

Event Detection Representation Learning +1

Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

1 code implementation • 4 Aug 2022 • Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen

Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost of nursing elderly people.

Clustering

Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering

no code implementations • 20 Apr 2022 • Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen

Audio question answering (AQA) is a multimodal translation task where a system analyzes an audio signal and a natural language question to generate a desirable natural language answer.

Audio Question Answering Question Answering

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra

Having attracted attention only recently, very few works on AAC study the performance of existing pre-trained audio and natural language processing resources.

Audio captioning Word Embeddings

Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases

1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen

We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.

Event Detection Retrieval +1

Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations

no code implementations • 4 Oct 2021 • Andreas Triantafyllopoulos, Manuel Milling, Konstantinos Drossos, Björn W. Schuller

Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics taking into account all different strata of a particular dataset.

Acoustic Scene Classification Fairness +1

Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach

1 code implementation • 16 Jul 2021 • Jan Berg, Konstantinos Drossos

In our scenario, a pre-optimized AAC method is used for some unseen general audio signals and can update its parameters in order to adapt to the new information, given a new reference caption.

Audio captioning Continual Learning

Enriched Music Representations with Multiple Cross-modal Contrastive Learning

1 code implementation • 1 Apr 2021 • Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov

Modeling various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information.

Contrastive Learning Genre classification

Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags

1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.

Representation Learning TAG +1

WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

1 code implementation • 21 Oct 2020 • An Tran, Konstantinos Drossos, Tuomas Virtanen

Automated audio captioning (AAC) is a novel task, where a method takes as an input an audio sample and outputs a textual description (i.e. a caption) of its contents.

Audio captioning Image Captioning +2

Multi-task Regularization Based on Infrequent Classes for Audio Captioning

1 code implementation • 9 Jul 2020 • Emre Çakır, Konstantinos Drossos, Tuomas Virtanen

Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio.

Audio captioning

Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances

1 code implementation • 6 Jul 2020 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, Gerald Schuller

In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation.

Sound Audio and Speech Processing

Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning

1 code implementation • 6 Jul 2020 • Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen

In this work we present an approach that focuses on explicitly taking advantage of this difference of lengths between sequences, by applying a temporal sub-sampling to the audio input sequence.

Audio captioning

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.

Representation Learning

Unsupervised Interpretable Representation Learning for Singing Voice Separation

1 code implementation • 3 Mar 2020 • Stylianos I. Mimilakis, Konstantinos Drossos, Gerald Schuller

In this work, we present a method for learning interpretable music signal representations directly from waveform signals.

Denoising Music Source Separation +1

Sound Event Detection with Depthwise Separable and Dilated Convolutions

1 code implementation • 2 Feb 2020 • Konstantinos Drossos, Stylianos I. Mimilakis, Shayan Gharib, Yanxiong Li, Tuomas Virtanen

The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total number of parameters of the SED method, which amounts to a couple of million.

Event Detection Sound Event Detection

Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters

no code implementations • 1 Nov 2019 • Niccoló Nicodemo, Gaurav Naithani, Konstantinos Drossos, Tuomas Virtanen, Roberto Saletti

The application of the low-bit quantization allows a 50% reduction of the DNN memory footprint while the STOI performance drops only by 2.7%.

Quantization Speech Enhancement

Clotho: An Audio Captioning Dataset

7 code implementations • 21 Oct 2019 • Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen

Audio captioning is the novel task of general audio content description using free text.

Audio captioning Translation

Crowdsourcing a Dataset of Audio Captions

1 code implementation • 22 Jul 2019 • Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen

In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practices followed for the creation of widely used image captioning and machine translation datasets.

Sound Audio and Speech Processing

Unsupervised Adversarial Domain Adaptation Based On The Wasserstein Distance For Acoustic Scene Classification

1 code implementation • 24 Apr 2019 • Konstantinos Drossos, Paul Magron, Tuomas Virtanen

A challenging problem in the deep learning-based machine listening field is the degradation of performance when using data from unseen conditions.

Acoustic Scene Classification Classification +3

Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation

no code implementations • 12 Apr 2019 • Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller

We examine the mapping functions of neural networks based on the denoising autoencoder (DAE) model that are conditioned on the mixture magnitude spectra.

Denoising Knowledge Distillation +1

Unsupervised adversarial domain adaptation for acoustic scene classification

1 code implementation • 17 Aug 2018 • Shayan Gharib, Konstantinos Drossos, Emre Çakır, Dmitriy Serdyuk, Tuomas Virtanen

A general problem in the acoustic scene classification task is the mismatch in conditions between training and testing data, which significantly reduces the classification accuracy of the developed methods.

Acoustic Scene Classification Classification +3

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep learning-based methods.

Sound Audio and Speech Processing

Automated Audio Captioning with Recurrent Neural Networks

no code implementations • 30 Jun 2017 • Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen

The encoder is a multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a multi-layered GRU with a classification layer connected to the last GRU of the decoder.

Audio captioning General Classification +3

Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Jun 2017 • Sharath Adavanne, Konstantinos Drossos, Emre Çakır, Tuomas Virtanen

This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks.

Bird Audio Detection Data Augmentation +1

Convolutional Recurrent Neural Networks for Bird Audio Detection

no code implementations • 7 Mar 2017 • Emre Çakır, Sharath Adavanne, Giambattista Parascandolo, Konstantinos Drossos, Tuomas Virtanen

Bird sounds possess a distinctive spectral structure which may exhibit small shifts in spectrum depending on the bird species and environmental conditions.

Bird Audio Detection
