Search Results for author: Mark D. Plumbley

Found 69 papers, 46 papers with code

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

AudioCaps Audio Generation +2

Separate Anything You Describe

1 code implementation • 9 Aug 2023 • Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.

Audio Source Separation Natural Language Queries +2

AudioSR: Versatile Audio Super-resolution at Scale

1 code implementation • 13 Sep 2023 • Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications.

Audio Super-Resolution Super-Resolution

Single Channel Audio Source Separation using Convolutional Denoising Autoencoders

4 code implementations • 23 Mar 2017 • Emad M. Grais, Mark D. Plumbley

Each CDAE is trained to separate one source and treats the other sources as background noise.
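The one-denoiser-per-source setup described above can be sketched as follows. This is an illustrative simplification, not the paper's architecture: a real CDAE is a convolutional network, while here a least-squares linear map stands in for each denoiser, and the "spectrogram" data are random toy arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spectrogram" data: each array holds one source's frame features.
n_feat, n_frames = 8, 200
sources = [rng.random((n_feat, n_frames)) for _ in range(2)]
mixture = sum(sources)

def fit_denoiser(mix, target):
    """Fit one denoiser whose input is the mixture and whose target is a
    single source, so every other source acts as background noise.
    (A linear least-squares map stands in for the convolutional network.)"""
    W, *_ = np.linalg.lstsq(mix.T, target.T, rcond=None)
    return W.T

# One denoiser per source, exactly as in the CDAE training scheme.
denoisers = [fit_denoiser(mixture, s) for s in sources]
estimates = [W @ mixture for W in denoisers]
for k, (est, src) in enumerate(zip(estimates, sources)):
    print(f"source {k}: mse={np.mean((est - src) ** 2):.3f}")
```

Each fitted map is applied to the same mixture; the ensemble of per-source outputs forms the separation.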

Sound (MSC: 68T01; ACM: H.5.5; I.5; I.2.6; I.4.3)

WavJourney: Compositional Audio Creation with Large Language Models

1 code implementation • 26 Jul 2023 • Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

Audio Generation

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

3 code implementations • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)

Audio captioning Event Detection +6

Audio Set classification with attention model: A probabilistic perspective

5 code implementations • 2 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.
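The expectation described above can be sketched as attention-style pooling over the instances of a bag. A minimal numpy illustration, assuming per-instance class probabilities and attention scores are given (in the actual model both come from learned network heads):

```python
import numpy as np

def attention_pooling(instance_probs, attention_logits):
    """Clip-level probability as the expectation of instance-level
    classification outputs under a learned probability measure.

    instance_probs:   (T, C) per-instance (e.g. per-frame) class probabilities
    attention_logits: (T, C) unnormalised attention scores over instances
    """
    # Normalise attention over the instances so it forms a probability
    # measure for each class: weights sum to 1 along the instance axis.
    weights = np.exp(attention_logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Expectation of the instance outputs w.r.t. that measure.
    return (weights * instance_probs).sum(axis=0)  # shape (C,)

# Toy bag: 4 instances, 2 classes.
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.9],
                  [0.2, 0.8]])
logits = np.zeros((4, 2))  # uniform attention reduces to a plain average
clip_prob = attention_pooling(probs, logits)
print(clip_prob)  # [0.5 0.5]
```

Non-uniform attention logits would weight informative instances more heavily instead of averaging.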

Sound Audio and Speech Processing

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

AudioCaps Audio Source Separation

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

1 code implementation • 1 May 2019 • Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

In this paper, we show experimentally that the training information of SED can contribute to direction-of-arrival estimation (DOAE).

Sound Audio and Speech Processing

WavCraft: Audio Editing and Generation with Natural Language Prompts

1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

1 code implementation • 21 Jul 2021 • Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

We evaluate our approach on the UrbanSound8K dataset, compared to SampleRNN, with the performance metrics measuring the quality and diversity of generated sounds.

Music Generation Representation Learning +1

Event-Independent Network for Polyphonic Sound Event Localization and Detection

2 code implementations • 30 Sep 2020 • Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed.

Audio and Speech Processing Sound

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

3 code implementations • 25 Oct 2020 • Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously.

Sound Audio and Speech Processing

Gender Bias in Depression Detection Using Audio Features

2 code implementations • 28 Oct 2020 • Andrew Bailey, Mark D. Plumbley

Depression is a large-scale mental health problem, and its detection is a challenging area for machine learning researchers.

BIG-bench Machine Learning Depression Detection

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation • 3 Oct 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification

CL4AC: A Contrastive Loss for Audio Captioning

2 code implementations • 21 Jul 2021 • Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D. Plumbley, Wenwu Wang

Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio captioning Translation

Audio Captioning Transformer

1 code implementation • 21 Jul 2021 • Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free.

AudioCaps Audio captioning

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

2 code implementations • 12 Apr 2018 • Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.

Sound Audio and Speech Processing

On Metric Learning for Audio-Text Cross-Modal Retrieval

1 code implementation • 29 Mar 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.

AudioCaps Cross-Modal Retrieval +4

A joint separation-classification model for sound event detection of weakly labelled data

2 code implementations • 8 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events.
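The separation mapping described above (T-F representation in, per-event segmentation masks out) can be illustrated with a small sketch. The mask values here are hand-made hypothetical data, not model output, and the mask-producing network itself is omitted:

```python
import numpy as np

def apply_tf_masks(mixture_spec, masks):
    """Apply per-event T-F segmentation masks to a mixture spectrogram.

    mixture_spec: (F, T) magnitude spectrogram of the audio clip
    masks:        (K, F, T) one soft mask in [0, 1] per audio event
    returns:      (K, F, T) separated per-event spectrograms
    """
    return masks * mixture_spec[None, :, :]

# Toy mixture and two hand-made event masks (hypothetical values).
spec = np.ones((4, 6))
masks = np.zeros((2, 4, 6))
masks[0, :2, :] = 1.0   # event 0 occupies the low-frequency bins
masks[1, 2:, :3] = 1.0  # event 1: high bins, first half of the clip
events = apply_tf_masks(spec, masks)

# Event presence can then be scored from the masked energy per event,
# which is how separation feeds the classification stage.
presence = events.reshape(2, -1).mean(axis=1)
print(presence)  # per-event mean masked energy: 0.5 and 0.25
```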

Sound Audio and Speech Processing

Learning with Out-of-Distribution Data for Audio Classification

1 code implementation • 11 Feb 2020 • Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.

Audio Classification General Classification

Learning Temporal Resolution in Spectrogram for Audio Classification

1 code implementation • 4 Oct 2022 • Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

The audio spectrogram is a time-frequency representation that has been widely used for audio classification.

Audio Classification General Classification

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation • 15 Jul 2022 • Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection Few-Shot Learning +2

Multi-Band Multi-Resolution Fully Convolutional Neural Networks for Singing Voice Separation

1 code implementation • 21 Oct 2019 • Emad M. Grais, Fei Zhao, Mark D. Plumbley

In the spectrogram of a mixture of singing voices and music signals, there is usually more information about the voice in the low frequency bands than the high frequency bands.

Dimensionality Reduction

Continual Learning For On-Device Environmental Sound Classification

1 code implementation • 15 Jul 2022 • Yang Xiao, Xubo Liu, James King, Arshdeep Singh, Eng Siong Chng, Mark D. Plumbley, Wenwu Wang

Experimental results on the DCASE 2019 Task 1 and ESC-50 datasets show that our proposed method outperforms baseline continual learning methods in classification accuracy and computational efficiency, indicating that it can efficiently and incrementally learn new classes without catastrophic forgetting for on-device environmental sound classification.

Classification Computational Efficiency +3

Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation • 22 Nov 2022 • Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during evaluation.

Audio Tagging

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

1 code implementation • 14 Jun 2019 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.

Generative Adversarial Network Image Inpainting

ARCA23K: An audio dataset for investigating open-set label noise

2 code implementations • 19 Sep 2021 • Turab Iqbal, Yin Cao, Andrew Bailey, Mark D. Plumbley, Wenwu Wang

We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise.

Representation Learning

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks

1 code implementation • 30 May 2023 • Arshdeep Singh, Haohe Liu, Mark D. Plumbley

Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking.

Audio Tagging

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

2 code implementations • 24 Feb 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.

Audio Tagging

Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

1 code implementation • 17 Apr 2015 • Andrew J. R. Simpson, Gerard Roma, Mark D. Plumbley

Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition.

Speech Separation

Large-scale weakly supervised audio classification using gated convolutional neural network

3 code implementations • 1 Oct 2017 • Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.

Sound Audio and Speech Processing

Audio Tagging on an Embedded Hardware Platform

1 code implementation • 15 Jun 2023 • Gabriel Bibbo, Arshdeep Singh, Mark D. Plumbley

In this paper, we analyze how the performance of large-scale pretrained audio neural networks designed for audio pattern recognition changes when deployed on hardware such as a Raspberry Pi.

Audio Classification Audio Tagging

Text-Driven Foley Sound Generation With Latent Diffusion Model

1 code implementation • 17 Jun 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang

We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model.

Transfer Learning

Sound Event Detection: A Tutorial

1 code implementation • 12 Jul 2021 • Annamaria Mesaros, Toni Heittola, Tuomas Virtanen, Mark D. Plumbley

The goal of automatic sound event detection (SED) methods is to recognize what is happening in an audio signal and when it is happening.

BIG-bench Machine Learning Event Detection +1

Efficient Similarity-based Passive Filter Pruning for Compressing CNNs

1 code implementation • 27 Oct 2022 • Arshdeep Singh, Mark D. Plumbley

However, the computational complexity of computing the pairwise similarity matrix is high, particularly when a convolutional layer has many filters.

Acoustic Scene Classification Scene Classification

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

2 code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley

For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.

Audio Tagging General Classification +1

Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

1 code implementation • 17 Mar 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.

Sound

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

1 code implementation • 27 Dec 2023 • Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

In addition, we introduce environment representations to characterize different acoustic settings, enhancing the adaptability of our attenuation approach to various environments.

Meta-Learning Sound Event Localization and Detection

Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders

no code implementations • 2 Mar 2018 • Emad M. Grais, Dominic Ward, Mark D. Plumbley

Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals.

Audio Source Separation

Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation

no code implementations • 28 Oct 2017 • Emad M. Grais, Hagen Wierstorf, Dominic Ward, Mark D. Plumbley

In deep neural networks with convolutional layers, each layer typically has a fixed-size, single-resolution receptive field (RF).

Audio Source Separation

Fully DNN-based Multi-label regression for audio tagging

no code implementations • 24 Jun 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method could well utilize the long-term temporal information with the whole chunk as the input.

Audio Tagging Event Detection +4

Automatic Environmental Sound Recognition: Performance versus Computational Cost

no code implementations • 15 Jul 2016 • Siddharth Sigtia, Adam M. Stark, Sacha Krstulovic, Mark D. Plumbley

In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power.

General Classification Sound Classification

Acoustic Scene Classification

no code implementations • 13 Nov 2014 • Daniele Barchiesi, Dimitrios Giannoulis, Dan Stowell, Mark D. Plumbley

We then describe a range of different algorithms submitted for a data challenge that was held to provide a general and fair benchmark for ASC techniques.

Acoustic Scene Classification Classification +2

Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

no code implementations • 26 May 2014 • Dan Stowell, Mark D. Plumbley

Feature learning can be performed at large scale and "unsupervised", meaning it requires no manual data labelling, yet it can improve performance on "supervised" tasks such as classification.

Classification General Classification

Referenceless Performance Evaluation of Audio Source Separation using Deep Neural Networks

no code implementations • 1 Nov 2018 • Emad M. Grais, Hagen Wierstorf, Dominic Ward, Russell Mason, Mark D. Plumbley

Current performance evaluation for audio source separation depends on comparing the processed or separated signals with reference signals.

Audio Source Separation blind source separation

Diverse Audio Captioning via Adversarial Training

no code implementations • 13 Oct 2021 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

As different people may describe an audio clip from different aspects using distinct words and grammars, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips.

Audio captioning Generative Adversarial Network +1

A Passive Similarity based CNN Filter Pruning for Efficient Acoustic Scene Classification

1 code implementation • 29 Mar 2022 • Arshdeep Singh, Mark D. Plumbley

We propose a passive filter pruning framework, where a few convolutional filters from the CNNs are eliminated to yield compressed CNNs.
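One way a passive, similarity-based criterion can be realised is sketched below. This is a hedged simplification of the general idea (flatten each filter, compute pairwise cosine similarity, and eliminate the filters most similar to the others), not the paper's exact criterion:

```python
import numpy as np

def prune_redundant_filters(filters, n_prune):
    """Passive similarity-based pruning sketch.

    filters: (N, ...) weight tensor of one convolutional layer
    returns: sorted indices of the filters to KEEP
    """
    flat = filters.reshape(len(filters), -1)
    norm = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = norm @ norm.T                # (N, N) pairwise cosine similarities
    redundancy = sim.sum(axis=1)       # high sum => similar to many others
    keep = np.argsort(redundancy)[: len(filters) - n_prune]
    return np.sort(keep)

rng = np.random.default_rng(1)
w = rng.standard_normal((6, 3, 3, 3))  # 6 hypothetical 3x3x3 conv filters
w[5] = w[0]                            # make filter 5 a duplicate of filter 0
keep = prune_redundant_filters(w, n_prune=1)
print(keep)  # the duplicated pair loses one member
```

The criterion is "passive" in the sense that it uses only the weights themselves, with no activations or retraining needed to score filter importance.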

Acoustic Scene Classification Scene Classification

Automated Audio Captioning: An Overview of Recent Progress and New Challenges

no code implementations • 12 May 2022 • Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang

In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.

Audio captioning Caption Generation +2

Low-complexity CNNs for Acoustic Scene Classification

no code implementations • 2 Aug 2022 • Arshdeep Singh, James A King, Xubo Liu, Wenwu Wang, Mark D. Plumbley

This technical report describes the SurreyAudioTeam22's submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC).

Acoustic Scene Classification Classification +1

Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

no code implementations • 10 Oct 2022 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Mark D. Plumbley, Volkan Kilic, Wenwu Wang

Moreover, in LHDFF, a new PANNs encoder is proposed called Residual PANNs (RPANNs) by fusing the low-dimensional feature from the intermediate convolution layer output and the high-dimensional feature from the final layer output of PANNs.

AudioCaps Audio captioning +1

Towards Generating Diverse Audio Captions via Adversarial Training

no code implementations • 5 Dec 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Captions generated by existing models are generally faithful to the content of audio clips; however, these machine-generated captions are often deterministic (e.g., generating a fixed caption for a given audio clip), simple (e.g., using common words and simple grammar), and generic (e.g., generating the same caption for similar audio clips).

Audio captioning Generative Adversarial Network

Efficient CNNs via Passive Filter Pruning

no code implementations • 5 Apr 2023 • Arshdeep Singh, Mark D. Plumbley

In comparison to existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and achieves performance similar to that of the active methods.

Computational Efficiency Image Classification +1

Compressing audio CNNs with graph centrality based filter pruning

no code implementations • 5 May 2023 • James A King, Arshdeep Singh, Mark D. Plumbley

For large-scale CNNs such as PANNs designed for audio tagging, our method reduces computations per inference by 24% with 41% fewer parameters, at a slight improvement in performance.

Acoustic Scene Classification Audio Classification +2

Adapting Language-Audio Models as Few-Shot Audio Learners

no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

We present the Treff adapter, a training-efficient adapter for CLAP, which boosts zero-shot classification performance by making use of a small set of labelled data.

Audio Classification Few-Shot Learning +1

META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

no code implementations • 17 Aug 2023 • Jinbo Hu, Yin Cao, Ming Wu, Feiran Yang, Ziying Yu, Wenwu Wang, Mark D. Plumbley, Jun Yang

For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages.

Meta-Learning Sound Event Localization and Detection

Retrieval-Augmented Text-to-Audio Generation

no code implementations • 14 Sep 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such as AudioCaps, are biased in their generation performance.

AudioCaps Audio Generation +2
