Search Results for author: Wenwu Wang

Found 74 papers, 46 papers with code

WavCraft: Audio Editing and Generation with Natural Language Prompts

1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

Paper
Code

Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

1 code implementation • 15 Dec 2023 • Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

Specifically, this paper proposes a lightweight multi-level graph learning (MLGL) based on local and global semantic graphs to simultaneously perform audio event classification (AEC) and human annoyance rating prediction (ARP).

Graph Learning

Paper
Code

Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection

1 code implementation • 14 Dec 2023 • Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson

Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation.

Data Augmentation Event Detection +2

Paper
Code

Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities

1 code implementation • 30 Nov 2023 • Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

Moreover, we improve the framework of audio language model by using interleaved audio-text embeddings as the input sequence.

Audio Classification Few-Shot Audio Classification +2

Paper
Code

CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing

no code implementations • 11 Oct 2023 • Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang

Audio-visual video parsing is the task of categorizing a video at the segment level with weak labels, and predicting them as audible or visible events.

Paper
Add Code

Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification

1 code implementation • 5 Oct 2023 • Yuanbo Hou, Siyang Song, Chuang Yu, Wenwu Wang, Dick Botteldooren

The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph.

Acoustic Scene Classification Graph Representation Learning +1

Paper
Code

Hierarchical Metadata Information Constrained Self-Supervised Learning for Anomalous Sound Detection Under Domain Shift

no code implementations • 14 Sep 2023 • Haiyan Lan, Qiaoxi Zhu, Jian Guan, Yuming Wei, Wenwu Wang

Self-supervised learning methods have achieved promising performance for anomalous sound detection (ASD) under domain shift, where the type of domain shift is considered in feature learning by incorporating section IDs.

Attribute Self-Supervised Learning +1

Paper
Add Code

Retrieval-Augmented Text-to-Audio Generation

no code implementations • 14 Sep 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such as AudioCaps, are biased in their generation performance.

Ranked #2 on Audio Generation on AudioCaps

AudioCaps Audio Generation +2

Paper
Add Code

AudioSR: Versatile Audio Super-resolution at Scale

1 code implementation • 13 Sep 2023 • Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications.

Audio Super-Resolution Super-Resolution

864

Paper
Code

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

1 code implementation • 23 Aug 2023 • Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

Sound events in daily life carry rich information about the objective world.

Graph Representation Learning

Paper
Code

META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

no code implementations • 17 Aug 2023 • Jinbo Hu, Yin Cao, Ming Wu, Feiran Yang, Ziying Yu, Wenwu Wang, Mark D. Plumbley, Jun Yang

For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages.

Meta-Learning Sound Event Localization and Detection

Paper
Add Code

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

1 code implementation • 10 Aug 2023 • Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley

Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model.

Ranked #3 on Audio Generation on AudioCaps

Audio Generation In-Context Learning +2

2,032

Paper
Code

Separate Anything You Describe

1 code implementation • 9 Aug 2023 • Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.

Audio Source Separation Natural Language Queries +2

1,425

Paper
Code

WavJourney: Compositional Audio Creation with Large Language Models

1 code implementation • 26 Jul 2023 • Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

Audio Generation

501

Paper
Code

Text-Driven Foley Sound Generation With Latent Diffusion Model

1 code implementation • 17 Jun 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang

We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model.

Transfer Learning

Paper
Code

Knowledge Distillation for Efficient Audio-Visual Video Captioning

no code implementations • 16 Jun 2023 • Özkan Çaylı, Xubo Liu, Volkan Kılıç, Wenwu Wang

Automatically describing audio-visual content with texts, namely video captioning, has received significant attention due to its potential applications across diverse fields.

Audio-Visual Video Captioning Caption Generation +1

Paper
Add Code

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

no code implementations • 30 May 2023 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kılıç, Mark D. Plumbley, Wenwu Wang

Experimental results show that LHDFF outperforms existing audio captioning models.

Audio captioning

Paper
Add Code

Adapting Language-Audio Models as Few-Shot Audio Learners

no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data.

Audio Classification Few-Shot Learning +1

Paper
Add Code

Time-weighted Frequency Domain Audio Representation with GMM Estimator for Anomalous Sound Detection

1 code implementation • 5 May 2023 • Jian Guan, Youde Liu, Qiaoxi Zhu, Tieran Zheng, Jiqing Han, Wenwu Wang

This paper presents Time-Weighted Frequency Domain Representation (TWFR) with the GMM method (TWFR-GMM) for anomalous sound detection.

Paper
Code

Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining

no code implementations • 7 Apr 2023 • Jian Guan, Feiyang Xiao, Youde Liu, Qiaoxi Zhu, Wenwu Wang

This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.

Anomaly Detection Contrastive Learning

Paper
Add Code

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

3 code implementations • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)

Audio captioning Event Detection +6

170

Paper
Code

Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study

no code implementations • 7 Mar 2023 • Yi Yuan, Haohe Liu, Jinhua Liang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

Deep neural networks have recently achieved breakthroughs in sound generation with text prompts.

Audio Generation Benchmarking +1

Paper
Add Code

Differentiable Bootstrap Particle Filters for Regime-Switching Models

no code implementations • 20 Feb 2023 • Wenhan Li, Xiongjie Chen, Wenwu Wang, Víctor Elvira, Yunpeng Li

Differentiable particle filters are an emerging class of particle filtering methods that use neural networks to construct and learn parametric state-space models.

Paper
Add Code

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

Ranked #9 on Audio Generation on AudioCaps

AudioCaps Audio Generation +2

22,393

Paper
Code

Unpaired Overwater Image Defogging Using Prior Map Guided CycleGAN

no code implementations • 23 Dec 2022 • Yaozong Mo, ChaoFeng Li, Wenqi Ren, Shaopeng Shang, Wenwu Wang, Xiao-Jun Wu

In this work, we propose a Prior map Guided CycleGAN (PG-CycleGAN) for defogging of images with overwater scenes.

Paper
Add Code

Towards Generating Diverse Audio Captions via Adversarial Training

no code implementations • 5 Dec 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Captions generated by existing models are generally faithful to the content of audio clips; however, these machine-generated captions are often deterministic (e.g., generating a fixed caption for a given audio clip), simple (e.g., using common words and simple grammar), and generic (e.g., generating the same caption for similar audio clips).

Audio captioning Generative Adversarial Network

Paper
Add Code

ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification

1 code implementation • 23 Nov 2022 • Sara Atito, Muhammad Awais, Wenwu Wang, Mark D. Plumbley, Josef Kittler

Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships.

Keyword Spotting Self-Supervised Learning +1

Paper
Code

Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation • 22 Nov 2022 • Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.

Audio Tagging

Paper
Code

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

1 code implementation • 28 Oct 2022 • Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Audio captioning aims to generate text descriptions of audio clips.

AudioCaps Audio captioning +1

Paper
Code

Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification

1 code implementation • 27 Oct 2022 • Yuanbo Hou, Siyang Song, Chuang Yu, Yuxin Song, Wenwu Wang, Dick Botteldooren

Experiments on a polyphonic acoustic scene dataset show that the proposed ERGL achieves competitive performance on ASC by using only a limited number of embeddings of audio events without any data augmentations.

Ranked #1 on Acoustic Scene Classification on TUT Urban Acoustic Scenes 2018

Acoustic Scene Classification Graph Representation Learning +1

Paper
Code

Personalized Dialogue Generation with Persona-Adaptive Attention

1 code implementation • 27 Oct 2022 • Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona.

Dialogue Generation

Paper
Code

Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

no code implementations • 10 Oct 2022 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

Moreover, in LHDFF, a new PANNs encoder is proposed called Residual PANNs (RPANNs) by fusing the low-dimensional feature from the intermediate convolution layer output and the high-dimensional feature from the final layer output of PANNs.

AudioCaps Audio captioning +1

Paper
Add Code

Learning Temporal Resolution in Spectrogram for Audio Classification

1 code implementation • 4 Oct 2022 • Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

The audio spectrogram is a time-frequency representation that has been widely used for audio classification.

Audio Classification General Classification

Paper
Code

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation • 3 Oct 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification

Paper
Code

Low-complexity CNNs for Acoustic Scene Classification

no code implementations • 2 Aug 2022 • Arshdeep Singh, James A King, Xubo Liu, Wenwu Wang, Mark D. Plumbley

This technical report describes the SurreyAudioTeam22's submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC).

Acoustic Scene Classification Classification +1

Paper
Add Code

Continual Learning For On-Device Environmental Sound Classification

1 code implementation • 15 Jul 2022 • Yang Xiao, Xubo Liu, James King, Arshdeep Singh, Eng Siong Chng, Mark D. Plumbley, Wenwu Wang

Experimental results on the DCASE 2019 Task 1 and ESC-50 dataset show that our proposed method outperforms baseline continual learning methods on classification accuracy and computational efficiency, indicating our method can efficiently and incrementally learn new classes without the catastrophic forgetting problem for on-device environmental sound classification.

Classification Computational Efficiency +3

Paper
Code

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation • 15 Jul 2022 • Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection Few-Shot Learning +2

Paper
Code

Automated Audio Captioning: An Overview of Recent Progress and New Challenges

no code implementations • 12 May 2022 • Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang

In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.

Audio captioning Caption Generation +2

Paper
Add Code

On Metric Learning for Audio-Text Cross-Modal Retrieval

1 code implementation • 29 Mar 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.

AudioCaps Cross-Modal Retrieval +4

Paper
Code

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

AudioCaps Audio Source Separation

126

Paper
Code

Deep Neural Decision Forest for Acoustic Scene Classification

no code implementations • 7 Mar 2022 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Jinzheng Zhao, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

In this paper, we propose a novel approach for ASC using deep neural decision forest (DNDF).

Acoustic Scene Classification Classification +1

Paper
Add Code

Leveraging Pre-trained BERT for Audio Captioning

no code implementations • 6 Mar 2022 • Xubo Liu, Xinhao Mei, Qiushi Huang, Jianyuan Sun, Jinzheng Zhao, Haohe Liu, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

BERT is a pre-trained language model that has been extensively used in Natural Language Processing (NLP) tasks.

AudioCaps Audio captioning +1

Paper
Add Code

Local Information Assisted Attention-free Decoder for Audio Captioning

1 code implementation • 10 Jan 2022 • Feiyang Xiao, Jian Guan, Haiyan Lan, Qiaoxi Zhu, Wenwu Wang

Although this method effectively captures global information within audio data via the self-attention mechanism, it may ignore the event with short time duration, due to its limitation in capturing local information in an audio signal, leading to inaccurate prediction of captions.

Audio captioning Caption Generation

Paper
Code

Diverse Audio Captioning via Adversarial Training

no code implementations • 13 Oct 2021 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

As different people may describe an audio clip from different aspects using distinct words and grammars, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips.

Audio captioning Generative Adversarial Network +1

Paper
Add Code

End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

no code implementations • 13 Oct 2021 • Yina Guo, Xiaofei Zhang, Zhenying Gong, Anhong Wang, Wenwu Wang

A potential approach to this problem is to design an end-to-end method by using a dual generative adversarial network (DualGAN) without dimension reduction of passing information, but it cannot realize one-to-one signal-to-signal translation (see Fig. 1 (a) and (b)).

Brain Computer Interface Dimensionality Reduction +4

Paper
Add Code

One to Multiple Mapping Dual Learning: Learning Multiple Sources from One Mixed Signal

no code implementations • 13 Oct 2021 • Ting Liu, Wenwu Wang, Xiaofei Zhang, Zhenyin Gong, Yina Guo

Single channel blind source separation (SCBSS) refers to separating multiple sources from a mixed signal collected by a single sensor.

blind source separation Generative Adversarial Network

Paper
Add Code

ARCA23K: An audio dataset for investigating open-set label noise

2 code implementations • 19 Sep 2021 • Turab Iqbal, Yin Cao, Andrew Bailey, Mark D. Plumbley, Wenwu Wang

We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise.

Representation Learning

Paper
Code

An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning

1 code implementation • 5 Aug 2021 • Xinhao Mei, Qiushi Huang, Xubo Liu, Gengyun Chen, Jingqian Wu, Yusong Wu, Jinzheng Zhao, Shengchen Li, Tom Ko, H Lilian Tang, Xi Shao, Mark D. Plumbley, Wenwu Wang

Automated audio captioning aims to use natural language to describe the content of audio data.

Audio captioning reinforcement-learning +2

Paper
Code

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

1 code implementation • 21 Jul 2021 • Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

We evaluate our approach on the UrbanSound8K dataset, compared to SampleRNN, with the performance metrics measuring the quality and diversity of generated sounds.

Music Generation Representation Learning +1

Paper
Code

Audio Captioning Transformer

1 code implementation • 21 Jul 2021 • Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free.

Ranked #8 on Audio captioning on AudioCaps

AudioCaps Audio captioning

Paper
Code

CL4AC: A Contrastive Loss for Audio Captioning

2 code implementations • 21 Jul 2021 • Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D. Plumbley, Wenwu Wang

Automated Audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio captioning Translation

Paper
Code

Low-dimensional Denoising Embedding Transformer for ECG Classification

no code implementations • 31 Mar 2021 • Jian Guan, Wenbo Wang, Pengming Feng, Xinxin Wang, Wenwu Wang

However, the high-dimensional embedding obtained via 1-D convolution and positional encoding can lead to the loss of the signal's own temporal information and a large amount of training parameters.

Classification Denoising +2

Paper
Add Code

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification

no code implementations • 31 Mar 2021 • Helin Wang, Yuexian Zou, Wenwu Wang

In this paper, we present SpecAugment++, a novel data augmentation method for deep neural network-based acoustic scene classification (ASC).

Acoustic Scene Classification Data Augmentation +2

Paper
Add Code

Time-domain Speech Enhancement with Generative Adversarial Learning

1 code implementation • 30 Mar 2021 • Feiyang Xiao, Jian Guan, Qiuqiang Kong, Wenwu Wang

Speech enhancement aims to obtain speech signals with high intelligibility and quality from noisy speech.

Generative Adversarial Network Speech Enhancement

Paper
Code

Enhancing Audio Augmentation Methods with Consistency Learning

no code implementations • 9 Feb 2021 • Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang

For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss.

Audio Classification Audio Tagging +2

Paper
Add Code

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

3 code implementations • 25 Oct 2020 • Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously.

Sound Audio and Speech Processing

Paper
Code

Event-Independent Network for Polyphonic Sound Event Localization and Detection

2 code implementations • 30 Sep 2020 • Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed.

Audio and Speech Processing Sound

Paper
Code

Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation

no code implementations • 3 Aug 2020 • Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, Wenwu Wang

Monaural Singing Voice Separation (MSVS) is a challenging task and has been studied for decades.

Neural Architecture Search

Paper
Add Code

Learning with Out-of-Distribution Data for Audio Classification

1 code implementation • 11 Feb 2020 • Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.

Audio Classification General Classification

Paper
Code

Environmental Sound Classification with Parallel Temporal-spectral Attention

no code implementations • 14 Dec 2019 • Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

Convolutional neural networks (CNN) are one of the best-performing neural network architectures for environmental sound classification (ESC).

Acoustic Scene Classification Environmental Sound Classification +3

Paper
Add Code

IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

no code implementations • 2 Dec 2019 • Youtian Lin, Pengming Feng, Jian Guan, Wenwu Wang, Jonathon Chambers

First, a novel geometric transformation is employed to better represent the oriented object in angle prediction, then a branch interactive module with a self-attention mechanism is developed to fuse features from classification and box regression branches.

Ranked #4 on One-stage Anchor-free Oriented Object Detection on HRSC2016

Object object-detection +4

Paper
Add Code

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

1 code implementation • 14 Jun 2019 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.

Generative Adversarial Network Image Inpainting

Paper
Code

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

1 code implementation • 1 May 2019 • Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

In this paper, it is experimentally shown that the training information of SED is able to contribute to the direction of arrival estimation (DOAE).

Sound Audio and Speech Processing

Paper
Code

Bayesian inference for PCA and MUSIC algorithms with unknown number of sources

1 code implementation • 26 Sep 2018 • Viet Hung Tran, Wenwu Wang

We then use a Bayesian method to compute, for the first time, the MAP estimate for the number of sources in PCA and MUSIC algorithms.

Bayesian Inference

Paper
Code

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

2 code implementations • 12 Apr 2018 • Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.

Sound Audio and Speech Processing

Paper
Code

A joint separation-classification model for sound event detection of weakly labelled data

2 code implementations • 8 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events.

Sound Audio and Speech Processing

Paper
Code

Audio Set classification with attention model: A probabilistic perspective

5 code implementations • 2 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.

Sound Audio and Speech Processing

150

Paper
Code

Large-scale weakly supervised audio classification using gated convolutional neural network

3 code implementations • 1 Oct 2017 • Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.

Sound Audio and Speech Processing

Paper
Code

Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

1 code implementation • 17 Mar 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.

Sound

Paper
Code

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

2 code implementations • 24 Feb 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.

Audio Tagging

Paper
Code

A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data

1 code implementation • 6 Oct 2016 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley

The labeling of an audio clip is often based on the audio events in the clip and no event level label is provided to the user.

Sound

Paper
Code

Hierarchical learning for DNN-based acoustic scene classification

no code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework.

Acoustic Scene Classification Classification +2

Paper
Add Code

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

2 code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley

For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.

Audio Tagging General Classification +1

Paper
Code

Fully DNN-based Multi-label regression for audio tagging

no code implementations • 24 Jun 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method could well utilize the long-term temporal information with the whole chunk as the input.

Audio Tagging Event Detection +4

Paper
Add Code
