Search Results for author: Wenwu Wang

Found 37 papers, 22 papers with code

Automated Audio Captioning: an Overview of Recent Progress and New Challenges

no code implementations12 May 2022 Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang

In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.

Audio captioning Translation

On Metric Learning for Audio-Text Cross-Modal Retrieval

1 code implementation29 Mar 2022 Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.

Cross-Modal Retrieval Metric Learning +1
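As a concrete illustration of the kind of objective evaluated in this work, below is a minimal sketch of a symmetric contrastive (NT-Xent-style) loss over paired audio and text embeddings; the batch construction, embedding size, and temperature are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def audio_text_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric NT-Xent loss over a batch of paired audio/text embeddings.

    audio_emb, text_emb: (batch, dim) tensors; row i of each is a matched pair.
    """
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(audio_emb.size(0))            # matched pairs lie on the diagonal
    loss_a2t = F.cross_entropy(logits, targets)          # audio-to-text retrieval direction
    loss_t2a = F.cross_entropy(logits.t(), targets)      # text-to-audio retrieval direction
    return 0.5 * (loss_a2t + loss_t2a)

# Example with random embeddings standing in for encoder outputs:
loss = audio_text_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```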

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation28 Mar 2022 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

Audio Source Separation
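The sketch below illustrates the general idea of query-conditioned, mask-based separation: a text-query embedding modulates an audio encoder that predicts a magnitude mask for the target source. It is a simplified assumption of how such a system could be wired up, not the architecture proposed in the paper; the FiLM-style conditioning, layer sizes, and names are hypothetical.

```python
import torch
import torch.nn as nn

class QueryConditionedSeparator(nn.Module):
    """Illustrative skeleton: a text-query embedding modulates an audio encoder
    that predicts a magnitude mask for the queried target source."""

    def __init__(self, n_freq=513, query_dim=256, hidden=256):
        super().__init__()
        self.audio_enc = nn.Conv1d(n_freq, hidden, kernel_size=3, padding=1)
        self.film = nn.Linear(query_dim, 2 * hidden)       # FiLM scale and shift from the query
        self.mask_head = nn.Conv1d(hidden, n_freq, kernel_size=3, padding=1)

    def forward(self, mixture_spec, query_emb):
        # mixture_spec: (batch, n_freq, time), query_emb: (batch, query_dim)
        h = torch.relu(self.audio_enc(mixture_spec))
        scale, shift = self.film(query_emb).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)  # condition on the language query
        mask = torch.sigmoid(self.mask_head(h))
        return mask * mixture_spec                         # estimated target-source spectrogram

sep = QueryConditionedSeparator()
est = sep(torch.rand(2, 513, 100), torch.randn(2, 256))
```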

Local Information Assisted Attention-free Decoder for Audio Captioning

no code implementations10 Jan 2022 Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Haiyan Lan, Wenwu Wang

Automated audio captioning (AAC) aims to describe audio data with captions using natural language.

Audio captioning

One to Multiple Mapping Dual Learning: Learning Multiple Sources from One Mixed Signal

no code implementations13 Oct 2021 Ting Liu, Wenwu Wang, Xiaofei Zhang, Zhenyin Gong, Yina Guo

Single channel blind source separation (SCBSS) refers to the separation of multiple sources from a mixed signal collected by a single sensor.

Diverse Audio Captioning via Adversarial Training

no code implementations13 Oct 2021 Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

As different people may describe an audio clip from different aspects using distinct words and grammars, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips.

Audio captioning

End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

no code implementations13 Oct 2021 Yina Guo, Xiaofei Zhang, Zhenying Gong, Anhong Wang, Wenwu Wang

A potential approach to this problem is to design an end-to-end method using a dual generative adversarial network (DualGAN) without dimensionality reduction of the transmitted information, but it cannot realize one-to-one signal-to-signal translation (see Fig. 1 (a) and (b)).

Dimensionality Reduction EEG +1

ARCA23K: An audio dataset for investigating open-set label noise

2 code implementations19 Sep 2021 Turab Iqbal, Yin Cao, Andrew Bailey, Mark D. Plumbley, Wenwu Wang

We show that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and we refer to this type of label noise as open-set label noise.

Representation Learning

Audio Captioning Transformer

1 code implementation21 Jul 2021 Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free.

Audio captioning

CL4AC: A Contrastive Loss for Audio Captioning

2 code implementations21 Jul 2021 Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D. Plumbley, Wenwu Wang

Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio captioning Translation

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

1 code implementation21 Jul 2021 Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

We evaluate our approach on the UrbanSound8K dataset against SampleRNN, using performance metrics that measure the quality and diversity of the generated sounds.

Music Generation Representation Learning +1

Low-dimensional Denoising Embedding Transformer for ECG Classification

no code implementations31 Mar 2021 Jian Guan, Wenbo Wang, Pengming Feng, Xinxin Wang, Wenwu Wang

However, the high-dimensional embedding obtained via 1-D convolution and positional encoding can lead to the loss of the signal's own temporal information and a large number of training parameters.

Classification Denoising +2

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification

no code implementations31 Mar 2021 Helin Wang, Yuexian Zou, Wenwu Wang

In this paper, we present SpecAugment++, a novel data augmentation method for deep neural networks based acoustic scene classification (ASC).

Acoustic Scene Classification Data Augmentation +2
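A minimal sketch of the underlying idea, masking random time and frequency stripes of a hidden feature map rather than only the input spectrogram. The masking policy and widths below are simplified assumptions, not the exact SpecAugment++ procedure.

```python
import torch

def hidden_space_mask(features, max_time_width=10, max_freq_width=8):
    """Zero random time and frequency stripes of a hidden feature map.

    features: (batch, channels, freq_bins, time_steps); a masked copy is returned.
    """
    x = features.clone()
    _, _, n_freq, n_time = x.shape
    t0 = torch.randint(0, n_time, (1,)).item()
    t_width = torch.randint(1, max_time_width + 1, (1,)).item()
    x[:, :, :, t0:t0 + t_width] = 0.0          # time mask
    f0 = torch.randint(0, n_freq, (1,)).item()
    f_width = torch.randint(1, max_freq_width + 1, (1,)).item()
    x[:, :, f0:f0 + f_width, :] = 0.0          # frequency mask
    return x

augmented = hidden_space_mask(torch.rand(4, 64, 64, 100))
```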

Time-domain Speech Enhancement with Generative Adversarial Learning

1 code implementation30 Mar 2021 Feiyang Xiao, Jian Guan, Qiuqiang Kong, Wenwu Wang

Speech enhancement aims to obtain speech signals with high intelligibility and quality from noisy speech.

Speech Enhancement

Enhancing Audio Augmentation Methods with Consistency Learning

no code implementations9 Feb 2021 Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang

For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss.

Audio Classification Audio Tagging +3
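A minimal sketch of consistency learning as described above: the usual cross-entropy loss is augmented with a term that encourages the prediction on an augmented view to match the prediction on the clean view. The KL-based consistency term and its weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def classification_with_consistency(model, x, x_aug, labels, weight=1.0):
    """Cross-entropy on the clean input plus a consistency term that pulls the
    prediction on the augmented view towards the prediction on the clean view."""
    logits = model(x)
    logits_aug = model(x_aug)
    ce = F.cross_entropy(logits, labels)
    consistency = F.kl_div(
        F.log_softmax(logits_aug, dim=-1),
        F.softmax(logits, dim=-1).detach(),    # treat the clean-view prediction as the target
        reduction="batchmean",
    )
    return ce + weight * consistency

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 10))
loss = classification_with_consistency(
    model, torch.rand(4, 64), torch.rand(4, 64), torch.randint(0, 10, (4,)))
```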

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

2 code implementations25 Oct 2020 Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously.

Sound Audio and Speech Processing

Event-Independent Network for Polyphonic Sound Event Localization and Detection

2 code implementations30 Sep 2020 Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed.

Audio and Speech Processing Sound

Learning with Out-of-Distribution Data for Audio Classification

1 code implementation11 Feb 2020 Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.

Audio Classification Classification +1
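A minimal sketch of the detection-and-relabelling step described above, assuming a simple confidence threshold on an auxiliary classifier trained on in-distribution data; the threshold and interfaces are hypothetical.

```python
import torch
import torch.nn.functional as F

def detect_and_relabel(aux_classifier, noisy_batch, threshold=0.7):
    """Use an auxiliary classifier trained on in-distribution data to relabel
    noisy examples it is confident about and flag the rest as out-of-distribution."""
    with torch.no_grad():
        probs = F.softmax(aux_classifier(noisy_batch), dim=-1)
    confidence, predicted = probs.max(dim=-1)
    keep = confidence >= threshold                 # confident -> keep with the predicted label
    return noisy_batch[keep], predicted[keep], ~keep

aux = torch.nn.Linear(128, 20)                     # stand-in for a trained classifier
data, new_labels, ood_flags = detect_and_relabel(aux, torch.rand(16, 128))
```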

Environmental Sound Classification with Parallel Temporal-spectral Attention

no code implementations14 Dec 2019 Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC).

Acoustic Scene Classification Classification +3

IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

no code implementations2 Dec 2019 Youtian Lin, Pengming Feng, Jian Guan, Wenwu Wang, Jonathon Chambers

First, a novel geometric transformation is employed to better represent the oriented object in angle prediction, then a branch interactive module with a self-attention mechanism is developed to fuse features from classification and box regression branches.

Object Detection In Aerial Images Object Localization +1

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

1 code implementation14 Jun 2019 Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.

Image Inpainting

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

1 code implementation1 May 2019 Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

In this paper, it is experimentally shown that the training information of SED is able to contribute to the direction of arrival estimation (DOAE).

Sound Audio and Speech Processing

Bayesian inference for PCA and MUSIC algorithms with unknown number of sources

1 code implementation26 Sep 2018 Viet Hung Tran, Wenwu Wang

We then use a Bayesian method to compute, for the first time, the MAP estimate of the number of sources in the PCA and MUSIC algorithms.

Bayesian Inference

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

2 code implementations12 Apr 2018 Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.

Sound Audio and Speech Processing

A joint separation-classification model for sound event detection of weakly labelled data

2 code implementations8 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

First, we propose a separation mapping from the time-frequency (T-F) representation of an audio clip to the T-F segmentation masks of the audio events.

Sound Audio and Speech Processing

Audio Set classification with attention model: A probabilistic perspective

5 code implementations2 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.

Sound Audio and Speech Processing
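The expectation described above can be written as a small attention-pooling module: instance-level predictions are averaged under a learned probability measure over the instances in the bag. The feature and class dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Bag-level prediction as the expectation of instance-level predictions
    under a learned probability measure over the instances."""

    def __init__(self, feat_dim=128, n_classes=527):
        super().__init__()
        self.cla = nn.Linear(feat_dim, n_classes)   # instance-level classifier
        self.att = nn.Linear(feat_dim, n_classes)   # unnormalized attention per instance

    def forward(self, instances):
        # instances: (batch, n_instances, feat_dim), e.g. frame-level embeddings of a clip
        cla = torch.sigmoid(self.cla(instances))            # instance-level class probabilities
        att = torch.softmax(self.att(instances), dim=1)     # probability measure over instances
        return (att * cla).sum(dim=1)                       # expectation = bag-level prediction

pool = AttentionPooling()
bag_probs = pool(torch.rand(2, 240, 128))                    # (2, 527)
```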

Large-scale weakly supervised audio classification using gated convolutional neural network

3 code implementations1 Oct 2017 Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.

Sound Audio and Speech Processing
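A minimal sketch of the gated convolution idea, where a sigmoid-gated branch modulates a linear branch so the network can emphasise informative time-frequency regions; layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Gated convolution: one branch provides features, a sigmoid branch gates them."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.linear = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (batch, channels, freq_bins, time_steps) log-mel input or hidden features
        return self.linear(x) * torch.sigmoid(self.gate(x))

block = GatedConvBlock(1, 32)
out = block(torch.rand(4, 1, 64, 240))    # (4, 32, 64, 240)
```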

Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

1 code implementation17 Mar 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.

Sound

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

2 code implementations24 Feb 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.

Audio Tagging

A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data

1 code implementation6 Oct 2016 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley

The labeling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user.

Sound

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

1 code implementation13 Jul 2016 Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley

For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.

Audio Tagging General Classification +1
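A minimal sketch of a symmetric denoising auto-encoder of the kind described above, trained to reconstruct clean Mel-filter-bank frames from corrupted inputs, with the bottleneck activations used as new features; the layer sizes and corruption scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    """Symmetric denoising auto-encoder: reconstruct clean Mel-filter-bank frames
    from corrupted inputs; the bottleneck serves as a learned feature."""

    def __init__(self, n_mfb=40, bottleneck=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_mfb, 128), nn.ReLU(),
                                     nn.Linear(128, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(),
                                     nn.Linear(128, n_mfb))

    def forward(self, noisy):
        code = self.encoder(noisy)
        return self.decoder(code), code

dae = DenoisingAutoEncoder()
clean = torch.rand(32, 40)
noisy = clean + 0.1 * torch.randn_like(clean)   # corrupt the input
recon, features = dae(noisy)
loss = nn.functional.mse_loss(recon, clean)     # train to recover the clean frame
```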

Fully DNN-based Multi-label regression for audio tagging

no code implementations24 Jun 2016 Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method could well utilize the long-term temporal information with the whole chunk as the input.

Audio Tagging Event Detection +3
