Search Results for author: Xinhao Mei

Found 18 papers, 11 papers with code

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

1 code implementation • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

Audio Captioning · Event Detection · +2

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

Audio Generation · Style Transfer
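The latent-diffusion approach behind AudioLDM rests on the standard forward-noising process shared by diffusion models. The sketch below illustrates only that generic forward step on a toy "audio latent"; the schedule, shapes, and function names are illustrative assumptions, not AudioLDM's actual implementation.

```python
import numpy as np

def q_sample(x0, t, alphas_cumprod, noise):
    """Forward diffusion: corrupt a clean latent x0 to timestep t,
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

# Toy linear beta schedule over 1000 steps (a common default, not AudioLDM's).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 16))     # stands in for a clean audio latent
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, 500, alphas_cumprod, noise)  # partially noised latent
```

A trained model would learn to reverse this process in the latent space, with the text prompt steering generation.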

Towards Generating Diverse Audio Captions via Adversarial Training

no code implementations • 5 Dec 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Captions generated by existing models are generally faithful to the content of audio clips; however, these machine-generated captions are often deterministic (e.g., generating a fixed caption for a given audio clip), simple (e.g., using common words and simple grammar), and generic (e.g., generating the same caption for similar audio clips).

Audio Captioning

Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation • 22 Nov 2022 • Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during evaluation.

Audio Tagging
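One way to picture an ontology-aware variant of mAP is to give partial credit when a prediction lands on a class that is close to the target in the ontology, rather than scoring it as a plain false positive. The sketch below contrasts standard AP with such a reweighted variant; it is a simplified illustration of the general idea, not the OmAP formulation from the paper, and the similarity vector is an assumed input.

```python
import numpy as np

def average_precision(scores, labels):
    """Standard AP: mean of precision at each true-positive hit."""
    order = np.argsort(-scores)
    hits = labels[order]
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / (np.arange(len(hits)) + 1)
    return float((precision_at_k * hits).sum() / max(hits.sum(), 1))

def ontology_weighted_ap(scores, labels, sim_to_target):
    """AP variant where a negative that is ontologically similar to the
    target class still earns partial credit (sim in [0, 1])."""
    order = np.argsort(-scores)
    credit = np.where(labels[order] == 1, 1.0, sim_to_target[order])
    cum_credit = np.cumsum(credit)
    precision_at_k = cum_credit / (np.arange(len(credit)) + 1)
    pos = labels[order] == 1
    return float((precision_at_k * pos).sum() / max(pos.sum(), 1))
```

With a "near miss" ranked first (e.g., a sibling class with similarity 0.5), the weighted score exceeds plain AP, reflecting that the error is less severe under the ontology.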

Automated Audio Captioning via Fusion of Low- and High-Dimensional Features

no code implementations • 10 Oct 2022 • Jianyuan Sun, Xubo Liu, Xinhao Mei, Mark D. Plumbley, Volkan Kilic, Wenwu Wang

Moreover, in LHDFF, a new PANNs encoder called Residual PANNs (RPANNs) is proposed, which fuses the low-dimensional features from an intermediate convolutional layer with the high-dimensional features from the final layer of PANNs.

Audio Captioning · Vocal Bursts Intensity Prediction
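The residual fusion of low- and high-dimensional features described above can be sketched as projecting the intermediate feature up to the final feature's dimension and adding it residually. The shapes, the learned projection, and the function name below are illustrative assumptions, not the paper's RPANNs code.

```python
import numpy as np

def fuse_features(low, high, proj):
    """Project the low-dimensional intermediate feature into the
    high-dimensional space and add it residually to the final feature."""
    return high + low @ proj

rng = np.random.default_rng(0)
low = rng.standard_normal((10, 128))     # intermediate conv-layer output (time, dim)
high = rng.standard_normal((10, 2048))   # final-layer PANNs embedding
proj = rng.standard_normal((128, 2048)) * 0.01  # would be learned in practice
fused = fuse_features(low, high, proj)   # same shape as the high-level feature
```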

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation • 3 Oct 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification · Classification
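The core idea of a simple pooling front-end is to shrink the input spectrogram (e.g., along the time axis) before it reaches the classifier, cutting compute roughly in proportion to the pooling factor. The sketch below shows non-overlapping mean pooling on a toy mel spectrogram; it illustrates the general idea only, and the function name and shapes are assumptions.

```python
import numpy as np

def spectrogram_pool(spec, kernel=2):
    """Non-overlapping mean pooling along the time axis."""
    frames, freq = spec.shape
    frames -= frames % kernel  # drop the ragged tail frames
    pooled = spec[:frames].reshape(frames // kernel, kernel, freq)
    return pooled.mean(axis=1)

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((101, 64)))  # (time frames, mel bins)
small = spectrogram_pool(spec, kernel=2)       # time axis halved: (50, 64)
```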

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation • 15 Jul 2022 • Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection · Few-Shot Learning · +2

Automated Audio Captioning: An Overview of Recent Progress and New Challenges

no code implementations • 12 May 2022 • Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang

In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.

Audio Captioning · Translation

On Metric Learning for Audio-Text Cross-Modal Retrieval

1 code implementation • 29 Mar 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.

Cross-Modal Retrieval · Metric Learning · +3
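A triplet ranking loss with in-batch negatives is one of the standard metric-learning objectives evaluated in this line of work. The sketch below shows such a loss over a batch of paired audio and caption embeddings; it is a generic formulation under assumed shapes and margin, not the paper's exact objective or code.

```python
import numpy as np

def triplet_loss(audio_emb, text_emb, margin=0.2):
    """Triplet ranking loss over paired (audio, text) embeddings.
    Diagonal entries of the cosine-similarity matrix are the matched
    pairs; off-diagonal entries serve as in-batch negatives."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = a @ t.T                 # cosine similarity matrix
    pos = np.diag(sim)            # matched-pair similarities
    # hinge on every negative, in both retrieval directions
    cost_a2t = np.maximum(0.0, margin + sim - pos[:, None])
    cost_t2a = np.maximum(0.0, margin + sim - pos[None, :])
    n = len(sim)
    mask = 1.0 - np.eye(n)        # exclude the positive pair itself
    return float(((cost_a2t + cost_t2a) * mask).sum() / n)
```

Perfectly aligned, mutually orthogonal pairs incur zero loss; any negative scoring within the margin of its anchor's positive contributes to the objective.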

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

Audio Source Separation
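One common way to condition a separator on a text query is feature-wise modulation: the query embedding produces a scale and shift applied to the mixture features, which are then mapped to a time-frequency mask. The sketch below is a minimal FiLM-style illustration of that idea under assumed shapes and randomly initialized weights; it is not the LASS model's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def language_conditioned_mask(mix_spec, text_emb, w_gamma, w_beta, w_out):
    """FiLM-style sketch: a text embedding modulates mixture features,
    which are then mapped to a [0, 1] time-frequency mask."""
    gamma = text_emb @ w_gamma      # per-frequency scale from the query
    beta = text_emb @ w_beta        # per-frequency shift from the query
    modulated = mix_spec * gamma + beta
    return sigmoid(modulated @ w_out)

rng = np.random.default_rng(0)
frames, freq, d_text = 20, 64, 32
mix = np.abs(rng.standard_normal((frames, freq)))  # mixture magnitude
query = rng.standard_normal(d_text)                # text-query embedding
mask = language_conditioned_mask(
    mix, query,
    rng.standard_normal((d_text, freq)) * 0.1,
    rng.standard_normal((d_text, freq)) * 0.1,
    rng.standard_normal((freq, freq)) * 0.1,
)
separated = mix * mask  # estimated target-source magnitude
```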

Diverse Audio Captioning via Adversarial Training

no code implementations • 13 Oct 2021 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

As different people may describe an audio clip from different aspects using distinct words and grammar, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips.

Audio Captioning

Audio Captioning Transformer

1 code implementation • 21 Jul 2021 • Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free.

Audio Captioning
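The core operation of a convolution-free encoder-decoder Transformer like the one described above is scaled dot-product attention, with the decoder's caption tokens cross-attending over encoded audio frames. The sketch below shows just that operation on toy data; the shapes and names are illustrative assumptions, not ACT's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

# Toy cross-attention: decoder tokens attend over encoder audio frames.
rng = np.random.default_rng(0)
dec = rng.standard_normal((5, 16))   # 5 caption-token queries
enc = rng.standard_normal((12, 16))  # 12 encoded audio frames (keys/values)
out = attention(dec, enc, enc)       # one context vector per caption token
```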

CL4AC: A Contrastive Loss for Audio Captioning

2 code implementations • 21 Jul 2021 • Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D. Plumbley, Wenwu Wang

Automated Audio Captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio Captioning · Translation
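A contrastive loss for audio-text pairs is often formulated as a symmetric InfoNCE objective: each audio clip must identify its own caption among in-batch negatives, and vice versa. The sketch below shows that generic formulation under assumed shapes and temperature; it is an illustration of the contrastive-loss idea, not CL4AC's exact objective.

```python
import numpy as np

def info_nce(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings: matched
    pairs sit on the diagonal of the similarity matrix."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (a @ t.T) / temperature
    # log-probability of the matched pair, in both retrieval directions
    log_p_a2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_t2a = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    diag = np.arange(len(logits))
    return float(-(log_p_a2t[diag, diag] + log_p_t2a[diag, diag]).mean() / 2)
```

Perfectly aligned, mutually orthogonal pairs drive the loss toward zero, since each row and column softmax concentrates on the diagonal.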
