Search Results for author: Xubo Liu

Found 13 papers, 8 papers with code

Automated Audio Captioning: an Overview of Recent Progress and New Challenges

no code implementations • 12 May 2022 • Xinhao Mei, Xubo Liu, Mark D. Plumbley, Wenwu Wang

In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets.

Audio captioning, Translation

On Metric Learning for Audio-Text Cross-Modal Retrieval

1 code implementation • 29 Mar 2022 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.

Cross-Modal Retrieval, Metric Learning, +1

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

Audio Source Separation

Neural Vocoder is All You Need for Speech Super-resolution

1 code implementation • 28 Mar 2022 • Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang

In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios.

Audio Super-Resolution, Bandwidth Extension, +1

Diverse Audio Captioning via Adversarial Training

no code implementations • 13 Oct 2021 • Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

As different people may describe an audio clip from different aspects using distinct words and grammars, we argue that an audio captioning system should have the ability to generate diverse captions for a fixed audio clip and across similar audio clips.

Audio captioning

Audio Captioning Transformer

1 code implementation • 21 Jul 2021 • Xinhao Mei, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free.

Audio captioning

CL4AC: A Contrastive Loss for Audio Captioning

2 code implementations • 21 Jul 2021 • Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D. Plumbley, Wenwu Wang

Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio captioning, Translation

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

1 code implementation • 21 Jul 2021 • Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

We evaluate our approach on the UrbanSound8K dataset, compared to SampleRNN, with the performance metrics measuring the quality and diversity of generated sounds.

Music Generation, Representation Learning, +1
