Search Results for author: Haohe Liu

Found 19 papers, 15 papers with code

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks

1 code implementation • 30 May 2023 • Arshdeep Singh, Haohe Liu, Mark D. Plumbley

Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking.

Audio Tagging

Adapting Language-Audio Models as Few-Shot Audio Learners

no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data.

Audio Classification, Classification +2

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

1 code implementation • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

Audio Captioning, Event Detection +2

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

Audio Generation, Style Transfer

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

no code implementations • 30 Dec 2022 • Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples.


Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation • 22 Nov 2022 • Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.

Audio Tagging

Learning the Spectrogram Temporal Resolution for Audio Classification

1 code implementation • 4 Oct 2022 • Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

Starting from a high-temporal-resolution spectrogram, such as one with a one-millisecond hop size, we show that DiffRes can improve classification accuracy with the same computational complexity.

Audio Classification, General Classification

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation • 3 Oct 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification, Classification

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation • 15 Jul 2022 • Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection, Few-Shot Learning +2

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu

Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

2 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Speech Synthesis, Text-To-Speech Synthesis

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

Audio Source Separation

Neural Vocoder is All You Need for Speech Super-resolution

1 code implementation • 28 Mar 2022 • Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang

In this paper, we propose a neural-vocoder-based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios.

Audio Super-Resolution, Bandwidth Extension +1

CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet

1 code implementation • 9 Dec 2021 • Haohe Liu, Qiuqiang Kong, Jiafeng Liu

On the MUSDB18HQ test set, we propose a 276-layer CWS-PResUNet and achieve state-of-the-art (SoTA) performance on vocals with an 8.92 signal-to-distortion ratio (SDR) score.

Music Source Separation

Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

1 code implementation • 12 Aug 2020 • Haohe Liu, Lei Xie, Jian Wu, Geng Yang

We aim to address the major issues in CNN-based high-resolution music source separation (MSS) models: high computational cost and weight sharing between distinctly different bands.

Audio and Speech Processing, Sound
