Search Results for author: Haohe Liu

Found 28 papers, 24 papers with code

WavCraft: Audio Editing and Generation with Natural Language Prompts

1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning
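The orchestration pattern this entry describes can be pictured as an LLM emitting a tool-call plan that a thin dispatcher executes over task-specific audio models. Below is a minimal, self-contained sketch of that pattern; the tool names, the hard-coded plan, and the stub implementations are hypothetical stand-ins, not WavCraft's actual interface.

```python
import json

def generate_speech(text: str) -> str:
    # Stand-in for a task-specific TTS model; returns a path to the clip.
    return f"speech_{abs(hash(text)) % 1000}.wav"

def generate_sound(caption: str) -> str:
    # Stand-in for a text-to-audio model (e.g. an AudioLDM-style generator).
    return f"sound_{abs(hash(caption)) % 1000}.wav"

TOOLS = {"speech": generate_speech, "sound": generate_sound}

# In the real system this plan would come from prompting an LLM with the
# user's instruction plus descriptions of the available audio tools;
# here it is hard-coded so the sketch runs without an API key.
plan = json.loads("""
[{"tool": "speech", "arg": "Welcome to the evening news."},
 {"tool": "sound",  "arg": "distant thunder and rain"}]
""")

clips = [TOOLS[step["tool"]](step["arg"]) for step in plan]
print("clips to mix:", clips)
```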

Retrieval-Augmented Text-to-Audio Generation

no code implementations • 14 Sep 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

Despite recent progress in text-to-audio (TTA) generation, we show that state-of-the-art models such as AudioLDM, when trained on datasets with an imbalanced class distribution such as AudioCaps, are biased in their generation performance.

AudioCaps Audio Generation +2

AudioSR: Versatile Audio Super-resolution at Scale

1 code implementation • 13 Sep 2023 • Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications.

Audio Super-Resolution Super-Resolution

Separate Anything You Describe

1 code implementation • 9 Aug 2023 • Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.

Audio Source Separation Natural Language Queries +2
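As a rough illustration of how a natural-language query can drive separation, the sketch below conditions a mask-predicting network on a text embedding via feature-wise modulation (FiLM). It is an assumption-laden toy showing the interface, not AudioSep's actual architecture; the dimensions and the FiLM conditioning are illustrative choices.

```python
import torch
import torch.nn as nn

class QueryConditionedSeparator(nn.Module):
    def __init__(self, n_freq=513, d_text=512):
        super().__init__()
        self.film = nn.Linear(d_text, 2 * n_freq)    # per-frequency scale/shift
        self.net = nn.Sequential(nn.Linear(n_freq, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, text_emb):
        # mix_mag: (batch, time, n_freq) magnitude spectrogram of the mixture
        scale, shift = self.film(text_emb).chunk(2, dim=-1)
        h = mix_mag * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        mask = self.net(h)                            # soft mask in [0, 1]
        return mask * mix_mag                         # estimated target source

sep = QueryConditionedSeparator()
est = sep(torch.rand(2, 100, 513), torch.randn(2, 512))
print(est.shape)  # torch.Size([2, 100, 513])
```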

WavJourney: Compositional Audio Creation with Large Language Models

1 code implementation • 26 Jul 2023 • Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

Audio Generation

Text-Driven Foley Sound Generation With Latent Diffusion Model

1 code implementation • 17 Jun 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang

We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model.

Transfer Learning

E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks

1 code implementation • 30 May 2023 • Arshdeep Singh, Haohe Liu, Mark D. Plumbley

Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking.

Audio Tagging

Adapting Language-Audio Models as Few-Shot Audio Learners

no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang

We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data.

Audio Classification Few-Shot Learning +1
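One way to read "making use of a small set of labelled data" is a cache-style adapter on top of frozen language-audio embeddings: blend the zero-shot text-audio similarities with similarities to a few labelled support clips. The sketch below shows that general recipe with random stand-in embeddings; it is not necessarily the Treff adapter's exact mechanism, and the blend weight is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

d, n_class, n_support = 512, 5, 20
audio_emb = F.normalize(torch.randn(8, d), dim=-1)           # query clips
text_emb = F.normalize(torch.randn(n_class, d), dim=-1)      # class prompts
support_emb = F.normalize(torch.randn(n_support, d), dim=-1) # labelled clips
support_y = F.one_hot(torch.randint(0, n_class, (n_support,)),
                      num_classes=n_class).float()

zero_shot = audio_emb @ text_emb.T                # CLAP-style zero-shot logits
cache = (audio_emb @ support_emb.T) @ support_y   # few-shot support evidence
logits = zero_shot + 0.5 * cache                  # blend; alpha=0.5 is arbitrary
print(logits.argmax(-1))
```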

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

3 code implementations • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

 Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)

Audio Captioning Event Detection +6

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

AudioCaps Audio Generation +2
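The sentence above hides the key trick: because CLAP places audio and text in one embedding space, the latent diffusion model can be trained on audio embeddings alone (no paired captions needed for the diffusion stage) and then prompted with text embeddings at inference. The sketch below illustrates that asymmetry with placeholder functions; every name and the toy noising schedule are assumptions, not AudioLDM's real code.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    # Toy stand-in for the conditional U-Net that predicts the noise.
    def __init__(self, d_cond=512):
        super().__init__()
        self.proj = nn.Linear(d_cond, 1)

    def forward(self, noisy_latent, cond):
        return noisy_latent + self.proj(cond).view(-1, 1, 1, 1)

def clap_audio_embed(audio):   # placeholder for CLAP's audio encoder
    return torch.randn(audio.shape[0], 512)

def clap_text_embed(prompts):  # placeholder for CLAP's text encoder
    return torch.randn(len(prompts), 512)

def train_step(model, latent, cond):
    t = torch.rand(latent.shape[0], 1, 1, 1)   # toy diffusion time
    noise = torch.randn_like(latent)
    noisy = (1 - t) * latent + t * noise       # toy noising schedule
    return ((model(noisy, cond) - noise) ** 2).mean()

model = TinyDenoiser()
latents = torch.randn(8, 8, 16, 16)            # VAE latents of mel spectrograms
# Training: conditioned on *audio* embeddings only -- no captions required.
loss = train_step(model, latents, clap_audio_embed(torch.randn(8, 160000)))
# Inference: swap in *text* embeddings, which live in the same CLAP space.
cond = clap_text_embed(["a dog barking in the rain"])
print(loss.item(), cond.shape)
```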

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

1 code implementation • 30 Dec 2022 • Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples.

Denoising
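ResGrad's name points at the design: rather than synthesising the mel spectrogram from scratch, the diffusion model only generates the residual between a fast TTS model's coarse prediction and the ground truth. A minimal sketch of that decomposition, with both modules replaced by placeholders:

```python
import torch

def fast_tts(text_ids):
    # Placeholder for a fast non-diffusion acoustic model (coarse mel).
    return torch.zeros(1, 80, 100)

def residual_diffusion_sample(shape, cond):
    # Placeholder sampler; at training time the target would be
    # residual = ground_truth_mel - coarse_mel.
    return 0.1 * torch.randn(shape)

text_ids = torch.tensor([[12, 7, 41]])
coarse_mel = fast_tts(text_ids)
residual = residual_diffusion_sample(coarse_mel.shape, cond=coarse_mel)
refined_mel = coarse_mel + residual   # final spectrogram = coarse + residual
print(refined_mel.shape)              # torch.Size([1, 80, 100])
```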

Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation • 22 Nov 2022 • Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during evaluation.

Audio Tagging
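One plausible reading of an ontology-aware mAP is to score predictions not only at the leaf classes but also after pooling them up to coarser ontology nodes, so confusions within a branch are penalised less. The toy ontology and max-pooling rule below are illustrative assumptions, not the paper's exact OmAP definition:

```python
import numpy as np
from sklearn.metrics import average_precision_score

parent = {"dog_bark": "animal", "cat_meow": "animal", "siren": "vehicle"}
classes = list(parent)
groups = sorted(set(parent.values()))

def coarsen(matrix):
    # Max-pool leaf scores/labels into their parent ontology nodes.
    return np.stack([matrix[:, [i for i, c in enumerate(classes)
                                if parent[c] == g]].max(axis=1)
                     for g in groups], axis=1)

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_score = np.array([[0.2, 0.7, 0.1],    # dog mistaken for cat ...
                    [0.6, 0.3, 0.2],    # ... and vice versa
                    [0.1, 0.2, 0.8]])

leaf_map = average_precision_score(y_true, y_score, average="macro")
coarse_map = average_precision_score(coarsen(y_true), coarsen(y_score),
                                     average="macro")
print(leaf_map, coarse_map)  # the coarse level forgives within-branch mix-ups
```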

Learning Temporal Resolution in Spectrogram for Audio Classification

1 code implementation • 4 Oct 2022 • Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

The audio spectrogram is a time-frequency representation that has been widely used for audio classification.

Audio Classification General Classification

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation • 3 Oct 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification
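The efficiency idea here is simple enough to show directly: insert a cheap pooling front-end that shrinks the spectrogram's time axis before the expensive backbone, trading a little resolution for a large cut in computation. The backbone below is a toy classifier, not one of the networks benchmarked in the paper:

```python
import torch
import torch.nn as nn

spec = torch.rand(4, 1, 998, 64)               # (batch, 1, time, mel), ~10 s clips
front_end = nn.AvgPool2d(kernel_size=(4, 1))   # 4x temporal mean-pooling
backbone = nn.Sequential(                      # stand-in audio tagger
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 527))

logits = backbone(front_end(spec))
print(front_end(spec).shape)   # time axis shrunk 4x: (4, 1, 249, 64)
print(logits.shape)            # (4, 527) AudioSet-style tag logits
```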

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation • 15 Jul 2022 • Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection Few-Shot Learning +2

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu

Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.

Audio Synthesis
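A hedged sketch of the two-stage decomposition: stage one generates the common (channel-averaged) signal, stage two generates the two ears conditioned on it. Both samplers are placeholders for BinauralGrad's conditional diffusion models, and the shapes are illustrative:

```python
import torch

def stage1_sample(mono_source):
    # Placeholder diffusion sampler for the common part of both channels.
    return mono_source.clone()

def stage2_sample(common):
    # Placeholder diffusion sampler for the left/right binaural channels.
    return common + 0.01 * torch.randn(2, common.shape[-1])

mono_source = torch.zeros(1, 48000)   # 1 s source signal at 48 kHz
common = stage1_sample(mono_source)
binaural = stage2_sample(common)
print(binaural.shape)                 # (2, 48000): left and right channels
```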

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

3 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

 Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Sentence Speech Synthesis +1
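The "statistical significance of subjective measure" criterion can be made concrete with a paired significance test over listener scores: synthesis counts as human-level if its ratings are not significantly different from recordings. The sketch below uses a Wilcoxon signed-rank test (a common choice for paired MOS comparisons) on fabricated ratings, purely to illustrate the criterion:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
mos_recording = rng.normal(4.5, 0.3, size=50)            # per-item MOS, real speech
mos_tts = mos_recording + rng.normal(0.0, 0.1, size=50)  # TTS rated about the same

stat, p = wilcoxon(mos_tts, mos_recording)
# "Human-level" under this criterion: no statistically significant gap.
print(f"p = {p:.3f} -> "
      f"{'no significant difference' if p > 0.05 else 'significant gap'}")
```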

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

AudioCaps Audio Source Separation

Neural Vocoder is All You Need for Speech Super-resolution

1 code implementation • 28 Mar 2022 • Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang

In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios.

Audio Super-Resolution Bandwidth Extension +1
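The "neural vocoder is all you need" recipe is a two-step pipeline: restore the missing high-frequency content in the mel-spectrogram domain, then let a neural vocoder render the waveform, which is what lets one model cover many input resolutions. Both modules below are placeholders for NVSR's trained networks, and the hop size is an illustrative assumption:

```python
import torch

def restore_mel(lowres_mel):
    # Placeholder for the mel-domain bandwidth-extension network.
    return lowres_mel  # identity stand-in

def vocoder(mel):
    # Placeholder for a neural vocoder (HiFi-GAN-style mel -> waveform).
    hop = 256
    return torch.zeros(mel.shape[0], mel.shape[-1] * hop)

lowres_mel = torch.rand(1, 128, 400)            # mel of the upsampled low-res input
wav_highres = vocoder(restore_mel(lowres_mel))
print(wav_highres.shape)                        # (1, 102400) waveform samples
```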

CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet

1 code implementation • 9 Dec 2021 • Haohe Liu, Qiuqiang Kong, Jiafeng Liu

On the MUSDB18HQ test set, we propose a 276-layer CWS-PResUNet and achieve state-of-the-art (SoTA) performance on vocals with an 8.92 signal-to-distortion ratio (SDR) score.

Music Source Separation

Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

1 code implementation • 12 Aug 2020 • Haohe Liu, Lei Xie, Jian Wu, Geng Yang

We aim to address the major issues in CNN-based high-resolution music source separation (MSS) models: high computational cost and weight sharing between distinctly different bands.

Audio and Speech Processing Sound
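The channel-wise subband (CWS) idea referenced in this and the CWS-PResUNet entry can be shown as a pure tensor transform: split the frequency axis into K bands and stack them as channels, so convolutions run on a K-times smaller frequency axis and the first layer learns separate weights per band. The original works operate on waveforms with analysis filter banks; the spectrogram version below is a simplified illustration:

```python
import torch

def cws(spec, k=4):
    # spec: (batch, channels, time, freq) -> (batch, channels*k, time, freq//k)
    b, c, t, f = spec.shape
    bands = spec.reshape(b, c, t, k, f // k)       # split freq into k bands
    return bands.permute(0, 1, 3, 2, 4).reshape(b, c * k, t, f // k)

spec = torch.rand(2, 2, 256, 1024)   # stereo magnitude spectrogram
print(cws(spec).shape)               # torch.Size([2, 8, 256, 256])
```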
