no code implementations • 21 May 2025 • Jinhua Liang, Yuanzhe Chen, Yi Yuan, Dongya Jia, Xiaobin Zhuang, Zhuo Chen, Yuping Wang, Yuxuan Wang
Editing sound with precision is a crucial yet underexplored challenge in audio content creation.
no code implementations • 30 Apr 2025 • Huan Zhang, Jinhua Liang, Huy Phan, Wenwu Wang, Emmanouil Benetos
In this paper, we use music generation as a case study to investigate the gap between automatic evaluation metrics and human preferences.
no code implementations • 12 Sep 2024 • Wen Qing Lim, Jinhua Liang, Huan Zhang
Music is inherently made up of complex structures, and representing them as graphs helps to capture multiple levels of relationships.
no code implementations • 12 Sep 2024 • Tanisha Hisariya, Huan Zhang, Jinhua Liang
Rapid advancements in artificial intelligence have significantly enhanced generative tasks involving music and images, employing both unimodal and multimodal approaches.
no code implementations • 5 Jul 2024 • Huan Zhang, Jinhua Liang, Simon Dixon
Our study investigates an approach for understanding musical performances through the lens of audio encoding models, focusing on the domain of solo Western classical piano music.
1 code implementation • 21 Jun 2024 • Huan Zhang, Shreyan Chowdhury, Carlos Eduardo Cancino-Chacón, Jinhua Liang, Simon Dixon, Gerhard Widmer
The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified by a proxy model predicting perceptual characteristics of differently steered performances.
2 code implementations • 27 Mar 2024 • Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell
A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples.
1 code implementation • 14 Mar 2024 • Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.
2 code implementations • 30 Nov 2023 • Jinhua Liang, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos
In this work, we introduce Acoustic Prompt Tuning (APT), a new adapter that extends LLMs and VLMs to the audio domain by injecting audio embeddings into the input of LLMs, a technique known as soft prompting.
1 code implementation • 26 Jul 2023 • Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.
no code implementations • 28 May 2023 • Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang
We present the Treff adapter, a training-efficient adapter for CLAP that boosts zero-shot classification performance by making use of a small set of labelled data.
no code implementations • 7 Mar 2023 • Yi Yuan, Haohe Liu, Jinhua Liang, Xubo Liu, Mark D. Plumbley, Wenwu Wang
Deep neural networks have recently achieved breakthroughs in sound generation.
no code implementations • 2 Jul 2020 • Jinhua Liang, Tao Zhang, Guoqing Feng
To compress channels, we propose a novel convolutional construction named compact convolution, which builds on progress in spatial convolution, channel grouping, and pooling operations.