Search Results for author: Qiuqiang Kong

Found 47 papers, 36 papers with code

WavCraft: Audio Editing and Generation with Natural Language Prompts

1 code implementation14 Mar 2024 Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

1 code implementation27 Dec 2023 Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

In addition, we introduce environment representations to characterize different acoustic settings, enhancing the adaptability of our attenuation approach to various environments.

Meta-Learning Sound Event Localization and Detection

Joint Music and Language Attention Models for Zero-shot Music Tagging

no code implementations16 Oct 2023 Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong

However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags.

Audio Tagging Music Tagging

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

1 code implementation15 Oct 2023 Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.

Instrument Playing Technique Detection Self-Supervised Learning

Separate Anything You Describe

1 code implementation9 Aug 2023 Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.

Audio Source Separation Natural Language Queries +2

WavJourney: Compositional Audio Creation with Large Language Models

1 code implementation26 Jul 2023 Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

Audio Generation

A unified front-end framework for English text-to-speech synthesis

no code implementations18 May 2023 Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, YuanYuan Huo, Yuxuan Wang

The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features that are essential for a text-to-speech model to synthesize speech, such as prosodies and phonemes.

Speech Synthesis Text-To-Speech Synthesis

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations12 May 2023 Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.

Disentanglement Retrieval +2

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

3 code implementations30 Mar 2023 Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)

Audio captioning Event Detection +6

Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

no code implementations1 Feb 2023 Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.

Chord Recognition Instrument Recognition +1

Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation22 Nov 2022 Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during evaluation.

Audio Tagging

Binaural Rendering of Ambisonic Signals by Neural Networks

no code implementations4 Nov 2022 Yin Zhu, Qiuqiang Kong, Junjie Shi, Shilei Liu, Xuzhou Ye, Ju-Chiang Wang, Junping Zhang

Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media.

Learning Temporal Resolution in Spectrogram for Audio Classification

1 code implementation4 Oct 2022 Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

The audio spectrogram is a time-frequency representation that has been widely used for audio classification.

Audio Classification General Classification

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation3 Oct 2022 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation15 Jul 2022 Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection Few-Shot Learning +2

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation28 Mar 2022 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

AudioCaps Audio Source Separation

Neural Vocoder is All You Need for Speech Super-resolution

1 code implementation28 Mar 2022 Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang

In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios.

Audio Super-Resolution Bandwidth Extension +1

CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet

1 code implementation9 Dec 2021 Haohe Liu, Qiuqiang Kong, Jiafeng Liu

On the MUSDB18HQ test set, we propose a 276-layer CWS-PResUNet and achieve state-of-the-art (SoTA) performance on vocals with an 8.92 signal-to-distortion ratio (SDR) score.

Music Source Separation

A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis

1 code implementation7 Aug 2021 Liwei Lin, Qiuqiang Kong, Junyan Jiang, Gus Xia

We propose a unified model for three inter-related tasks: 1) to \textit{separate} individual sound sources from a mixed music audio, 2) to \textit{transcribe} each sound source to MIDI notes, and 3) to \textit{synthesize} new pieces based on the timbre of separated sources.

Disentanglement Music Source Separation +2

Time-domain Speech Enhancement with Generative Adversarial Learning

1 code implementation30 Mar 2021 Feiyang Xiao, Jian Guan, Qiuqiang Kong, Wenwu Wang

Speech enhancement aims to obtain speech signals with high intelligibility and quality from noisy speech.

Generative Adversarial Network Speech Enhancement

Large-Scale MIDI-based Composer Classification

no code implementations28 Oct 2020 Qiuqiang Kong, Keunwoo Choi, Yuxuan Wang

Music classification is the task of classifying a music piece into labels such as genres or composers.

Classification General Classification +1

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

3 code implementations25 Oct 2020 Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously.

Sound Audio and Speech Processing
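As a rough illustration of the joint output SELD produces, a hypothetical per-frame prediction could pair each class's detection probability with a direction-of-arrival estimate and keep the angles only for active events (a sketch of the task's output format, not the paper's network):

```python
def frame_seld(activity, doa, threshold=0.5):
    """Keep (azimuth, elevation) DoA estimates only for sound event
    classes whose detection probability exceeds the threshold."""
    return {cls: doa[cls] for cls, p in activity.items() if p >= threshold}
```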

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

3 code implementations11 Oct 2020 Qiuqiang Kong, Bochen Li, Jitong Chen, Yuxuan Wang

In this article, we create a GiantMIDI-Piano (GP) dataset containing 38,700,838 transcribed notes and 10,855 unique solo piano works composed by 2,786 composers.

Information Retrieval Music Information Retrieval +1

High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times

3 code implementations5 Oct 2020 Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, Yuxuan Wang

In addition, previous AMT systems are sensitive to the misaligned onset and offset labels of audio recordings.

Sound Audio and Speech Processing

Event-Independent Network for Polyphonic Sound Event Localization and Detection

2 code implementations30 Sep 2020 Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed.

Audio and Speech Processing Sound

Audio Tagging by Cross Filtering Noisy Labels

no code implementations16 Jul 2020 Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng

Yet, it is labor-intensive to accurately annotate large amounts of audio data, and datasets may contain noisy labels in practical settings.

Audio Tagging Memorization +1

Learning with Out-of-Distribution Data for Audio Classification

1 code implementation11 Feb 2020 Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.

Audio Classification General Classification
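One plausible reading of this detect-and-relabel step is the following sketch (hypothetical threshold and function name, not the paper's exact procedure):

```python
def detect_and_relabel(aux_probs, noisy_label, threshold=0.8):
    """Use an auxiliary in-distribution classifier's confidence to
    decide whether to relabel a sample or keep its noisy label."""
    best = max(aux_probs, key=aux_probs.get)
    if aux_probs[best] >= threshold:
        return best        # confidently in-distribution: adopt aux label
    return noisy_label     # low confidence: leave the label unchanged
```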

Deep Learning Based Energy Disaggregation and On/Off Detection of Household Appliances

no code implementations3 Jul 2019 Jie Jiang, Qiuqiang Kong, Mark Plumbley, Nigel Gilbert

On the basis of energy disaggregation, we then investigate the performance of two deep-learning based frameworks for the task of on/off detection which aims at estimating whether an appliance is in operation or not.

Non-Intrusive Load Monitoring Total Energy

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

1 code implementation14 Jun 2019 Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.

Generative Adversarial Network Image Inpainting

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

1 code implementation1 May 2019 Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

In this paper, it is experimentally shown that the training information of SED is able to contribute to the direction of arrival estimation (DOAE).

Sound Audio and Speech Processing

Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data

1 code implementation6 Aug 2018 Yuanbo Hou, Qiuqiang Kong, Shengchen Li

To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known.

Audio Tagging General Classification

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

2 code implementations12 Apr 2018 Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.

Sound Audio and Speech Processing
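The "when and what" of SED is commonly recovered by thresholding per-frame class probabilities into (onset, offset) segments; a minimal sketch for a single class, with a hypothetical hop size in seconds:

```python
def frame_probs_to_events(probs, hop=0.1, threshold=0.5):
    """Convert per-frame probabilities for one sound event class
    into a list of (onset, offset) times in seconds."""
    events, start = [], None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i * hop                      # event onset
        elif not active and start is not None:
            events.append((start, i * hop))      # event offset
            start = None
    if start is not None:                        # event runs to clip end
        events.append((start, len(probs) * hop))
    return events
```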

Multi-level Attention Model for Weakly Supervised Audio Classification

5 code implementations6 Mar 2018 Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang

The objective of audio classification is to predict the presence or absence of audio events in an audio clip.

Audio Classification
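Predicting presence or absence of events is a multi-label decision; a toy sketch of turning per-event scores into predicted tags (hypothetical labels and threshold):

```python
def predict_events(scores, threshold=0.5):
    """Return the set of audio event labels predicted present in a clip."""
    return {label for label, s in scores.items() if s >= threshold}
```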

A joint separation-classification model for sound event detection of weakly labelled data

2 code implementations8 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events.

Sound Audio and Speech Processing
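A separation mapping of this kind applies a learned T-F segmentation mask element-wise to the time-frequency representation; a minimal sketch with plain nested lists, assuming mask values in [0, 1]:

```python
def apply_tf_mask(spectrogram, mask):
    """Element-wise product of a time-frequency representation and a
    segmentation mask of the same shape (lists of frequency rows)."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrogram, mask)]
```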

Audio Set classification with attention model: A probabilistic perspective

5 code implementations2 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.

Sound Audio and Speech Processing
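The expectation described above reduces to attention-weighted pooling of per-instance outputs; a sketch under that reading (for a single class, with unnormalized attention scores):

```python
def bag_prediction(instance_probs, attention_scores):
    """Expectation of per-instance classification outputs with respect
    to the probability measure given by normalized attention scores."""
    total = sum(attention_scores)
    weights = [a / total for a in attention_scores]  # probability measure
    return sum(w * p for w, p in zip(weights, instance_probs))
```

With uniform attention this degenerates to average pooling over instances.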

Large-scale weakly supervised audio classification using gated convolutional neural network

3 code implementations1 Oct 2017 Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.

Sound Audio and Speech Processing

Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

1 code implementation17 Mar 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.

Sound

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

2 code implementations24 Feb 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.

Audio Tagging

A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data

1 code implementation6 Oct 2016 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley

The labeling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user.

Sound
