Search Results for author: Qiuqiang Kong

Found 47 papers, 36 papers with code

WavCraft: Audio Editing and Generation with Natural Language Prompts

1 code implementation14 Mar 2024 Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

1 code implementation27 Dec 2023 Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

In addition, we introduce environment representations to characterize different acoustic settings, enhancing the adaptability of our attenuation approach to various environments.

Meta-Learning Sound Event Localization and Detection

Joint Music and Language Attention Models for Zero-shot Music Tagging

no code implementations16 Oct 2023 Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong

However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags.

Audio Tagging Music Tagging

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

1 code implementation15 Oct 2023 Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.

Instrument Playing Technique Detection Self-Supervised Learning

Separate Anything You Describe

1 code implementation9 Aug 2023 Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.

Audio Source Separation Natural Language Queries +2

WavJourney: Compositional Audio Creation with Large Language Models

1 code implementation26 Jul 2023 Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

Audio Generation

A unified front-end framework for English text-to-speech synthesis

no code implementations18 May 2023 Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, YuanYuan Huo, Yuxuan Wang

The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features that are essential for a text-to-speech model to synthesize speech, such as prosodies and phonemes.

Speech Synthesis Text-To-Speech Synthesis

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations12 May 2023 Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.

Disentanglement Retrieval +2

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

3 code implementations30 Mar 2023 Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)

Audio captioning Event Detection +6

Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

no code implementations1 Feb 2023 Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.

Chord Recognition Instrument Recognition +1

Ontology-aware Learning and Evaluation for Audio Tagging

1 code implementation22 Nov 2022 Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley

The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during evaluation.

Audio Tagging

Binaural Rendering of Ambisonic Signals by Neural Networks

no code implementations4 Nov 2022 Yin Zhu, Qiuqiang Kong, Junjie Shi, Shilei Liu, Xuzhou Ye, Ju-Chiang Wang, Junping Zhang

Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media.

Learning Temporal Resolution in Spectrogram for Audio Classification

1 code implementation4 Oct 2022 Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

The audio spectrogram is a time-frequency representation that has been widely used for audio classification.

Audio Classification General Classification

Simple Pooling Front-ends For Efficient Audio Classification

1 code implementation3 Oct 2022 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang

Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.

Audio Classification

Segment-level Metric Learning for Few-shot Bioacoustic Event Detection

1 code implementation15 Jul 2022 Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.

Event Detection Few-Shot Learning +2

Separate What You Describe: Language-Queried Audio Source Separation

1 code implementation28 Mar 2022 Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").

AudioCaps Audio Source Separation

Neural Vocoder is All You Need for Speech Super-resolution

1 code implementation28 Mar 2022 Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang

In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios.

Audio Super-Resolution Bandwidth Extension +1

CWS-PResUNet: Music Source Separation with Channel-wise Subband Phase-aware ResUNet

1 code implementation9 Dec 2021 Haohe Liu, Qiuqiang Kong, Jiafeng Liu

On the MUSDB18HQ test set, we propose a 276-layer CWS-PResUNet and achieve state-of-the-art (SoTA) performance on vocals with an 8.92 signal-to-distortion ratio (SDR) score.

Music Source Separation

A Unified Model for Zero-shot Music Source Separation, Transcription and Synthesis

1 code implementation7 Aug 2021 Liwei Lin, Qiuqiang Kong, Junyan Jiang, Gus Xia

We propose a unified model for three inter-related tasks: 1) to \textit{separate} individual sound sources from a mixed music audio, 2) to \textit{transcribe} each sound source to MIDI notes, and 3) to \textit{synthesize} new pieces based on the timbre of separated sources.

Disentanglement Music Source Separation +2

Time-domain Speech Enhancement with Generative Adversarial Learning

1 code implementation30 Mar 2021 Feiyang Xiao, Jian Guan, Qiuqiang Kong, Wenwu Wang

Speech enhancement aims to obtain speech signals with high intelligibility and quality from noisy speech.

Generative Adversarial Network Speech Enhancement

Large-Scale MIDI-based Composer Classification

no code implementations28 Oct 2020 Qiuqiang Kong, Keunwoo Choi, Yuxuan Wang

Music classification is the task of classifying a music piece into labels such as genres or composers.

Classification General Classification +1

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection

3 code implementations25 Oct 2020 Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley

Polyphonic sound event localization and detection (SELD), which jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, detects the type and occurrence time of sound events as well as their corresponding DoA angles simultaneously.

Sound Audio and Speech Processing
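As a rough illustration of the joint output SELD produces, a hypothetical per-frame prediction could pair each class's detection probability with a direction-of-arrival estimate and keep the angles only for active events (a sketch of the task's output format, not the paper's network):

```python
def frame_seld(activity, doa, threshold=0.5):
    """Keep (azimuth, elevation) DoA estimates only for sound event
    classes whose detection probability exceeds the threshold."""
    return {cls: doa[cls] for cls, p in activity.items() if p >= threshold}
```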

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

3 code implementations11 Oct 2020 Qiuqiang Kong, Bochen Li, Jitong Chen, Yuxuan Wang

In this article, we create a GiantMIDI-Piano (GP) dataset containing 38,700,838 transcribed notes and 10,855 unique solo piano works composed by 2,786 composers.

Information Retrieval Music Information Retrieval +1

High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times

3 code implementations5 Oct 2020 Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, Yuxuan Wang

In addition, previous AMT systems are sensitive to the misaligned onset and offset labels of audio recordings.

Sound Audio and Speech Processing

Event-Independent Network for Polyphonic Sound Event Localization and Detection

2 code implementations30 Sep 2020 Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed.

Audio and Speech Processing Sound

Audio Tagging by Cross Filtering Noisy Labels

no code implementations16 Jul 2020 Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng

Yet, it is labor-intensive to accurately annotate large amounts of audio data, and datasets may contain noisy labels in practical settings.

Audio Tagging Memorization +1

Learning with Out-of-Distribution Data for Audio Classification

1 code implementation11 Feb 2020 Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.

Audio Classification General Classification
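One plausible reading of this detect-and-relabel step is the following sketch (hypothetical threshold and function name, not the paper's exact procedure):

```python
def detect_and_relabel(aux_probs, noisy_label, threshold=0.8):
    """Use an auxiliary in-distribution classifier's confidence to
    decide whether to relabel a sample or keep its noisy label."""
    best = max(aux_probs, key=aux_probs.get)
    if aux_probs[best] >= threshold:
        return best        # confidently in-distribution: adopt aux label
    return noisy_label     # low confidence: leave the label unchanged
```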

Deep Learning Based Energy Disaggregation and On/Off Detection of Household Appliances

no code implementations3 Jul 2019 Jie Jiang, Qiuqiang Kong, Mark Plumbley, Nigel Gilbert

On the basis of energy disaggregation, we then investigate the performance of two deep-learning based frameworks for the task of on/off detection which aims at estimating whether an appliance is in operation or not.

Non-Intrusive Load Monitoring Total Energy

Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

1 code implementation14 Jun 2019 Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.

Generative Adversarial Network Image Inpainting

Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy

1 code implementation1 May 2019 Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley

In this paper, it is experimentally shown that the training information of SED is able to contribute to the direction of arrival estimation (DOAE).

Sound Audio and Speech Processing

Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data

1 code implementation6 Aug 2018 Yuanbo Hou, Qiuqiang Kong, Shengchen Li

To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known.

Audio Tagging General Classification

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

2 code implementations12 Apr 2018 Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.

Sound Audio and Speech Processing
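The "when and what" of SED is commonly recovered by thresholding per-frame class probabilities into (onset, offset) segments; a minimal sketch for a single class, with a hypothetical hop size in seconds:

```python
def frame_probs_to_events(probs, hop=0.1, threshold=0.5):
    """Convert per-frame probabilities for one sound event class
    into a list of (onset, offset) times in seconds."""
    events, start = [], None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i * hop                      # event onset
        elif not active and start is not None:
            events.append((start, i * hop))      # event offset
            start = None
    if start is not None:                        # event runs to clip end
        events.append((start, len(probs) * hop))
    return events
```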

Multi-level Attention Model for Weakly Supervised Audio Classification

5 code implementations6 Mar 2018 Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang

The objective of audio classification is to predict the presence or absence of audio events in an audio clip.

Audio Classification
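Predicting presence or absence of events is a multi-label decision; a toy sketch of turning per-event scores into predicted tags (hypothetical labels and threshold):

```python
def predict_events(scores, threshold=0.5):
    """Return the set of audio event labels predicted present in a clip."""
    return {label for label, s in scores.items() if s >= threshold}
```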

A joint separation-classification model for sound event detection of weakly labelled data

2 code implementations8 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events.

Sound Audio and Speech Processing
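A separation mapping of this kind applies a learned T-F segmentation mask element-wise to the time-frequency representation; a minimal sketch with plain nested lists, assuming mask values in [0, 1]:

```python
def apply_tf_mask(spectrogram, mask):
    """Element-wise product of a time-frequency representation and a
    segmentation mask of the same shape (lists of frequency rows)."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrogram, mask)]
```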

Audio Set classification with attention model: A probabilistic perspective

5 code implementations2 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.

Sound Audio and Speech Processing
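The expectation described above reduces to attention-weighted pooling of per-instance outputs; a sketch under that reading (for a single class, with unnormalized attention scores):

```python
def bag_prediction(instance_probs, attention_scores):
    """Expectation of per-instance classification outputs with respect
    to the probability measure given by normalized attention scores."""
    total = sum(attention_scores)
    weights = [a / total for a in attention_scores]  # probability measure
    return sum(w * p for w, p in zip(weights, instance_probs))
```

With uniform attention this degenerates to average pooling over instances.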

Large-scale weakly supervised audio classification using gated convolutional neural network

3 code implementations1 Oct 2017 Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.

Sound Audio and Speech Processing

Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

1 code implementation17 Mar 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.

Sound

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

2 code implementations24 Feb 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.

Audio Tagging

A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data

1 code implementation6 Oct 2016 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley

The labeling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user.

Sound
