no code implementations • 16 Oct 2023 • Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong
However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot generalize to new tags.
no code implementations • 15 Oct 2023 • Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.
Instrument Playing Technique Detection
Self-Supervised Learning
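A minimal sketch of this multi-task finetuning setup in PyTorch; the feature dimension, class counts, and loss weights below are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskIPTHead(nn.Module):
    """Shared frame features feed one main head (IPT) and two auxiliary heads."""
    def __init__(self, feat_dim=768, n_ipt=7, n_pitch=88):
        super().__init__()
        self.ipt_head = nn.Linear(feat_dim, n_ipt)      # main task: IPT classes
        self.pitch_head = nn.Linear(feat_dim, n_pitch)  # auxiliary: frame-wise pitch
        self.onset_head = nn.Linear(feat_dim, 1)        # auxiliary: onset probability

    def forward(self, frames):  # frames: (batch, time, feat_dim)
        return (self.ipt_head(frames),
                self.pitch_head(frames),
                self.onset_head(frames).squeeze(-1))

def multitask_loss(outputs, targets, w_pitch=0.5, w_onset=0.5):
    ipt_logits, pitch_logits, onset_logits = outputs
    ipt_t, pitch_t, onset_t = targets  # (B, T) ints, (B, T) ints, (B, T) floats
    loss = F.cross_entropy(ipt_logits.transpose(1, 2), ipt_t)
    loss += w_pitch * F.cross_entropy(pitch_logits.transpose(1, 2), pitch_t)
    loss += w_onset * F.binary_cross_entropy_with_logits(onset_logits, onset_t)
    return loss
```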
1 code implementation • 10 Aug 2023 • Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley
Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model.
Ranked #2 on Audio Generation on AudioCaps
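As a rough illustration of the LOA idea, the hypothetical `AudioMAEEncoder` below stands in for the pretrained AudioMAE checkpoint: the mel spectrogram of any audio is mapped to a sequence of continuous tokens, which can then condition a generative model. Sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AudioMAEEncoder(nn.Module):  # hypothetical stand-in for pretrained AudioMAE
    def __init__(self, n_mels=128, d_model=768):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4)

    def forward(self, mel):  # mel: (batch, time, n_mels)
        return self.encoder(self.proj(mel))  # LOA tokens: (batch, time, d_model)

mel = torch.randn(2, 100, 128)   # mel spectrogram of any audio clip
loa = AudioMAEEncoder()(mel)     # LOA sequence used to condition generation
```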
1 code implementation • 9 Aug 2023 • Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang
In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.
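A hedged sketch of language-queried separation: a text embedding (e.g., from a text encoder such as CLAP) conditions a mask-predicting network via FiLM-style modulation. The architecture below is illustrative, not the released AudioSep model:

```python
import torch
import torch.nn as nn

class QueriedSeparator(nn.Module):
    def __init__(self, n_freq=513, d_text=512):
        super().__init__()
        self.film = nn.Linear(d_text, 2 * n_freq)  # per-bin scale and shift
        self.net = nn.Sequential(nn.Linear(n_freq, n_freq), nn.ReLU(),
                                 nn.Linear(n_freq, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, text_emb):
        # mix_spec: (batch, time, n_freq) magnitude; text_emb: (batch, d_text)
        scale, shift = self.film(text_emb).chunk(2, dim=-1)
        mask = self.net(mix_spec * (1 + scale[:, None]) + shift[:, None])
        return mix_spec * mask  # estimated target-source magnitude

mix = torch.abs(torch.randn(2, 100, 513))
query = torch.randn(2, 512)  # e.g. a text embedding of "a man tells a joke"
target = QueriedSeparator()(mix, query)
```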
1 code implementation • 26 Jul 2023 • Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.
no code implementations • 18 May 2023 • Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, YuanYuan Huo, Yuxuan Wang
The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features, such as prosody and phonemes, that are essential for a text-to-speech model to synthesize speech.
no code implementations • 12 May 2023 • Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Specifically, to flexibly adapt to speaker characteristics that vary along the temporal and channel axes of speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.
2 code implementations • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.
Ranked #1 on Zero-shot Text to Audio Retrieval on AudioCaps (using extra training data)
no code implementations • 1 Feb 2023 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans
Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
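A minimal sketch of this conditioning pattern: the recognized instrument selects an embedding that conditions the transcription module. Module sizes and internals are placeholder assumptions:

```python
import torch
import torch.nn as nn

class ConditionedTranscriber(nn.Module):
    def __init__(self, n_instruments=39, d_cond=64, n_mels=229, n_pitch=88):
        super().__init__()
        self.inst_emb = nn.Embedding(n_instruments, d_cond)
        self.rnn = nn.GRU(n_mels + d_cond, 256, batch_first=True)
        self.out = nn.Linear(256, n_pitch)  # instrument-specific piano roll

    def forward(self, mel, inst_id):  # mel: (B, T, n_mels), inst_id: (B,)
        cond = self.inst_emb(inst_id)[:, None].expand(-1, mel.size(1), -1)
        h, _ = self.rnn(torch.cat([mel, cond], dim=-1))
        return torch.sigmoid(self.out(h))  # frame-wise note activations

roll = ConditionedTranscriber()(torch.randn(2, 100, 229), torch.tensor([0, 5]))
```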
1 code implementation • 22 Nov 2022 • Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley
The proposed metric, ontology-aware mean average precision (OmAP), addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
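A hedged sketch of one way to make mAP ontology-aware: pool child-class scores and labels into their parent nodes and average the AP across ontology levels. The paper's exact OmAP definition may differ; this only illustrates exploiting the ontology structure:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def pool_to_parents(scores, labels, parent_of):
    # parent_of: dict child_index -> parent_index in the ontology
    n_parents = max(parent_of.values()) + 1
    p_scores = np.zeros((scores.shape[0], n_parents))
    p_labels = np.zeros((labels.shape[0], n_parents))
    for c, p in parent_of.items():
        p_scores[:, p] = np.maximum(p_scores[:, p], scores[:, c])
        p_labels[:, p] = np.maximum(p_labels[:, p], labels[:, c])
    return p_scores, p_labels

def ontology_aware_map(scores, labels, parent_of):
    leaf_map = average_precision_score(labels, scores, average="macro")
    p_scores, p_labels = pool_to_parents(scores, labels, parent_of)
    parent_map = average_precision_score(p_labels, p_scores, average="macro")
    return 0.5 * (leaf_map + parent_map)  # average across ontology levels
```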
no code implementations • 4 Nov 2022 • Yin Zhu, Qiuqiang Kong, Junjie Shi, Shilei Liu, Xuzhou Ye, Ju-Chiang Wang, Junping Zhang
Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media.
1 code implementation • 28 Oct 2022 • Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang
Audio captioning aims to generate text descriptions of audio clips.
no code implementations • 27 Oct 2022 • Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang
In this paper, we propose to use intermediate bottleneck features (IBFs) to replace PPGs.
Automatic Speech Recognition (ASR)
1 code implementation • 4 Oct 2022 • Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
Starting from a high-temporal-resolution spectrogram, such as one with a one-millisecond hop size, we show that DiffRes can improve classification accuracy with the same computational complexity.
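A hedged sketch of the underlying idea, learnable temporal resolution via frame-importance-weighted pooling; DiffRes's actual differentiable warping is more involved, and all sizes here are illustrative:

```python
import torch
import torch.nn as nn

class SoftTemporalPool(nn.Module):
    def __init__(self, n_mels=128, out_frames=250):
        super().__init__()
        self.score = nn.Linear(n_mels, 1)  # learned per-frame importance
        self.out_frames = out_frames

    def forward(self, spec):  # spec: (B, T, n_mels), T large (e.g. 1 ms hop)
        w = torch.softmax(self.score(spec).squeeze(-1), dim=1)  # (B, T)
        B, T, F = spec.shape
        bins = torch.linspace(0, T, self.out_frames + 1).long()
        # merge each bin of input frames into one output frame, weighted by importance
        pooled = torch.stack([
            (spec[:, a:b] * w[:, a:b, None]).sum(1)
            / (w[:, a:b].sum(1, keepdim=True) + 1e-8)
            for a, b in zip(bins[:-1], bins[1:])], dim=1)
        return pooled  # (B, out_frames, n_mels)
```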
1 code implementation • 3 Oct 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Mark D. Plumbley, Wenwu Wang
Recently, there has been increasing interest in building efficient audio neural networks for on-device scenarios.
1 code implementation • 5 Sep 2022 • Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang
Our system submitted to DCASE 2022 Task 3 is based on our previously proposed Event-Independent Network V2 (EINV2) with a novel data augmentation method.
1 code implementation • 15 Jul 2022 • Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.
no code implementations • 22 Jun 2022 • Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans
However, its novelty necessitates a new perspective on how to evaluate such a model.
1 code implementation • 12 Apr 2022 • Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang
Speech restoration aims to remove distortions in speech signals.
1 code implementation • 28 Mar 2022 • Xubo Liu, Haohe Liu, Qiuqiang Kong, Xinhao Mei, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang
In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").
1 code implementation • 28 Mar 2022 • Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang
In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolutions and upsampling ratios.
Ranked #2 on Audio Super-Resolution on VCTK Multi-Speaker
1 code implementation • 9 Dec 2021 • Haohe Liu, Qiuqiang Kong, Jiafeng Liu
We propose a 276-layer CWS-PResUNet and achieve state-of-the-art (SoTA) performance on vocals on the MUSDB18HQ test set, with an 8.92 signal-to-distortion ratio (SDR) score.
Ranked #9 on Music Source Separation on MUSDB18-HQ
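For reference, the SDR reported above measures the energy ratio between the reference signal and the residual distortion; a minimal, unscaled version (without the optimal filtering that full BSSEval adds) is:

```python
import numpy as np

def sdr(reference, estimate):
    # signal-to-distortion ratio in dB; inputs are 1-D waveforms
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))
```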
1 code implementation • 7 Aug 2021 • Liwei Lin, Qiuqiang Kong, Junyan Jiang, Gus Xia
We propose a unified model for three inter-related tasks: 1) to \textit{separate} individual sound sources from a mixed music audio, 2) to \textit{transcribe} each sound source to MIDI notes, and 3) to \textit{synthesize} new pieces based on the timbre of separated sources.
1 code implementation • 30 Mar 2021 • Feiyang Xiao, Jian Guan, Qiuqiang Kong, Wenwu Wang
Speech enhancement aims to obtain speech signals with high intelligibility and quality from noisy speech.
no code implementations • 28 Oct 2020 • Qiuqiang Kong, Keunwoo Choi, Yuxuan Wang
Music classification is the task of classifying a music piece into labels such as genres or composers.
3 code implementations • 25 Oct 2020 • Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley
Polyphonic sound event localization and detection (SELD) jointly performs sound event detection (SED) and direction-of-arrival (DoA) estimation, simultaneously detecting the type and occurrence time of sound events and their corresponding DoA angles.
Sound
Audio and Speech Processing
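A minimal sketch of a joint SELD model with two branches, frame-wise event activity (SED) and per-class Cartesian direction vectors (DoA); sizes are illustrative, not the EINV2 configuration:

```python
import torch
import torch.nn as nn

class SELDNet(nn.Module):
    def __init__(self, n_feat=64, n_events=13):
        super().__init__()
        self.backbone = nn.GRU(n_feat, 128, batch_first=True, bidirectional=True)
        self.sed_head = nn.Linear(256, n_events)      # event presence per frame
        self.doa_head = nn.Linear(256, 3 * n_events)  # (x, y, z) per event class

    def forward(self, x):  # x: (batch, time, n_feat)
        h, _ = self.backbone(x)
        sed = torch.sigmoid(self.sed_head(h))
        doa = torch.tanh(self.doa_head(h)).view(*h.shape[:2], -1, 3)
        return sed, doa  # activity (B,T,E), direction vectors (B,T,E,3)
```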
3 code implementations • 11 Oct 2020 • Qiuqiang Kong, Bochen Li, Jitong Chen, Yuxuan Wang
In this article, we create a GiantMIDI-Piano (GP) dataset containing 38,700,838 transcribed notes and 10,855 unique solo piano works composed by 2,786 composers.
3 code implementations • 5 Oct 2020 • Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, Yuxuan Wang
In addition, previous AMT systems are sensitive to the misaligned onset and offset labels of audio recordings.
Sound
Audio and Speech Processing
2 code implementations • 30 Sep 2020 • Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley
In this paper, a novel event-independent network for polyphonic sound event localization and detection is proposed.
Audio and Speech Processing
Sound
no code implementations • 25 Jul 2020 • Jingqiao Zhao, Zhen-Hua Feng, Qiuqiang Kong, Xiaoning Song, Xiao-Jun Wu
This paper presents a Depthwise Disout Convolutional Neural Network (DD-CNN) for the detection and classification of urban acoustic scenes.
no code implementations • 16 Jul 2020 • Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng
Yet, it is labor-intensive to accurately annotate large amounts of audio data, and datasets may contain noisy labels in practical settings.
1 code implementation • 11 Feb 2020 • Turab Iqbal, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.
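A hedged sketch of the relabelling step: examples on which the auxiliary in-distribution classifier is confident are relabelled with its prediction; the confidence thresholds here are illustrative assumptions:

```python
import numpy as np

def relabel(aux_probs, noisy_labels, threshold=0.9):
    # aux_probs: (N, C) softmax outputs of the auxiliary classifier
    conf = aux_probs.max(axis=1)
    pred = aux_probs.argmax(axis=1)
    labels = np.where(conf >= threshold, pred, noisy_labels)
    keep = conf >= 0.5  # optionally drop very uncertain examples
    return labels, keep
```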
no code implementations • 3 Jul 2019 • Jie Jiang, Qiuqiang Kong, Mark Plumbley, Nigel Gilbert
On the basis of energy disaggregation, we then investigate the performance of two deep-learning-based frameworks for the task of on/off detection, which aims to estimate whether an appliance is in operation or not.
1 code implementation • 14 Jun 2019 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley
Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture; it is a challenging problem in which no prior knowledge of the mixing filters is available.
1 code implementation • 1 May 2019 • Yin Cao, Qiuqiang Kong, Turab Iqbal, Fengyan An, Wenwu Wang, Mark D. Plumbley
In this paper, it is experimentally shown that the training information of SED is able to contribute to the direction of arrival estimation (DOAE).
Sound
Audio and Speech Processing
2 code implementations • 30 Oct 2018 • Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang
Audio tagging is challenging due to the limited size of data and noisy labels.
1 code implementation • 6 Aug 2018 • Yuanbo Hou, Qiuqiang Kong, Shengchen Li
To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known.
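One natural way to learn from order-only supervision is a CTC-style loss over the ordered event sequence, sketched below; this illustrates using the order information and is not necessarily the paper's exact objective:

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0)
logits = torch.randn(100, 2, 11).log_softmax(-1)  # (time, batch, classes incl. blank)
targets = torch.tensor([[3, 1, 7], [2, 2, 5]])    # ordered event sequences per clip
loss = ctc(logits, targets,
           input_lengths=torch.full((2,), 100),
           target_lengths=torch.full((2,), 3))
```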
2 code implementations • 12 Apr 2018 • Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley
Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.
Sound
Audio and Speech Processing
5 code implementations • 6 Mar 2018 • Changsong Yu, Karim Said Barsim, Qiuqiang Kong, Bin Yang
The objective of audio classification is to predict the presence or absence of audio events in an audio clip.
2 code implementations • 8 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley
First, we propose a separation mapping from the time-frequency (T-F) representation of an audio clip to the T-F segmentation masks of the audio events.
Sound
Audio and Speech Processing
5 code implementations • 2 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley
Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.
Sound
Audio and Speech Processing
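A minimal sketch of that bag-level prediction: each instance (e.g., a frame) receives a classification output and a learned normalized weight, and the bag output is the expectation of instance outputs under those weights; dimensions are illustrative:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, d_in=128, n_classes=527):
        super().__init__()
        self.cla = nn.Linear(d_in, n_classes)  # instance-level classifier
        self.att = nn.Linear(d_in, n_classes)  # instance-level weights

    def forward(self, x):  # x: (batch, time, d_in), instances in a bag
        p = torch.sigmoid(self.cla(x))         # p(y | instance)
        w = torch.softmax(self.att(x), dim=1)  # probability measure over instances
        return (w * p).sum(dim=1)              # expectation = bag prediction
```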
3 code implementations • 1 Oct 2017 • Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly supervised sound event detection task of Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.
Sound
Audio and Speech Processing
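A minimal sketch of the gated convolutional block at the core of this system: one convolution produces features while a parallel convolution produces a sigmoid gate that modulates them (a gated linear unit):

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.feat = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)

    def forward(self, x):  # x: (batch, channels, time, freq)
        return self.feat(x) * torch.sigmoid(self.gate(x))
```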
1 code implementation • 17 Mar 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.
Sound
2 code implementations • 24 Feb 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.
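A minimal sketch of a CNN audio tagger operating on mel-filter-bank (MFB) features; channel counts and the number of tags are illustrative assumptions:

```python
import torch
import torch.nn as nn

tagger = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 8), nn.Sigmoid())  # 8 tags, multi-label output

mfb = torch.randn(4, 1, 100, 40)  # (batch, 1, time, mel bands)
probs = tagger(mfb)               # (batch, 8) tag probabilities
```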
1 code implementation • 6 Oct 2016 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley
The labeling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user.
Sound