no code implementations • 21 Feb 2024 • Rui Zhou, Xian Li, Ying Fang, Xiaofei Li
In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance.
Automatic Speech Recognition (ASR)
1 code implementation • 25 Sep 2023 • Di Liang, Nian Shao, Xiaofei Li
This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion.
1 code implementation • 15 Sep 2023 • Nian Shao, Xian Li, Xiaofei Li
In this work, we study methods for fine-tuning pretrained models for SED.
Ranked #1 on Sound Event Detection on DESED (using extra training data)
1 code implementation • 15 Sep 2023 • Ying Fang, Xiaofei Li
Then, the feature frames with unimodal weights are integrated and further processed by a decoder.
Ranked #5 on Speech Recognition on AISHELL-1
1 code implementation • 15 Sep 2023 • Pengyu Wang, Xiaofei Li
In this work, we propose a generative dereverberation method.
no code implementations • 14 Sep 2023 • Jiahao Chen, Xiaofei Li
This study aims to develop an advanced high-frequency trading algorithm and compare the performance of three different mathematical models: the combination of the cross-entropy loss function and the quasi-Newton algorithm, the FCNN model, and the vector machine.
2 code implementations • 7 Jun 2023 • Xian Li, Nian Shao, Xiaofei Li
In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.
Ranked #8 on Audio Classification on AudioSet
1 code implementation • 31 May 2023 • Yabo Wang, Bing Yang, Xiaofei Li
Extracting direct-path spatial features is critical for sound source localization in adverse acoustic environments.
no code implementations • 19 Dec 2022 • Xiaofei Li, Daniel Wiechmann, Yu Qiao, Elma Kerz
In this paper we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability.
no code implementations • 19 Dec 2022 • Yu Qiao, Xiaofei Li, Daniel Wiechmann, Elma Kerz
State-of-the-art text simplification (TS) systems adopt end-to-end neural network models to directly generate the simplified version of the input text, and usually function as a black box.
2 code implementations • 18 Dec 2022 • Xiang Hao, Xiaofei Li
FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset.
1 code implementation • 16 Nov 2022 • Yujie Yang, Changsheng Quan, Xiaofei Li
In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise.
1 code implementation • 20 Oct 2022 • Rui Zhou, Wenye Zhu, Xiaofei Li
The proposed RTS target suppresses reverberation while maintaining its exponential decay property, which eases network training and thus reduces the signal distortion caused by prediction errors.
4 code implementations • 26 Apr 2022 • Xian Li, Xiaofei Li
Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers the knowledge to a specific problem with a limited amount of labeled data.
Ranked #2 on Spoken Command Recognition on Speech Command v2
no code implementations • 19 Apr 2022 • Rui Zhou, Wenye Zhu, Xiaofei Li
The proposed RTS target suppresses reverberation while maintaining its exponential decay property, which eases network training and thus reduces the signal distortion caused by prediction errors.
2 code implementations • 21 Oct 2021 • Nian Shao, Erfan Loweimi, Xiaofei Li
Sound event detection (SED), as a core module of acoustic environmental analysis, suffers from the problem of data deficiency.
Ranked #4 on Sound Event Detection on DESED
no code implementations • 3 Aug 2021 • Tianwei Zhang, Huayan Zhang, Xiaofei Li, Junfeng Chen, Tin Lun Lam, Sethu Vijayakumar
Dynamic objects in the environment, such as people and other agents, lead to challenges for existing simultaneous localization and mapping (SLAM) approaches.
1 code implementation • 27 Jul 2021 • Siyuan Zhang, Xiaofei Li
This paper addresses the problem of microphone array generalization for deep-learning-based end-to-end multichannel speech enhancement.
no code implementations • 13 May 2021 • Xiaofei Li, Zhong Dong
Inspired by the conclusion that humans choose the visual cortex regions corresponding to the real size of an object to analyze its features when identifying objects in the real world, this paper presents a framework, SizeNet, which is based on both the real sizes and features of objects to solve object recognition problems.
6 code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of these two types of models.
no code implementations • 29 May 2020 • Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li
In single-channel speech enhancement, methods based on full-band spectral features have been widely studied.
no code implementations • 31 Mar 2016 • Israel D. Gebru, Silèye Ba, Xiaofei Li, Radu Horaud
An audio-visual spatiotemporal diarization model is proposed.