Search Results for author: Xiaofei Li

Found 22 papers, 13 papers with code

Mel-FullSubNet: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

No code implementations · 21 Feb 2024 · Rui Zhou, Xian Li, Ying Fang, Xiaofei Li

In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3 more

Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

1 code implementation · 25 Sep 2023 · Di Liang, Nian Shao, Xiaofei Li

This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion.

speaker-diarization, Speaker Diarization

Fine-tune the pretrained ATST model for sound event detection

1 code implementation · 15 Sep 2023 · Nian Shao, Xian Li, Xiaofei Li

In this work, we study how to fine-tune pretrained models for sound event detection (SED).

Ranked #1 on Sound Event Detection on DESED (using extra training data)

Event Detection, Self-Supervised Learning, +1 more

Unimodal Aggregation for CTC-based Speech Recognition

1 code implementation · 15 Sep 2023 · Ying Fang, Xiaofei Li

Feature frames with unimodal weights are integrated and then further processed by a decoder.

Automatic Speech Recognition, speech-recognition
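The unimodal aggregation idea in the entry above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that segment boundaries fall where the per-frame weights switch from decreasing to increasing (so weights inside each segment rise and then fall), and that each segment is reduced to a weight-normalized average of its frames. The function name `unimodal_aggregate` and the `eps` parameter are hypothetical.

```python
from typing import List, Sequence


def unimodal_aggregate(frames: Sequence[Sequence[float]],
                       weights: Sequence[float],
                       eps: float = 1e-8) -> List[List[float]]:
    """Hypothetical sketch of unimodal aggregation over encoder frames.

    A new segment starts where the weights switch from decreasing to
    increasing (just after a weight valley), so the weights inside each
    segment rise and then fall. Each segment is reduced to the
    weight-normalized average of its frames.
    """
    if len(frames) != len(weights) or not frames:
        raise ValueError("frames and weights must be non-empty and aligned")

    # Group frame indices into unimodal segments.
    segments: List[List[int]] = [[0]]
    decreasing = False
    for t in range(1, len(weights)):
        if weights[t] < weights[t - 1]:
            decreasing = True
        elif weights[t] > weights[t - 1] and decreasing:
            segments.append([])  # frame t-1 was a valley: start a new segment
            decreasing = False
        segments[-1].append(t)

    # Weighted average of the frames in each segment (eps avoids 0/0).
    dim = len(frames[0])
    out: List[List[float]] = []
    for seg in segments:
        total = sum(weights[i] for i in seg) + eps
        out.append([sum(weights[i] * frames[i][d] for i in seg) / total
                    for d in range(dim)])
    return out


w = [0.1, 0.9, 0.2, 0.1, 0.8, 0.3]
x = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
print(len(unimodal_aggregate(x, w)))  # 2 segments: frames 0-3 and frames 4-5
```

With these example weights, six frames collapse into two aggregated vectors; in a CTC setting such aggregated vectors would be intended to correspond roughly to one output token each, which shortens the sequence the decoder has to process.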

Analysis of frequent trading effects of various machine learning models

No code implementations · 14 Sep 2023 · Jiahao Chen, Xiaofei Li

This study aims to develop an advanced high-frequency trading algorithm and to compare the performance of three models: a cross-entropy loss combined with a quasi-Newton algorithm, an FCNN model, and a support vector machine.

Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks

2 code implementations · 7 Jun 2023 · Xian Li, Nian Shao, Xiaofei Li

In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.

Audio Classification, Audio Tagging, +8 more

FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization

1 code implementation · 31 May 2023 · Yabo Wang, Bing Yang, Xiaofei Li

Extracting direct-path spatial features is critical for sound source localization in adverse acoustic environments.

MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders

No code implementations · 19 Dec 2022 · Xiaofei Li, Daniel Wiechmann, Yu Qiao, Elma Kerz

In this paper, we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability.

Language Modelling, Lexical Simplification, +4 more

(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification

No code implementations · 19 Dec 2022 · Yu Qiao, Xiaofei Li, Daniel Wiechmann, Elma Kerz

State-of-the-art text simplification (TS) systems use end-to-end neural network models to directly generate the simplified version of the input text and usually function as black boxes.

Text Simplification

Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement

2 code implementations · 18 Dec 2022 · Xiang Hao, Xiaofei Li

FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset.

Computational Efficiency, Speech Enhancement

McNet: Fuse Multiple Cues for Multichannel Speech Enhancement

1 code implementation · 16 Nov 2022 · Yujie Yang, Changsheng Quan, Xiaofei Li

In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise.

Speech Enhancement

Speech Dereverberation with a Reverberation Time Shortening Target

1 code implementation · 20 Oct 2022 · Rui Zhou, Wenye Zhu, Xiaofei Li

The proposed RTS target suppresses reverberation while preserving its exponential decay property, which eases network training and thus reduces the signal distortion caused by prediction errors.

Denoising, Speech Denoising, +1 more
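The idea of shortening reverberation time while keeping an exponential decay, as described in the entry above, can be illustrated with a short calculation. This is a sketch under stated assumptions, not the paper's exact target definition: it assumes an ideal room impulse response (RIR) whose amplitude envelope decays as $e^{-3\ln(10)\,t/T_{60}}$ (a 60 dB energy drop after $T_{60}$ seconds). Multiplying the RIR after the direct path by $e^{-qt}$ with $q = 3\ln(10)\,(1/T'_{60} - 1/T_{60})$ yields another exponential decay, now with the shorter $T'_{60}$. The function name `shorten_rir` and its parameters are hypothetical.

```python
import math
from typing import List


def shorten_rir(rir: List[float], fs: int, t60: float, t60_target: float,
                direct_idx: int = 0) -> List[float]:
    """Hypothetical sketch: exponentially window an RIR so its T60 shrinks.

    Assumes an ideal exponential amplitude envelope exp(-3*ln(10)*t/T60),
    i.e. a 60 dB energy drop after T60 seconds. Illustration only, not
    the exact target definition used in the paper.
    """
    if not 0.0 < t60_target <= t60:
        raise ValueError("need 0 < t60_target <= t60")
    # Extra decay rate that turns an envelope with t60 into one with t60_target
    q = 3.0 * math.log(10.0) * (1.0 / t60_target - 1.0 / t60)
    out = []
    for n, h in enumerate(rir):
        # Time since the direct-path arrival; earlier samples are left unchanged
        t = max(0, n - direct_idx) / fs
        out.append(h * math.exp(-q * t))
    return out


fs, t60 = 1000, 0.5
rir = [math.exp(-3 * math.log(10) * n / fs / t60) for n in range(600)]
short = shorten_rir(rir, fs, t60, t60_target=0.2)
print(short[200])  # about 1e-3 (-60 dB) after 0.2 s
```

Because the windowed response is still a pure exponential, a network trained to predict speech convolved with it would face a smoothly decaying target rather than an abrupt cut-off, which is one plausible reading of why such a target eases training.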

ATST: Audio Representation Learning with Teacher-Student Transformer

4 code implementations · 26 Apr 2022 · Xian Li, Xiaofei Li

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data and then transfers it to a specific problem with a limited amount of labeled data.

Audio Classification, Instrument Recognition, +5 more

Speech Dereverberation with A Reverberation Time Shortening Target

No code implementations · 19 Apr 2022 · Rui Zhou, Wenye Zhu, Xiaofei Li

The proposed RTS target suppresses reverberation while preserving its exponential decay property, which eases network training and thus reduces the signal distortion caused by prediction errors.

Denoising, Speech Denoising, +1 more

RCT: Random Consistency Training for Semi-supervised Sound Event Detection

2 code implementations · 21 Oct 2021 · Nian Shao, Erfan Loweimi, Xiaofei Li

Sound event detection (SED), a core module of acoustic environment analysis, suffers from data scarcity.

Data Augmentation, Event Detection, +1 more

AcousticFusion: Fusing Sound Source Localization to Visual SLAM in Dynamic Environments

No code implementations · 3 Aug 2021 · Tianwei Zhang, Huayan Zhang, Xiaofei Li, Junfeng Chen, Tin Lun Lam, Sethu Vijayakumar

Dynamic objects in the environment, such as people and other agents, pose challenges for existing simultaneous localization and mapping (SLAM) approaches.

Depth Estimation, Object, +1 more

Microphone Array Generalization for Multichannel Narrowband Deep Speech Enhancement

1 code implementation · 27 Jul 2021 · Siyuan Zhang, Xiaofei Li

This paper addresses the problem of microphone array generalization for deep-learning-based end-to-end multichannel speech enhancement.

Speech Enhancement

SizeNet: Object Recognition via Object Real Size-based Convolutional Networks

No code implementations · 13 May 2021 · Xiaofei Li, Zhong Dong

Inspired by the finding that, when identifying objects in the real world, humans use the visual cortex regions corresponding to an object's real size to analyze its features, this paper presents SizeNet, a framework that uses both the real sizes and the features of objects to solve object recognition problems.

Object, Object Recognition

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

6 code implementations · 29 Oct 2020 · Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li

In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model in sequence and train them jointly to combine the advantages of both types of model.

Speech Enhancement
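The sequential full-band/sub-band connection described in the FullSubNet entry above can be illustrated by how a sub-band model's input might be assembled for one time frame. This is a minimal sketch under assumptions, not the released implementation: it assumes each frequency bin is processed with its 2N+1 neighbouring noisy magnitudes (reflected at the band edges) followed by the full-band model's output for that bin. The function name `subband_inputs` and the reflect padding are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def subband_inputs(noisy_mag: Sequence[float], fullband_out: Sequence[float],
                   n_neighbors: int) -> List[List[float]]:
    """Hypothetical sketch: per-frequency sub-band input vectors for one frame.

    Each bin f gets its 2*n_neighbors + 1 neighbouring noisy magnitudes
    (reflected at the band edges) followed by the full-band output at f.
    """
    num_bins = len(noisy_mag)
    if len(fullband_out) != num_bins or not 0 <= n_neighbors < num_bins:
        raise ValueError("mismatched lengths or too many neighbours")
    inputs = []
    for f in range(num_bins):
        window = []
        for k in range(f - n_neighbors, f + n_neighbors + 1):
            if k < 0:
                idx = -k                      # reflect below bin 0
            elif k >= num_bins:
                idx = 2 * (num_bins - 1) - k  # reflect above the last bin
            else:
                idx = k
            window.append(noisy_mag[idx])
        # Append the full-band model's output so the sub-band model also sees
        # global spectral context for this bin.
        window.append(fullband_out[f])
        inputs.append(window)
    return inputs


print(subband_inputs([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 1))
# [[2.0, 1.0, 2.0, 10.0], [1.0, 2.0, 3.0, 20.0], [2.0, 3.0, 4.0, 30.0], [3.0, 4.0, 3.0, 40.0]]
```

Every bin yields a vector of the same length, so a single sub-band network with shared weights could in principle process all frequencies, while the appended full-band value carries information the narrow local window cannot see.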

Sub-Band Knowledge Distillation Framework for Speech Enhancement

No code implementations · 29 May 2020 · Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li

In single-channel speech enhancement, methods based on full-band spectral features have been widely studied.

Knowledge Distillation, Speech Enhancement
