Search Results for author: Yiwen Shao

Found 9 papers, 3 papers with code

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

no code implementations31 Oct 2023 Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Multi-channel multi-talker automatic speech recognition (ASR) presents ongoing challenges within the speech community, particularly when confronted with significant reverberation effects.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

no code implementations25 Oct 2023 Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.

speaker-diarization Speaker Diarization +3

Challenges and Insights: Exploring 3D Spatial Features and Complex Networks on the MISP Dataset

no code implementations5 Oct 2023 Yiwen Shao

Multi-channel multi-talker speech recognition presents formidable challenges in the realm of speech processing, marked by issues such as background noise, reverberation, and overlapping speech.

speech-recognition Speech Recognition

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

no code implementations22 Nov 2021 Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Experimental results show that 1) the proposed ALL-In-One model achieved a comparable error rate to the pipelined system while reducing the inference time by half; 2) the proposed 3D spatial feature significantly outperformed (31\% CERR) all previous works of using the 1D directional information in both paradigms.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations31 Mar 2021 Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +2

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

1 code implementation20 May 2020 Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called \emph{chain models} in the Kaldi automatic speech recognition (ASR) toolkit.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Speaker Diarization with Region Proposal Network

1 code implementation14 Feb 2020 Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal speaker-diarization +1

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation18 Sep 2019 Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Cannot find the paper you are looking for? You can Submit a new open access paper.