Search Results for author: Yiwen Shao

Found 12 papers, 3 papers with code

Advancing Multi-talker ASR Performance with Large Language Models

no code implementations30 Aug 2024 Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problem for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Co-benefits of Agricultural Diversification and Technology for Food and Nutrition Security in China

no code implementations1 Jul 2024 Thomas Cherico Wanger, Estelle Raveloaritiana, Siyan Zeng, Haixiu Gao, Xueqing He, Yiwen Shao, Panlong Wu, Kris A. G. Wyckhuys, Wenwu Zhou, Yi Zou, Zengrong Zhu, Ling Li, Haiyan Cen, Yunhui Liu, Shenggen Fan

While a strategic choice and management of crop and livestock species can improve nutrition, the environmental and production benefits of agricultural diversification are currently not well interlinked with Chinas food and nutrition security discussions.

Nutrition

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

no code implementations13 Jun 2024 Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

no code implementations25 Oct 2023 Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.

speaker-diarization Speaker Diarization +3

Challenges and Insights: Exploring 3D Spatial Features and Complex Networks on the MISP Dataset

no code implementations5 Oct 2023 Yiwen Shao

Multi-channel multi-talker speech recognition presents formidable challenges in the realm of speech processing, marked by issues such as background noise, reverberation, and overlapping speech.

speech-recognition Speech Recognition

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

no code implementations22 Nov 2021 Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Experimental results show that 1) the proposed ALL-In-One model achieved a comparable error rate to the pipelined system while reducing the inference time by half; 2) the proposed 3D spatial feature significantly outperformed (31\% CERR) all previous works of using the 1D directional information in both paradigms.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations31 Mar 2021 Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +3

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

1 code implementation20 May 2020 Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called \emph{chain models} in the Kaldi automatic speech recognition (ASR) toolkit.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Speaker Diarization with Region Proposal Network

1 code implementation14 Feb 2020 Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal speaker-diarization +1

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation18 Sep 2019 Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Cannot find the paper you are looking for? You can Submit a new open access paper.