Search Results for author: Shuo-Yiin Chang

Found 20 papers, 4 papers with code

Towards General-Purpose Text-Instruction-Guided Voice Conversion

no code implementations · 25 Sep 2023 · Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, Hung-Yi Lee

This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice".

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

no code implementations · 14 Aug 2023 · Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements in word error rate.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

no code implementations · 28 May 2023 · W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath

We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text.

Language Modelling · Semantic Segmentation
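
As a rough illustration of the distillation idea in the paper above, the sketch below computes a token-level KL distillation loss between a teacher's and a student's punctuation-tag distributions. The tag inventory, tensor shapes, and temperature are illustrative assumptions, not the paper's actual setup.

# Minimal sketch of token-level knowledge distillation for punctuation tags.
# The teacher/student interfaces and tag set here are illustrative assumptions.
import torch
import torch.nn.functional as F

PUNCT_TAGS = ["<none>", ",", ".", "?"]  # hypothetical tag inventory

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over per-token punctuation distributions.

    student_logits, teacher_logits: [batch, seq_len, num_tags]
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence, averaged over the batch; the temperature-squared factor
    # keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    return kl * (t * t)

if __name__ == "__main__":
    student = torch.randn(2, 8, len(PUNCT_TAGS))
    teacher = torch.randn(2, 8, len(PUNCT_TAGS))
    print(distillation_loss(student, teacher))

In practice this soft-label term would be combined with the usual hard-label loss computed on punctuated written text.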

UML: A Universal Monolingual Output Layer for Multilingual ASR

no code implementations · 22 Feb 2023 · Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang

Consequently, the UML makes it possible to switch the interpretation of each output node depending on the language of the input speech.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1
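
The sketch below illustrates the shared-output-layer idea described above: a single projection whose node indices are re-interpreted through a per-language token map at decode time. The vocabularies, layer sizes, and helper names are hypothetical, not the paper's configuration.

# Minimal sketch of a shared ("universal monolingual") output layer whose node
# indices are re-interpreted per language at decode time.
import torch
import torch.nn as nn

NUM_SHARED_NODES = 4  # tiny for illustration; real systems use thousands

# Hypothetical per-language interpretation of the same output nodes.
NODE_TO_TOKEN = {
    "en": ["a", "b", "c", "<blank>"],
    "zh": ["你", "好", "吗", "<blank>"],
}

class SharedOutputLayer(nn.Module):
    def __init__(self, encoder_dim):
        super().__init__()
        # One projection shared by all languages.
        self.proj = nn.Linear(encoder_dim, NUM_SHARED_NODES)

    def forward(self, encoder_out):
        return self.proj(encoder_out)  # [batch, frames, NUM_SHARED_NODES]

def decode_frame(logits, language):
    """Pick the top node and interpret it with the language-specific map."""
    node = int(torch.argmax(logits))
    return NODE_TO_TOKEN[language][node]

if __name__ == "__main__":
    layer = SharedOutputLayer(encoder_dim=16)
    logits = layer(torch.randn(1, 1, 16))[0, 0]
    print(decode_frame(logits, "en"), decode_frame(logits, "zh"))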

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems

no code implementations · 1 Nov 2022 · Shaan Bijwadia, Shuo-Yiin Chang, Bo Li, Tara Sainath, Chao Zhang, Yanzhang He

In this work, we propose a method to jointly train the ASR and EP tasks in a single end-to-end (E2E) multitask model, improving EP quality by optionally leveraging information from the ASR audio encoder.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1
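
A minimal sketch of the joint ASR/endpointer idea above: a shared audio encoder feeds both an ASR head and a frame-level endpointer (EP) head, so the EP decision can draw on the ASR encoder's representation. Layer types, sizes, and the EP label set are placeholders, not the paper's architecture.

# Minimal sketch of a multitask model with a shared audio encoder feeding both
# an ASR head and a frame-level endpointer head. All dimensions are illustrative.
import torch
import torch.nn as nn

class JointAsrEndpointer(nn.Module):
    def __init__(self, feat_dim=80, enc_dim=256, vocab_size=128, ep_classes=2):
        super().__init__()
        # Shared streaming audio encoder (stand-in for LSTM/Conformer stacks).
        self.encoder = nn.LSTM(feat_dim, enc_dim, batch_first=True)
        # ASR head (stand-in for an RNN-T joint network / decoder).
        self.asr_head = nn.Linear(enc_dim, vocab_size)
        # EP head: per-frame end-of-query decision from shared encoder features.
        self.ep_head = nn.Linear(enc_dim, ep_classes)

    def forward(self, features):
        enc, _ = self.encoder(features)          # [batch, frames, enc_dim]
        return self.asr_head(enc), self.ep_head(enc)

if __name__ == "__main__":
    model = JointAsrEndpointer()
    feats = torch.randn(1, 50, 80)               # 50 frames of 80-dim features
    asr_logits, ep_logits = model(feats)
    # Training would combine an ASR loss with a per-frame EP cross-entropy loss.
    print(asr_logits.shape, ep_logits.shape)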

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

no code implementations · 13 Sep 2022 · Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani

Language identification is critical for many downstream tasks in automatic speech recognition (ASR), and is beneficial to integrate into multilingual end-to-end ASR as an additional task.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2
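
One common way to fold language identification into an end-to-end ASR model, sketched below, is to prefix each target sequence with a language token so the model predicts the language jointly with the transcript. The token inventory and helper functions are illustrative assumptions; the paper may combine LID and ASR differently.

# Minimal sketch: make language ID part of the ASR output by prefixing targets
# with a language token. Token names and helpers are hypothetical.
LANG_TOKENS = {"en": "<en>", "fr": "<fr>", "zh": "<zh>"}

def add_language_token(transcript_tokens, language):
    """Prepend a language token so LID becomes part of the ASR prediction."""
    return [LANG_TOKENS[language]] + transcript_tokens

def strip_language_token(hypothesis_tokens):
    """Split a decoded hypothesis into (predicted language, transcript)."""
    lang_token, rest = hypothesis_tokens[0], hypothesis_tokens[1:]
    predicted = next(k for k, v in LANG_TOKENS.items() if v == lang_token)
    return predicted, rest

if __name__ == "__main__":
    target = add_language_token(["bon", "jour"], "fr")
    print(target)                               # ['<fr>', 'bon', 'jour']
    print(strip_language_token(target))         # ('fr', ['bon', 'jour'])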

Streaming Intended Query Detection using E2E Modeling for Continued Conversation

no code implementations · 29 Aug 2022 · Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

In voice-enabled applications, a predetermined hotword is usually used to activate a device in order to attend to the query. However, speaking queries followed by a hotword each time introduces a cognitive burden in continued conversations.

Turn-Taking Prediction for Natural Conversational Speech

no code implementations · 29 Aug 2022 · Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He

This makes speech recognition on conversational speech, including utterances that contain multiple queries, a challenging task.

speech-recognition · Speech Recognition

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

no code implementations · 22 Apr 2022 · W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu

Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition.

speech-recognition · Speech Recognition

Improving the fusion of acoustic and text representations in RNN-T

no code implementations · 25 Jan 2022 · Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang

The recurrent neural network transducer (RNN-T) has recently become the mainstream end-to-end approach for streaming automatic speech recognition (ASR).

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

A Better and Faster End-to-End Model for Streaming ASR

no code implementations · 21 Nov 2020 · Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing · Sound
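
The sketch below shows a simplified Conformer-style block of the kind that can replace LSTM layers in an encoder: two half-step feed-forward modules around self-attention and a depthwise-convolution module. Dimensions and kernel size are illustrative, and details such as relative positional encoding are omitted; this is not the paper's exact configuration.

# Simplified sketch of a Conformer-style encoder block. All hyperparameters
# are illustrative placeholders.
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """FFN half-step, self-attention, depthwise-conv module, second FFN
    half-step, final LayerNorm."""

    def __init__(self, dim=256, heads=4, conv_kernel=15, ff_mult=4):
        super().__init__()
        self.ff1 = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, ff_mult * dim), nn.SiLU(),
            nn.Linear(ff_mult * dim, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Convolution module: pointwise -> GLU -> depthwise -> BN -> pointwise.
        self.conv_norm = nn.LayerNorm(dim)
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        self.depthwise = nn.Conv1d(dim, dim, conv_kernel,
                                   padding=conv_kernel // 2, groups=dim)
        self.bn = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()
        self.pointwise2 = nn.Conv1d(dim, dim, kernel_size=1)
        self.ff2 = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, ff_mult * dim), nn.SiLU(),
            nn.Linear(ff_mult * dim, dim))
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: [batch, frames, dim]
        x = x + 0.5 * self.ff1(x)               # first half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.conv_norm(x).transpose(1, 2)   # [batch, dim, frames]
        c = self.glu(self.pointwise1(c))
        c = self.pointwise2(self.act(self.bn(self.depthwise(c))))
        x = x + c.transpose(1, 2)
        x = x + 0.5 * self.ff2(x)               # second half-step feed-forward
        return self.final_norm(x)

if __name__ == "__main__":
    block = ConformerBlock()
    print(block(torch.randn(2, 100, 256)).shape)  # torch.Size([2, 100, 256])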

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation · 21 Oct 2020 · Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2

Towards Fast and Accurate Streaming End-to-End ASR

no code implementations · 24 Apr 2020 · Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu

RNN-T EP+LAS, together with MWER training, brings an 18.7% relative WER reduction and a 160 ms reduction in 90th-percentile latency compared to the originally proposed RNN-T EP model.

Audio and Speech Processing

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

no code implementations · 28 Mar 2020 · Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.

On Neural Phone Recognition of Mixed-Source ECoG Signals

no code implementations · 12 Dec 2019 · Ahmed Hussen Abdelaziz, Shuo-Yiin Chang, Nelson Morgan, Erik Edwards, Dorothea Kolossa, Dan Ellis, David A. Moses, Edward F. Chang

The emerging field of neural speech recognition (NSR) using electrocorticography has recently attracted remarkable research interest for studying how human brains recognize speech in quiet and noisy surroundings.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

Personal VAD: Speaker-Conditioned Voice Activity Detection

2 code implementations · 12 Aug 2019 · Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.

Action Detection · Activity Detection · +4
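
A minimal sketch of a speaker-conditioned, frame-level VAD in the spirit of the paper above: each acoustic frame is concatenated with an embedding of the target speaker and classified into non-speech, target-speaker speech, or non-target-speaker speech. Layer sizes and the source of the speaker embedding are assumptions, not the paper's exact architecture.

# Minimal sketch of a speaker-conditioned, frame-level VAD. All dimensions and
# the speaker-embedding source are illustrative.
import torch
import torch.nn as nn

NUM_CLASSES = 3  # 0: non-speech, 1: target-speaker speech, 2: other speech

class PersonalVad(nn.Module):
    def __init__(self, feat_dim=40, spk_dim=256, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim + spk_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, frames, speaker_embedding):
        # frames: [batch, T, feat_dim]; speaker_embedding: [batch, spk_dim]
        T = frames.size(1)
        spk = speaker_embedding.unsqueeze(1).expand(-1, T, -1)
        x = torch.cat([frames, spk], dim=-1)     # condition every frame
        h, _ = self.rnn(x)
        return self.classifier(h)                # per-frame class logits

if __name__ == "__main__":
    model = PersonalVad()
    logits = model(torch.randn(1, 20, 40), torch.randn(1, 256))
    print(logits.shape)                          # torch.Size([1, 20, 3])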

Deep Learning for Audio Signal Processing

1 code implementation · 30 Apr 2019 · Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-Yiin Chang, Tara Sainath

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing.

Audio Signal Processing · Automatic Speech Recognition · +5
