Search Results for author: Shaojin Ding

Found 10 papers, 4 papers with code

Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

no code implementations 8 Apr 2022 Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers.

Action Detection Activity Detection +2

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

no code implementations 29 Mar 2022 Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis

1 code implementation 9 Oct 2021 Mu Yang, Shaojin Ding, Tianlong Chen, Tong Wang, Zhangyang Wang

This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system, where each language was seen as an individual task and was learned sequentially and continually.

Speech Synthesis Text-To-Speech Synthesis

Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable

no code implementations ICLR 2022 Shaojin Ding, Tianlong Chen, Zhangyang Wang

In this paper, we investigate the tantalizing possibility of using the lottery ticket hypothesis to discover lightweight speech recognition models that are (1) robust to various noise existing in speech; (2) transferable to fit the open-world personalization; and (3) compatible with structured sparsity.

speech-recognition Speech Recognition

Textual Echo Cancellation

no code implementations 13 Aug 2020 Shaojin Ding, Ye Jia, Ke Hu, Quan Wang

In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings.

Acoustic echo cancellation speech-recognition +1

AutoSpeech: Neural Architecture Search for Speaker Recognition

3 code implementations 7 May 2020 Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang

Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.

Image Classification Neural Architecture Search +3

Personal VAD: Speaker-Conditioned Voice Activity Detection

2 code implementations 12 Aug 2019 Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.

Action Detection Activity Detection +4
