no code implementations • 15 Mar 2023 • Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw
Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors.
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 8 Apr 2022 • Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers.
no code implementations • 29 Mar 2022 • Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov
Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
1 code implementation • 9 Oct 2021 • Mu Yang, Shaojin Ding, Tianlong Chen, Tong Wang, Zhangyang Wang
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system, where each language is treated as an individual task and learned sequentially and continually.
no code implementations • ICLR 2022 • Shaojin Ding, Tianlong Chen, Zhangyang Wang
In this paper, we investigate the tantalizing possibility of using the lottery ticket hypothesis to discover lightweight speech recognition models that are (1) robust to various noise existing in speech; (2) transferable to fit the open-world personalization; and (3) compatible with structured sparsity.
no code implementations • 13 Aug 2020 • Shaojin Ding, Ye Jia, Ke Hu, Quan Wang
In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings.
3 code implementations • 7 May 2020 • Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Ranked #6 on Speaker Identification on VoxCeleb1
2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
5 code implementations • ICCV 2019 • Tianlong Chen, Shaojin Ding, Jingyi Xie, Ye Yuan, Wuyang Chen, Yang Yang, Zhou Ren, Zhangyang Wang
Attention mechanisms have been shown to be effective for person re-identification (Re-ID).
Ranked #17 on Person Re-Identification on Market-1501-C