no code implementations • 13 Dec 2023 • Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal
We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
no code implementations • 26 May 2023 • Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He
With the large-scale training data, we obtain a 2-bit Conformer model with over 40% model size reduction against the 4-bit version at the cost of 17% relative word error rate degradation
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
no code implementations • 24 May 2023 • David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He
With the rapid increase in the size of neural networks, model compression has become an important area of research.
no code implementations • 15 Mar 2023 • Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw
Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors.
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 8 Apr 2022 • Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers.
1 code implementation • 29 Mar 2022 • Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov
Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
1 code implementation • 9 Oct 2021 • Mu Yang, Shaojin Ding, Tianlong Chen, Tong Wang, Zhangyang Wang
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system, where each language was seen as an individual task and was learned sequentially and continually.
no code implementations • ICLR 2022 • Shaojin Ding, Tianlong Chen, Zhangyang Wang
In this paper, we investigate the tantalizing possibility of using lottery ticket hypothesis to discover lightweight speech recognition models, that are (1) robust to various noise existing in speech; (2) transferable to fit the open-world personalization; and 3) compatible with structured sparsity.
no code implementations • 13 Aug 2020 • Shaojin Ding, Ye Jia, Ke Hu, Quan Wang
In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings.
3 code implementations • 7 May 2020 • Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Ranked #8 on
Speaker Identification
on VoxCeleb1
(using extra training data)
2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
4 code implementations • ICCV 2019 • Tianlong Chen, Shaojin Ding, Jingyi Xie, Ye Yuan, Wuyang Chen, Yang Yang, Zhou Ren, Zhangyang Wang
Attention mechanism has been shown to be effective for person re-identification (Re-ID).
Ranked #15 on
Person Re-Identification
on Market-1501-C