Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence, which significantly reduces the inference speed at the cost of accuracy drop compared to autoregressive baselines.
no code implementations • 9 Oct 2021 • Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-Yi Lee, Shinji Watanabe
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
Non-autoregressive (NAR) modeling has gained more and more attention in speech processing.
We propose a system to concatenate audio segmentation and non-autoregressive ASR to realize high accuracy and low RTF ASR.
Then, we show the effect of emotion on speaker recognition.