no code implementations • 3 Mar 2024 • Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar
Self-supervised learning models have proven very effective for certain speech tasks, such as automatic speech recognition, speaker identification, and keyword spotting.
no code implementations • 6 Dec 2023 • Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Ke Tan, Fu Wu, Jiezhong Qiu, Aimin Pan
Large language models (LLMs) such as GPT-3, OPT, and LLaMA have demonstrated remarkable accuracy in a wide range of tasks.
no code implementations • 4 Apr 2023 • Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu
To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.
no code implementations • 11 Jan 2023 • Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong
By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance.
no code implementations • 16 Nov 2022 • Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu
During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin.
no code implementations • 8 Oct 2021 • Hassan Taherian, Ke Tan, DeLiang Wang
We further demonstrate the effectiveness of LBT for the separation of four and five concurrent speakers.
1 code implementation • 2 Sep 2020 • Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi
In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
Audio and Speech Processing • Sound
no code implementations • 16 Sep 2019 • Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu
Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.
Audio and Speech Processing • Sound • Signal Processing
no code implementations • 11 Mar 2019 • Peidong Wang, Ke Tan, DeLiang Wang
In this study, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 22 Nov 2018 • Zhong-Qiu Wang, Ke Tan, DeLiang Wang
This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain.