no code implementations • 17 Oct 2023 • Shahram Ghorbani, John H. L. Hansen
In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.
no code implementations • 17 Jul 2020 • Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang
Recent studies have successfully demonstrated the reliability of fully convolutional networks (FCNs) in many speech applications.
no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu
Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.
Ranked #4 on Audio-Visual Speech Recognition on LRS2
no code implementations • 1 Oct 2019 • Shahram Ghorbani, Soheil Khorram, John H. L. Hansen
An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then use this dataset to retrain the acoustic models.
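The pooling step described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the domain names, utterance labels, and `pool_domains` helper are hypothetical placeholders standing in for real per-domain speech corpora that would feed acoustic-model retraining.

```python
# Sketch (hypothetical data): merge per-domain utterance lists into one
# combined training corpus before retraining an acoustic model.

def pool_domains(domains):
    """Combine utterances from all domains, tagging each with its source."""
    combined = []
    for name, utterances in domains.items():
        combined.extend((name, utt) for utt in utterances)
    return combined

# Placeholder per-domain data; real corpora would hold audio/feature paths.
domains = {
    "native_english": ["utt_a", "utt_b"],
    "hindi_accented": ["utt_c"],
    "spanish_accented": ["utt_d", "utt_e"],
}

corpus = pool_domains(domains)
print(len(corpus))  # 5 pooled utterances across all domains
```

The drawback the passage implies is cost: every new domain forces retraining on the full pooled corpus rather than adapting only to the new data.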