no code implementations • 14 Feb 2024 • Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan
UniEnc-CASSNAT consists only of an encoder as its major module, which can be the SFM.
1 code implementation • 2 Jun 2023 • Jinhan Wang, Vijay Ravi, Abeer Alwan
We find that a greater adversarial weight for the initial layers leads to performance improvement.
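The layer-dependent adversarial weighting described above can be sketched under the assumption of a simple linear schedule; the function name and the linear decay are illustrative, not the authors' exact scheme:

```python
def layerwise_adversarial_weights(num_layers, w_first, w_last):
    """Assign a larger adversarial weight to earlier (initial) layers,
    decaying linearly toward later ones. Hypothetical illustration of
    the finding above, not the authors' exact configuration."""
    if num_layers == 1:
        return [w_first]
    step = (w_first - w_last) / (num_layers - 1)
    return [w_first - i * step for i in range(num_layers)]
```

Each returned weight would scale the adversarial loss term applied at the corresponding layer.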
1 code implementation • 28 Apr 2023 • Ruchao Fan, Yunzheng Zhu, Jinhan Wang, Abeer Alwan
With the proposed methods (E-APC and DRAFT), the relative WER improvements are even larger (30% and 19% on the OGI and MyST data, respectively) compared to models trained without pretraining.
Automatic Speech Recognition (ASR) +4
no code implementations • 15 Apr 2023 • Ruchao Fan, Wei Chu, Peng Chang, Abeer Alwan
During inference, an error-based alignment sampling method is investigated in depth to reduce the alignment mismatch between the training and testing processes.
Automatic Speech Recognition (ASR) +4
no code implementations • 28 Jun 2022 • Amber Afshan, Abeer Alwan
On the SITW evaluation tasks, which involve different conversational speech conditions, the proposed loss combined with self-attention conditioning yields significant relative improvements of 2-5% in EER and 6-12% in minDCF over the baseline.
no code implementations • 28 Jun 2022 • Amber Afshan, Abeer Alwan
However, self-attentive embeddings perform weighted pooling such that the weights correspond to the importance of the frames in a speaker classification task.
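The weighted pooling described above can be illustrated with a minimal sketch of self-attentive pooling; the single-head tanh scorer and the parameter shapes are assumptions for illustration, not the authors' exact architecture:

```python
import numpy as np

def self_attentive_pooling(frames, w, b, u):
    """Pool frame-level features into one utterance embedding, with
    per-frame weights learned by a small attention scorer.

    frames: (T, D) frame-level embeddings
    w, b, u: attention parameters of shapes (D, D), (D,), (D,)
    """
    # Score each frame, then normalize the scores with a softmax.
    h = np.tanh(frames @ w + b)           # (T, D)
    scores = h @ u                        # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The utterance embedding is the weighted mean of the frames.
    return weights @ frames               # (D,)
```

With zero-initialized parameters the softmax weights are uniform and the result reduces to the plain frame mean, which makes the weighting easy to sanity-check.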
no code implementations • 27 Jun 2022 • Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan
To learn instance-spread-out embeddings, we explore methods for sampling instances for a training batch (distinct speaker-based and random sampling).
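The two sampling strategies mentioned above might be sketched as follows; the function and argument names are illustrative assumptions, not the authors' implementation:

```python
import random

def sample_batch(utts_by_speaker, batch_size, distinct_speakers=True, seed=0):
    """Build a training batch of utterance IDs under one of two strategies:
    distinct speaker-based sampling (at most one utterance per speaker per
    batch) or fully random sampling over the pooled utterance list."""
    rng = random.Random(seed)
    if distinct_speakers:
        # One utterance each from batch_size different speakers.
        speakers = rng.sample(sorted(utts_by_speaker), batch_size)
        return [rng.choice(utts_by_speaker[s]) for s in speakers]
    # Random sampling: speakers may repeat within the batch.
    pool = [u for utts in utts_by_speaker.values() for u in utts]
    return rng.sample(pool, batch_size)
```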
no code implementations • 20 Jun 2022 • Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
With adversarial training, depression classification improves for every feature when compared to the baseline.
no code implementations • 16 Jun 2022 • Ruchao Fan, Abeer Alwan
However, models trained through SSL are biased toward the pretraining data, which usually differs from the data used in finetuning tasks; this causes a domain-shift problem and thus limits knowledge transfer.
Automatic Speech Recognition (ASR) +3
no code implementations • 3 Apr 2022 • Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan
In this paper, we explore automatic prediction of dialect density of the African American English (AAE) dialect, where dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
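The stated definition of dialect density maps directly onto a small helper; `has_dialect_feature` stands in for the real linguistic annotation and is a hypothetical predicate, as are the example words below:

```python
def dialect_density(tokens, has_dialect_feature):
    """Dialect density per the definition above: the fraction of words
    in an utterance that carry features of the non-standard dialect."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if has_dialect_feature(t)) / len(tokens)
```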
no code implementations • 24 Feb 2022 • Yunzheng Zhu, Ruchao Fan, Abeer Alwan
When data are scarce, the model might overfit to the training data, and hence good starting points for training are essential.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Feb 2022 • Alexander Johnson, Alejandra Martin, Marlen Quintero, Alison Bailey, Abeer Alwan
This paper presents the results of a pilot study that introduces social robots into kindergarten and first-grade classroom tasks.
no code implementations • 19 Feb 2022 • Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan
This paper proposes a novel linear prediction coding-based data augmentation method for children's low- and zero-resource dialect ASR.
no code implementations • 11 Feb 2022 • Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
The improvements for the CONVERGE (Mandarin) dataset when using the x-vector embeddings with CNN as the backend and MFCCs as input features were 9.32% (validation) and 12.99% (test).
no code implementations • 18 Jun 2021 • Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan
~5 hours of transcribed data and ~60 hours of untranscribed data are provided to develop a German ASR system for children.
no code implementations • 18 Jun 2021 • Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao, Abeer Alwan
For the analyses, we plot attention weight distributions in the decoders to visualize the relationships between token-level acoustic embeddings.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Feb 2021 • Gary Yeung, Ruchao Fan, Abeer Alwan
Because of the lack of publicly available young child speech data, feature extraction strategies such as feature normalization and data augmentation must be considered to successfully train child ASR systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Feb 2021 • Ruchao Fan, Amber Afshan, Abeer Alwan
We present a bidirectional unsupervised model pre-training (UPT) method and apply it to children's automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Oct 2020 • Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf
Disfluencies are prevalent in spontaneous speech, as shown in many studies of adult speech.
no code implementations • 8 Aug 2020 • Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan
For instance, when enrolled with conversation utterances, the EER increased to 3.03%, 2.96%, and 22.12% when tested on read, narrative, and pet-directed speech, respectively.
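The EER figures quoted above can in principle be reproduced with a minimal threshold sweep; this is a sketch of the standard metric, not the evaluation toolkit actually used:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Equal Error Rate: sweep a threshold over all observed scores and
    return the operating point where the false-acceptance and
    false-rejection rates are (approximately) equal."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    far = np.array([np.mean(nontarget >= t) for t in thresholds])  # false accepts
    frr = np.array([np.mean(target < t) for t in thresholds])      # false rejects
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0
```

Perfectly separated score distributions give an EER of zero; fully overlapping ones approach 50%.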
no code implementations • 8 Aug 2020 • Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan
A fusion of the x-vector/PLDA baseline and the SID/PLDA scores prior to PID fusion further improved performance by 15%, indicating the complementarity of the proposed approach to the x-vector system.
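Score-level fusion of the kind mentioned above is commonly a weighted average of the two systems' scores; this sketch assumes simple linear interpolation with an illustrative weight, not the fusion scheme actually used in the paper:

```python
def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted-average fusion of two systems' trial scores (e.g. an
    x-vector/PLDA baseline and a second system). `alpha` is an assumed
    interpolation weight, typically tuned on held-out data."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]
```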
no code implementations • 8 Aug 2020 • Amber Afshan, Jody Kreiman, Abeer Alwan
Native listeners performed better than machines in the style-matched conditions (EERs of 6.96% versus 14.35% for read speech, and 15.12% versus 19.87% for conversations), but in style-mismatched conditions there was no significant difference between native listeners and machines.
no code implementations • 29 Dec 2019 • Thomas Drugman, Paavo Alku, Abeer Alwan, Bayya Yegnanarayana
The great majority of current voice technology applications relies on acoustic features characterizing the vocal tract response, such as the widely used MFCC or LPC parameters.
no code implementations • 28 Dec 2019 • Thomas Drugman, Abeer Alwan
This paper focuses on the problem of pitch tracking in noisy conditions.
no code implementations • 16 Oct 2018 • Jinxi Guo, Ning Xu, Kailun Qian, Yang Shi, Kaiyuan Xu, Ying-Nian Wu, Abeer Alwan
Experimental results using the NIST SRE 2010 dataset show that both methods provide significant improvement, yielding a maximum relative improvement in Equal Error Rate of 28.43% over a baseline system when using a deep encoder with residual blocks and adding an additional phoneme vector.