no code implementations • 14 Feb 2024 • Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan
UniEnc-CASSNAT consists only of an encoder as its major module, which can be the SFM.
1 code implementation • 2 Jun 2023 • Jinhan Wang, Vijay Ravi, Abeer Alwan
We find that a greater adversarial weight for the initial layers leads to performance improvement.
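The layer-dependent adversarial weighting described above can be sketched under the assumption of a simple linear schedule; the function name and the linear decay are illustrative, not the authors' exact scheme:

```python
def layerwise_adversarial_weights(num_layers, w_first, w_last):
    """Assign a larger adversarial weight to earlier (initial) layers,
    decaying linearly toward later ones. Hypothetical illustration of
    the finding above, not the authors' exact configuration."""
    if num_layers == 1:
        return [w_first]
    step = (w_first - w_last) / (num_layers - 1)
    return [w_first - i * step for i in range(num_layers)]
```

Each returned weight would scale the adversarial loss term applied at the corresponding layer.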
1 code implementation • 28 Apr 2023 • Ruchao Fan, Yunzheng Zhu, Jinhan Wang, Abeer Alwan
With the proposed methods (E-APC and DRAFT), the relative WER improvements are even larger (30% and 19% on the OGI and MyST data, respectively) compared to models trained without pretraining.
Automatic Speech Recognition (ASR) +4
no code implementations • 15 Apr 2023 • Ruchao Fan, Wei Chu, Peng Chang, Abeer Alwan
During inference, an error-based alignment sampling method is investigated in depth to reduce the alignment mismatch between the training and testing processes.
Automatic Speech Recognition (ASR) +4
no code implementations • 28 Jun 2022 • Amber Afshan, Abeer Alwan
On the SITW evaluation tasks, which involve different conversational speech conditions, the proposed loss combined with self-attention conditioning yields significant relative improvements of 2-5% in EER and 6-12% in minDCF over the baseline.
no code implementations • 28 Jun 2022 • Amber Afshan, Abeer Alwan
However, self-attentive embeddings perform weighted pooling such that the weights correspond to the importance of the frames in a speaker classification task.
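The weighted pooling described above can be illustrated with a minimal sketch of self-attentive pooling; the single-head tanh scorer and the parameter shapes are assumptions for illustration, not the authors' exact architecture:

```python
import numpy as np

def self_attentive_pooling(frames, w, b, u):
    """Pool frame-level features into one utterance embedding, with
    per-frame weights learned by a small attention scorer.

    frames: (T, D) frame-level embeddings
    w, b, u: attention parameters of shapes (D, D), (D,), (D,)
    """
    # Score each frame, then normalize the scores with a softmax.
    h = np.tanh(frames @ w + b)           # (T, D)
    scores = h @ u                        # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The utterance embedding is the weighted mean of the frames.
    return weights @ frames               # (D,)
```

With zero-initialized parameters the softmax weights are uniform and the result reduces to the plain frame mean, which makes the weighting easy to sanity-check.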
no code implementations • 27 Jun 2022 • Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan
To learn instance-spread-out embeddings, we explore methods for sampling instances for a training batch (distinct speaker-based and random sampling).
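The two sampling strategies mentioned above might be sketched as follows; the function and argument names are illustrative assumptions, not the authors' implementation:

```python
import random

def sample_batch(utts_by_speaker, batch_size, distinct_speakers=True, seed=0):
    """Build a training batch of utterance IDs under one of two strategies:
    distinct speaker-based sampling (at most one utterance per speaker per
    batch) or fully random sampling over the pooled utterance list."""
    rng = random.Random(seed)
    if distinct_speakers:
        # One utterance each from batch_size different speakers.
        speakers = rng.sample(sorted(utts_by_speaker), batch_size)
        return [rng.choice(utts_by_speaker[s]) for s in speakers]
    # Random sampling: speakers may repeat within the batch.
    pool = [u for utts in utts_by_speaker.values() for u in utts]
    return rng.sample(pool, batch_size)
```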
no code implementations • 20 Jun 2022 • Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
With adversarial training, depression classification improves for every feature when compared to the baseline.
no code implementations • 16 Jun 2022 • Ruchao Fan, Abeer Alwan
However, models trained through SSL are biased toward the pretraining data, which usually differs from the data used in finetuning tasks; this causes a domain-shift problem and thus limits knowledge transfer.
Automatic Speech Recognition (ASR) +3
no code implementations • 3 Apr 2022 • Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari Ostendorf, Abeer Alwan
In this paper, we explore automatic prediction of dialect density of the African American English (AAE) dialect, where dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
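The stated definition of dialect density maps directly onto a small helper; `has_dialect_feature` stands in for the real linguistic annotation and is a hypothetical predicate, as are the example words below:

```python
def dialect_density(tokens, has_dialect_feature):
    """Dialect density per the definition above: the fraction of words
    in an utterance that carry features of the non-standard dialect."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if has_dialect_feature(t)) / len(tokens)
```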
no code implementations • 24 Feb 2022 • Yunzheng Zhu, Ruchao Fan, Abeer Alwan
When data are scarce, the model might overfit to the training data, and hence good starting points for training are essential.
Automatic Speech Recognition (ASR) +2
no code implementations • 19 Feb 2022 • Alexander Johnson, Alejandra Martin, Marlen Quintero, Alison Bailey, Abeer Alwan
This paper presents the results of a pilot study that introduces social robots into kindergarten and first-grade classroom tasks.
no code implementations • 19 Feb 2022 • Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan
This paper proposes a novel linear prediction coding-based data augmentation method for children's low- and zero-resource dialect ASR.
no code implementations • 11 Feb 2022 • Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan
The improvements for the CONVERGE (Mandarin) dataset when using the x-vector embeddings with CNN as the backend and MFCCs as input features were 9.32% (validation) and 12.99% (test).
no code implementations • 18 Jun 2021 • Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan
~5 hours of transcribed data and ~60 hours of untranscribed data are provided to develop a German ASR system for children.
no code implementations • 18 Jun 2021 • Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao, Abeer Alwan
For the analyses, we plot attention weight distributions in the decoders to visualize the relationships between token-level acoustic embeddings.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Feb 2021 • Gary Yeung, Ruchao Fan, Abeer Alwan
Because of the lack of publicly available young child speech data, feature extraction strategies such as feature normalization and data augmentation must be considered to successfully train child ASR systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 12 Feb 2021 • Ruchao Fan, Amber Afshan, Abeer Alwan
We present a bidirectional unsupervised model pre-training (UPT) method and apply it to children's automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Oct 2020 • Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf
Disfluencies are prevalent in spontaneous speech, as shown in many studies of adult speech.
no code implementations • 8 Aug 2020 • Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan
For instance, when enrolled with conversation utterances, the EER increased to 3.03%, 2.96%, and 22.12% when tested on read, narrative, and pet-directed speech, respectively.
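The EER figures quoted above can in principle be reproduced with a minimal threshold sweep; this is a sketch of the standard metric, not the evaluation toolkit actually used:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Equal Error Rate: sweep a threshold over all observed scores and
    return the operating point where the false-acceptance and
    false-rejection rates are (approximately) equal."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    far = np.array([np.mean(nontarget >= t) for t in thresholds])  # false accepts
    frr = np.array([np.mean(target < t) for t in thresholds])      # false rejects
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0
```

Perfectly separated score distributions give an EER of zero; fully overlapping ones approach 50%.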
no code implementations • 8 Aug 2020 • Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan
A fusion of the x-vector/PLDA baseline and the SID/PLDA scores prior to PID fusion further improved performance by 15%, indicating the complementarity of the proposed approach to the x-vector system.
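Score-level fusion of the kind mentioned above is commonly a weighted average of the two systems' scores; this sketch assumes simple linear interpolation with an illustrative weight, not the fusion scheme actually used in the paper:

```python
def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted-average fusion of two systems' trial scores (e.g. an
    x-vector/PLDA baseline and a second system). `alpha` is an assumed
    interpolation weight, typically tuned on held-out data."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]
```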
no code implementations • 8 Aug 2020 • Amber Afshan, Jody Kreiman, Abeer Alwan
Native listeners performed better than machines in the style-matched conditions (EERs of 6.96% versus 14.35% for read speech, and 15.12% versus 19.87% for conversations), but in style-mismatched conditions there was no significant difference between native listeners and machines.
no code implementations • 29 Dec 2019 • Thomas Drugman, Paavo Alku, Abeer Alwan, Bayya Yegnanarayana
The great majority of current voice technology applications relies on acoustic features characterizing the vocal tract response, such as the widely used MFCC or LPC parameters.
no code implementations • 28 Dec 2019 • Thomas Drugman, Abeer Alwan
This paper focuses on the problem of pitch tracking in noisy conditions.
no code implementations • 16 Oct 2018 • Jinxi Guo, Ning Xu, Kailun Qian, Yang Shi, Kaiyuan Xu, Ying-Nian Wu, Abeer Alwan
Experimental results using the NIST SRE 2010 dataset show that both methods provide significant improvement, yielding a maximum relative improvement in Equal Error Rate of 28.43% over a baseline system when using a deep encoder with residual blocks and adding an additional phoneme vector.