Search Results for author: Carol Espy-Wilson

Found 22 papers, 2 papers with code

A multi-modal approach for identifying schizophrenia using cross-modal attention

no code implementations • 26 Sep 2023 • Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Carol Espy-Wilson

This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms.

Paper
Add Code

Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

no code implementations • 17 Sep 2023 • Ahmed Adel Attia, Yashish M. Siriwardena, Carol Espy-Wilson

The performance of deep learning models depends significantly on their capacity to encode input features efficiently and decode them into meaningful outputs.

Self-Supervised Learning

Paper
Add Code

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

no code implementations • 12 Sep 2023 • Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Speaker-independent Speech Inversion for Estimation of Nasalance

1 code implementation • 31 May 2023 • Yashish M. Siriwardena, Carol Espy-Wilson, Suzanne Boyce, Mark K. Tiede, Liran Oren

Nasalance is an objective measure derived from the oral and nasal acoustic signals that correlate with nasality.

Paper
Code

Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders

no code implementations • 25 May 2023 • Nina R Benway, Yashish M Siriwardena, Jonathan L Preston, Elaine Hitchcock, Tara McAllister, Carol Espy-Wilson

Acoustic-to-articulatory speech inversion could enhance automated clinical mispronunciation detection to provide detailed articulatory feedback unattainable by formant-based mispronunciation detection algorithms; however, it is unclear the extent to which a speech inversion system trained on adult speech performs in the context of (1) child and (2) clinical speech.

Paper
Add Code

Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

no code implementations • 29 Oct 2022 • Yashish M. Siriwardena, Carol Espy-Wilson, Shihab Shamma

Most organisms including humans function by coordinating and integrating sensory signals with motor actions to survive and accomplish desired tasks.

Paper
Add Code

The Secret Source : Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

no code implementations • 29 Oct 2022 • Yashish M. Siriwardena, Carol Espy-Wilson

The proposed SI system with the HPRC dataset gains an improvement of close to 28% when the source features are used as additional targets.

Paper
Add Code

Masked Autoencoders Are Articulatory Learners

1 code implementation • 27 Oct 2022 • Ahmed Adel Attia, Carol Espy-Wilson

Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory based speech synthesizers and speech inversion systems.

Paper
Code

An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models

no code implementations • 20 Jun 2022 • Rahil Parikh, Gaspar Rochette, Carol Espy-Wilson, Shihab Shamma

Knowing that harmonicity is a critical cue for these networks to group sources, in this work, we perform a thorough investigation on ConvTasnet and DPT-Net to analyze how they perform a harmonic analysis of the input mixture.

Transfer Learning

Paper
Add Code

Acoustic-to-articulatory Speech Inversion with Multi-task Learning

no code implementations • 27 May 2022 • Yashish M. Siriwardena, Ganesh Sivaraman, Carol Espy-Wilson

Multi-task learning (MTL) frameworks have proven to be effective in diverse speech related tasks like automatic speech recognition (ASR) and speech emotion recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion using Bidirectional Gated RNNs

no code implementations • 25 May 2022 • Yashish M. Siriwardena, Ahmed Adel Attia, Ganesh Sivaraman, Carol Espy-Wilson

In this work, we compare and contrast different ways of doing data augmentation and show how this technique improves the performance of articulatory speech inversion not only on noisy speech, but also on clean speech data.

Data Augmentation

Paper
Add Code

Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals

no code implementations • 11 Mar 2022 • Rahil Parikh, Nadee Seneviratne, Ganesh Sivaraman, Shihab Shamma, Carol Espy-Wilson

We used U. of Wisconsin X-ray Microbeam (XRMB) database of clean speech signals to train a feed-forward deep neural network (DNN) to estimate articulatory trajectories of six tract variables.

Paper
Add Code

Harmonicity Plays a Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems

no code implementations • 8 Mar 2022 • Rahil Parikh, Ilya Kavalerov, Carol Espy-Wilson, Shihab Shamma

We evaluate their performance with mixtures of natural speech versus slightly manipulated inharmonic speech, where harmonics are slightly frequency jittered.

Ranked #1 on Adversarial Attack on WSJ0-2mix

Adversarial Attack Speech Separation

Paper
Add Code

Multimodal Depression Classification Using Articulatory Coordination Features And Hierarchical Attention Based Text Embeddings

no code implementations • 13 Feb 2022 • Nadee Seneviratne, Carol Espy-Wilson

The multimodal system is developed by combining embeddings from the session-level audio model and the HAN text model

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks

no code implementations • 9 Oct 2021 • Yashish M. Siriwardena, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson

This study investigates the speech articulatory coordination in schizophrenia subjects exhibiting strong positive symptoms (e. g. hallucinations and delusions), using two distinct channel-delay correlation methods.

Paper
Add Code

Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model

no code implementations • 9 Apr 2021 • Nadee Seneviratne, Carol Espy-Wilson

The ACFs derived from the vocal tract variables (TVs) are used to train a dilated Convolutional Neural Network based depression classification model to obtain segment-level predictions.

Binary Classification Classification +1

Paper
Add Code

Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables

no code implementations • 13 Nov 2020 • Nadee Seneviratne, Carol Espy-Wilson

We show that ACFs derived from Vocal Tract Variables (TVs) show promise as a robust set of features for depression detection.

Cross-corpus Depression Detection

Paper
Add Code

Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop

no code implementations • 11 Nov 2020 • Matthew Marge, Carol Espy-Wilson, Nigel Ward

Fourth, more powerful adaptation methods are needed, to enable robots to communicate in new environments, for new tasks, and with diverse user populations, without extensive re-engineering or the collection of massive training data.

Paper
Add Code

Modeling Feature Representations for Affective Speech using Generative Adversarial Networks

no code implementations • 31 Oct 2019 • Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson

In this work, we experiment with variants of GAN architectures to generate feature vectors corresponding to an emotion in two ways: (i) A generator is trained with samples from a mixture prior.

Cross-corpus Emotion Recognition +1

Paper
Add Code

On Enhancing Speech Emotion Recognition using Generative Adversarial Networks

no code implementations • 18 Jun 2018 • Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson

GANs consist of a discriminator and a generator working in tandem playing a min-max game to learn a target underlying data distribution; when fed with data-points sampled from a simpler distribution (like uniform or Gaussian distribution).

Cross-corpus Speech Emotion Recognition

Paper
Add Code

Semi-supervised and Transfer learning approaches for low resource sentiment classification

no code implementations • 7 Jun 2018 • Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, Shrikanth Narayanan

Sentiment classification involves quantifying the affective reaction of a human to a document, media item or an event.

Classification General Classification +3

Paper
Add Code

Adversarial Auto-encoders for Speech Based Emotion Recognition

no code implementations • 6 Jun 2018 • Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael Abd-Almageed, Carol Espy-Wilson

Recently, generative adversarial networks and adversarial autoencoders have gained a lot of attention in machine learning community due to their exceptional performance in tasks such as digit classification and face recognition.

Emotion Recognition Face Recognition

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.