Search Results for author: Panayiotis Georgiou

Found 29 papers, 5 papers with code

Unsupervised Latent Behavior Manifold Learning from Acoustic Features: audio2behavior

no code implementations • 12 Jan 2017 • Haoqi Li, Brian Baucom, Panayiotis Georgiou

Behavioral annotation using signal processing and machine learning is highly dependent on training data and manual annotations of behavioral labels.

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

1 code implementation • 7 Feb 2018 • Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert those.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
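
A minimal sketch of the general noisy-channel idea behind this entry: learn phrase-level corrections from paired (noisy ASR, clean reference) transcripts and apply the most frequent correction at decode time. The toy data, 1:1 alignment assumption, and function names are illustrative only, not the paper's phrase-based system.

```python
# Toy phrase-level ASR error correction: count noisy->clean phrase substitutions observed in
# aligned training pairs, then greedily apply the most frequent correction to new hypotheses.
from collections import Counter, defaultdict

def learn_phrase_corrections(pairs, max_len=3):
    """Count how often each noisy n-gram is replaced by a clean n-gram of the same span."""
    table = defaultdict(Counter)
    for noisy, clean in pairs:
        n_tok, c_tok = noisy.split(), clean.split()
        if len(n_tok) != len(c_tok):          # keep the toy aligned 1:1 for simplicity
            continue
        for n in range(1, max_len + 1):
            for i in range(len(n_tok) - n + 1):
                src = " ".join(n_tok[i:i + n])
                tgt = " ".join(c_tok[i:i + n])
                if src != tgt:
                    table[src][tgt] += 1
    return {src: tgts.most_common(1)[0][0] for src, tgts in table.items()}

def correct(hypothesis, table, max_len=3):
    """Greedily replace the longest matching noisy phrase with its most frequent correction."""
    tokens, out, i = hypothesis.split(), [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in table:
                out.append(table[span]); i += n; break
        else:
            out.append(tokens[i]); i += 1
    return " ".join(out)

pairs = [("i red the book", "i read the book"), ("red the news", "read the news")]
table = learn_phrase_corrections(pairs)
print(correct("she red the paper", table))   # -> "she read the paper"
```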

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics

no code implementations • 22 Feb 2018 • Arindam Jati, Panayiotis Georgiou

Two sets of experiments are done in different scenarios to evaluate the strength of NPC embeddings and compare with state-of-the-art in-domain supervised methods.

Speaker Identification • Speaker Recognition +3

Towards an Unsupervised Entrainment Distance in Conversational Speech using Deep Neural Networks

no code implementations • 23 Apr 2018 • Md Nasir, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

Entrainment is a known adaptation mechanism that causes interaction participants to adapt or synchronize their acoustic characteristics.

Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations

no code implementations • 8 May 2018 • Prashanth Gurunath Shivakumar, Panayiotis Georgiou

Evaluations are presented on (i) comparisons of earlier GMM-HMM and the newer DNN Models, (ii) effectiveness of standard adaptation techniques versus transfer learning, (iii) various adaptation configurations in tackling the variabilities present in children speech, in terms of (a) acoustic spectral variability, and (b) pronunciation variability and linguistic constraints.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions

no code implementations • 23 May 2018 • Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou

Dyadic interactions among humans are marked by speakers continuously influencing and reacting to each other in terms of responses and behaviors, among others.

Unsupervised Online Multitask Learning of Behavioral Sentence Embeddings

no code implementations • 18 Jul 2018 • Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou

Unsupervised learning has been an attractive method for easily deriving meaningful data representations from vast amounts of unlabeled data.

Domain Adaptation • Sentence +1

Multi-label Multi-task Deep Learning for Behavioral Coding

no code implementations • 29 Oct 2018 • James Gibson, David C. Atkins, Torrey Creed, Zac Imel, Panayiotis Georgiou, Shrikanth Narayanan

We propose a methodology for estimating human behaviors in psychotherapy sessions using multi-label and multi-task learning paradigms.

Multi-Task Learning
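
A minimal PyTorch sketch of the multi-label, multi-task setup named in this entry: a shared encoder with one multi-label (sigmoid) head per coding task. The layer sizes, task names, and label counts are hypothetical, not the paper's architecture.

```python
# Shared encoder + per-task multi-label heads, trained with a summed BCE loss.
import torch
import torch.nn as nn

class MultiTaskBehaviorCoder(nn.Module):
    def __init__(self, input_dim=128, hidden_dim=64, tasks=None):
        super().__init__()
        tasks = tasks or {"therapist_codes": 5, "client_codes": 3}  # hypothetical label counts
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleDict({name: nn.Linear(hidden_dim, n) for name, n in tasks.items()})

    def forward(self, x):
        h = self.encoder(x)
        return {name: head(h) for name, head in self.heads.items()}  # raw logits per task

model = MultiTaskBehaviorCoder()
x = torch.randn(4, 128)                         # batch of 4 session-level feature vectors
targets = {"therapist_codes": torch.randint(0, 2, (4, 5)).float(),
           "client_codes": torch.randint(0, 2, (4, 3)).float()}
criterion = nn.BCEWithLogitsLoss()              # multi-label: independent sigmoid per code
loss = sum(criterion(out, targets[name]) for name, out in model(x).items())
loss.backward()
```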

Confusion2Vec: Towards Enriching Vector Space Word Representations with Representational Ambiguities

no code implementations • 8 Nov 2018 • Prashanth Gurunath Shivakumar, Panayiotis Georgiou

In this paper, we propose a novel word vector representation, Confusion2Vec, motivated from the human speech production and perception that encodes representational ambiguity.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3
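
A rough intuition sketch, not the paper's algorithm: Confusion2Vec builds on the fact that acoustically confusable words co-occur in ASR output. Here a standard gensim Word2Vec is trained on hypothetical ASR alternative lists so that confusable words end up with nearby vectors; the data and hyperparameters are illustrative.

```python
# Treat each list of co-occurring ASR alternatives/hypothesis words as a word2vec "sentence",
# so acoustic confusions (see/sea, red/read) share contexts and receive similar embeddings.
from gensim.models import Word2Vec

asr_contexts = [                       # hypothetical n-best style contexts
    ["see", "the", "sea"],
    ["see", "the", "see"],
    ["read", "red", "the", "book"],
    ["eight", "ate", "the", "cake"],
]
model = Word2Vec(sentences=asr_contexts, vector_size=16, window=3, min_count=1, epochs=200)
print(model.wv.most_similar("see", topn=2))   # the confusable "sea" should rank near the top
```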

Speaker Diarization With Lexical Information

no code implementations • 27 Nov 2018 • Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou

This work presents a novel approach to leverage lexical information for speaker diarization.

Clustering • speaker-diarization +1

Spoken Language Intent Detection using Confusion2Vec

1 code implementation • 7 Apr 2019 • Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou

In this paper, we address the spoken language intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3

Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance

no code implementations • 12 Apr 2019 • Md Nasir, Sandeep Nallan Chakravarthula, Brian Baucom, David C. Atkins, Panayiotis Georgiou, Shrikanth Narayanan

We find that our proposed measure is correlated with the therapist's empathy towards their patient in Motivational Interviewing and with affective behaviors in Couples Therapy.
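
A minimal sketch of the general idea named in the title: quantify lexical coordination between two speakers by the Word Mover's Distance between their turns. A tiny Word2Vec model stands in for the embeddings, gensim's `wmdistance` needs the POT package (or pyemd for older gensim versions), and the turns below are invented; this is not the paper's pipeline or data.

```python
# Lower WMD between a therapist turn and a patient turn = more lexical coordination.
from gensim.models import Word2Vec

corpus = [
    ["i", "feel", "really", "anxious", "about", "work"],
    ["work", "makes", "you", "feel", "anxious"],
    ["the", "weather", "is", "nice", "today"],
]
wv = Word2Vec(sentences=corpus, vector_size=16, min_count=1, epochs=200).wv

therapist = ["you", "feel", "anxious", "about", "work"]
patient   = ["i", "feel", "really", "anxious", "about", "work"]
unrelated = ["the", "weather", "is", "nice", "today"]

print(wv.wmdistance(therapist, patient))    # heavily overlapping turns -> small distance
print(wv.wmdistance(therapist, unrelated))  # unrelated turn -> larger distance
```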

Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions using Speech and Language

no code implementations • 2 Aug 2019 • Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis Georgiou

Cancer impacts the quality of life of those diagnosed as well as their spouse caregivers, in addition to potentially influencing their day-to-day behaviors.

Behavior Gated Language Models

no code implementations • 31 Aug 2019 • Prashanth Gurunath Shivakumar, Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan

In this work we derive motivation from psycholinguistics and propose the addition of behavioral information into the context of language modeling.

Language Modelling
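
A hedged sketch of one simple way to "gate" a language model with behavioral information: a sigmoid gate computed from a behavior vector modulates the recurrent hidden states before the softmax. The paper's exact gating formulation may differ; all dimensions and names here are illustrative.

```python
# Behavior-gated LSTM language model: behavior embedding -> per-unit sigmoid gate on hidden states.
import torch
import torch.nn as nn

class BehaviorGatedLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128, behavior_dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.gate = nn.Linear(behavior_dim, hid_dim)          # behavior -> gate logits
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens, behavior):
        h, _ = self.rnn(self.embed(tokens))                   # (batch, seq, hid)
        g = torch.sigmoid(self.gate(behavior)).unsqueeze(1)   # (batch, 1, hid)
        return self.out(h * g)                                # gated states -> next-word logits

lm = BehaviorGatedLM()
tokens = torch.randint(0, 1000, (2, 10))
behavior = torch.randn(2, 8)                                  # e.g., behavior scores for the speaker
logits = lm(tokens, behavior)                                 # (2, 10, 1000)
```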

Multimodal Embeddings from Language Models

1 code implementation • 10 Sep 2019 • Shao-Yen Tseng, Panayiotis Georgiou, Shrikanth Narayanan

Word embeddings such as ELMo have recently been shown to model word semantics with greater efficacy through contextualized learning on large-scale language corpora, resulting in significant improvement in state of the art across many natural language tasks.

Emotion Recognition • Language Modelling +1

Linking emotions to behaviors through deep transfer learning

1 code implementation • 8 Oct 2019 • Haoqi Li, Brian Baucom, Panayiotis Georgiou

Further, we investigate the importance of emotional-context in the expression of behavior by constraining (or not) the neural networks' contextual view of the data.

Emotion Recognition • Transfer Learning

RNN based Incremental Online Spoken Language Understanding

no code implementations • 23 Oct 2019 • Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis Georgiou, Shrikanth Narayanan

We introduce and analyze different recurrent neural network architectures for incremental and online processing of ASR transcripts and compare them to existing offline systems.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +8
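
A minimal sketch of incremental (token-by-token) spoken language understanding with a GRU: the hidden state is carried across tokens so an intent hypothesis is available after every new ASR word, rather than waiting for the full transcript. Architecture details, vocabulary, and token ids are illustrative, not the paper's models.

```python
# Streaming intent classification: call step() once per incoming ASR token.
import torch
import torch.nn as nn

class IncrementalIntentRNN(nn.Module):
    def __init__(self, vocab_size=500, emb_dim=32, hid_dim=64, n_intents=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.classifier = nn.Linear(hid_dim, n_intents)

    def step(self, token_id, hidden=None):
        """Consume one token and return (intent logits so far, updated hidden state)."""
        emb = self.embed(token_id).unsqueeze(0).unsqueeze(0)   # (1, 1, emb_dim)
        out, hidden = self.gru(emb, hidden)
        return self.classifier(out[:, -1]), hidden

model = IncrementalIntentRNN()
hidden = None
for token_id in [12, 45, 301, 7]:                              # streaming ASR word ids
    logits, hidden = model.step(torch.tensor(token_id), hidden)
    print(logits.argmax(dim=-1).item())                        # running intent hypothesis
```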

Speaker-invariant Affective Representation Learning via Adversarial Training

no code implementations • 4 Nov 2019 • Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou

In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect of speaker variability in the speech signals.

Emotion Classification • Representation Learning +1
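
A minimal sketch of the standard gradient-reversal trick commonly used for adversarial speaker invariance: the emotion head is trained normally, while gradients from a speaker classifier are reversed before reaching the encoder, pushing it toward features uninformative about speaker identity. Feature dimensions, class counts, and data are illustrative, and this is the generic technique rather than the paper's exact framework.

```python
# Adversarial speaker-invariant training with a gradient reversal layer (GRL).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # reverse gradients flowing into the encoder

encoder = nn.Sequential(nn.Linear(40, 64), nn.ReLU())   # 40-dim acoustic features (illustrative)
emotion_head = nn.Linear(64, 4)                         # e.g., 4 emotion classes
speaker_head = nn.Linear(64, 10)                        # e.g., 10 training speakers

x = torch.randn(8, 40)
emotion_y = torch.randint(0, 4, (8,))
speaker_y = torch.randint(0, 10, (8,))

z = encoder(x)
loss = nn.functional.cross_entropy(emotion_head(z), emotion_y) \
     + nn.functional.cross_entropy(speaker_head(GradReverse.apply(z, 1.0)), speaker_y)
loss.backward()   # encoder receives reversed speaker gradients -> speaker-invariant features
```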

An analysis of observation length requirements for machine understanding of human behaviors from spoken language

no code implementations • 21 Nov 2019 • Sandeep Nallan Chakravarthula, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

In this paper, we investigate this link and present an analysis framework that determines appropriate window lengths for the task of behavior estimation.

Speaker Diarization with Lexical Information

no code implementations • 13 Apr 2020 • Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

1 code implementation • 3 Feb 2021 • Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan

Confusion2vec, motivated from human speech production and perception, is a word vector representation which encodes ambiguities present in human spoken language in addition to semantics and syntactic information.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +5

Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

no code implementations • 22 Feb 2021 • Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

With the growing prevalence of psychological interventions, it is vital to have measures which rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services.

Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks

no code implementations • 1 Apr 2021 • Haoqi Li, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech in an unsupervised way.

Representation Learning
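
A minimal sketch of the triplet idea suggested by the title: if behavior is assumed near-stationary within an interaction, two segments from the same session form an (anchor, positive) pair and a segment from another session serves as the negative. The encoder, feature dimensions, and margin are illustrative assumptions, not the paper's network.

```python
# Triplet-based unsupervised behavior representation learning (illustrative encoder).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 32))  # speech-feature encoder
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor_feats   = torch.randn(16, 40)   # segments from session A
positive_feats = torch.randn(16, 40)   # other segments from the same session A
negative_feats = torch.randn(16, 40)   # segments from a different session B

loss = triplet_loss(encoder(anchor_feats), encoder(positive_feats), encoder(negative_feats))
loss.backward()   # pulls same-session embeddings together, pushes other sessions apart
```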

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

no code implementations • 18 Jun 2021 • Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jeffrey Bigham

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work.

Intent Recognition • speech-recognition +1

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

no code implementations • 8 Feb 2022 • Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou

By aligning audio representations to pretrained language representations and utilizing contrastive information between acoustic inputs, CALM is able to bootstrap audio embeddings competitive with existing audio representation models in only a few hours of training time.

Emotion Recognition • Natural Language Understanding
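
A minimal sketch of a CLIP-style contrastive alignment between audio and text embeddings (symmetric InfoNCE over a batch of paired utterances), which is the generic form of the contrastive alignment mentioned in this entry. Projection sizes, the temperature, and the random features are illustrative, not CALM's architecture.

```python
# Symmetric contrastive (InfoNCE) loss aligning audio embeddings with text embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

audio_proj = nn.Linear(80, 128)     # maps pooled acoustic features to the shared space
text_proj  = nn.Linear(768, 128)    # maps pretrained sentence embeddings to the shared space

audio_feats = torch.randn(32, 80)   # batch of paired (audio, transcript) utterances
text_feats  = torch.randn(32, 768)

a = F.normalize(audio_proj(audio_feats), dim=-1)
t = F.normalize(text_proj(text_feats), dim=-1)
logits = a @ t.T / 0.07                           # cosine similarities scaled by temperature
labels = torch.arange(a.size(0))                  # matching pairs sit on the diagonal
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
loss.backward()
```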

Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

no code implementations • 6 Dec 2023 • Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

We compare the proposed system to unimodal baselines and show that the multimodal approach achieves lower equal-error-rates (EERs), while using only a fraction of the training data.

Automatic Speech Recognition • Language Modelling +3
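
A minimal sketch of how the equal-error-rate (EER) reported in this abstract is computed from detection scores: the EER is the operating point where the false-positive rate equals the false-negative rate. The scores and labels below are synthetic; this only illustrates the metric, not the paper's system or data.

```python
# EER from detector scores via the ROC curve (point where FPR and FNR cross).
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])          # 1 = device-directed, 0 = background speech
scores = np.array([0.9, 0.8, 0.4, 0.35, 0.1, 0.05, 0.7, 0.6])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer_index = np.nanargmin(np.abs(fpr - fnr))          # closest crossing of FPR and FNR
eer = (fpr[eer_index] + fnr[eer_index]) / 2
print(f"EER = {eer:.3f}")
```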
