no code implementations • 8 Feb 2022 • Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou
By aligning audio representations to pretrained language representations and utilizing contrastive information between acoustic inputs, CALM is able to bootstrap audio embeddings competitive with existing audio representation models in only a few hours of training time.
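The contrastive alignment described above can be illustrated with a symmetric InfoNCE-style objective over paired audio and text embeddings. This is a minimal sketch assuming CLIP-style batch contrastive training; the function name, matrix shapes, and temperature value are illustrative and not taken from the CALM paper.

```python
import numpy as np

def info_nce_loss(audio_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss aligning paired audio and text
    embeddings. Row i of each matrix is assumed to be a matched pair;
    all other rows in the batch act as negatives (a sketch, not the
    paper's exact objective)."""
    # L2-normalise so the dot product is cosine similarity
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature          # (N, N) similarity matrix
    diag = np.arange(len(a))                # matched pairs on the diagonal

    def xent(l):
        # cross-entropy of the diagonal (positive) entries
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[diag, diag].mean()

    # average the audio->text and text->audio directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned pairs drive the loss toward zero, while unrelated pairs yield a loss near log(batch size), which is what makes the objective usable for bootstrapping audio embeddings from a frozen text encoder.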
no code implementations • 9 Oct 2021 • Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar
When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device.
no code implementations • 18 Jun 2021 • Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jeffrey Bigham
Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work.
no code implementations • 24 Feb 2021 • Colin Lea, Vikramjit Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey P. Bigham
The ability to automatically detect stuttering events in speech could help speech pathologists track an individual's fluency over time or help improve speech recognition systems for people with atypical speech patterns.
no code implementations • 20 Oct 2020 • Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik
But in the case of a false trigger, transcribing the audio using ASR itself is strongly undesirable.
no code implementations • 3 Aug 2020 • Ahmed Hussen Abdelaziz, Anushree Prasanna Kumar, Chloe Seivwright, Gabriele Fanelli, Justin Binder, Yannis Stylianou, Sachin Kajarekar
The output acoustic features are used to condition a WaveRNN to reconstruct the speech waveform, and the output facial controllers are used to generate the corresponding video of the talking face.
no code implementations • 25 Apr 2020 • Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz
One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications.
no code implementations • 31 Jan 2020 • Vasudha Kowtha, Vikramjit Mitra, Chris Bartels, Erik Marchi, Sue Booker, William Caruso, Sachin Kajarekar, Devang Naik
Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity.
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle
We train the network in a supervised multi-task learning setup, where the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss while the speaker recognition branch is trained to label the input sequence with the correct speaker label.
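The multi-task setup above combines a phonetic CTC loss with a speaker cross-entropy loss over shared encoder features. The sketch below implements the standard CTC forward (alpha) dynamic program and a weighted sum of the two losses; the function names, the weighting scheme, and the loss weight `w` are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def _logsumexp(xs):
    m = max(xs)
    if m == -np.inf:
        return -np.inf
    return m + np.log(sum(np.exp(x - m) for x in xs))

def ctc_loss(frame_logits, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC, via the
    standard forward (alpha) recursion over the label sequence
    extended with blanks. frame_logits: (T, C); labels: list of ints."""
    logp = log_softmax(frame_logits, axis=-1)
    ext = [blank]
    for l in labels:
        ext += [l, blank]                   # interleaved blanks: length 2L+1
    T, S = len(frame_logits), len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = logp[0, blank]
    if S > 1:
        alpha[0, 1] = logp[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1, s]]                     # stay
            if s > 0:
                terms.append(alpha[t - 1, s - 1])         # advance one
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])         # skip a blank
            alpha[t, s] = logp[t, ext[s]] + _logsumexp(terms)
    # valid paths end on the last label or the trailing blank
    return -_logsumexp([alpha[T - 1, S - 1], alpha[T - 1, S - 2]])

def multitask_loss(frame_logits, phone_labels, spk_logits, spk_id, w=0.5):
    """Weighted sum of the phonetic CTC loss and the speaker
    cross-entropy loss (hypothetical equal weighting)."""
    spk_logp = log_softmax(spk_logits)
    return w * ctc_loss(frame_logits, phone_labels) - (1 - w) * spk_logp[spk_id]
```

With uniform frame posteriors over 3 classes and a single-symbol label over 2 frames, the three valid alignments sum to probability 1/3, so the CTC term is log 3 — a useful sanity check when wiring up such a head.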