Search Results for author: Sachin Kajarekar

Found 9 papers, 0 papers with code

CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations

no code implementations • 8 Feb 2022 Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou

By aligning audio representations to pretrained language representations and utilizing contrastive information between acoustic inputs, CALM is able to bootstrap audio embeddings competitive with existing audio representation models in only a few hours of training time.

Emotion Recognition • Natural Language Understanding
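The contrastive alignment the CALM abstract describes can be illustrated with a symmetric InfoNCE-style objective over paired audio and text embeddings. This is a generic sketch, not the paper's implementation: the `info_nce` function, the embedding dimensions, and the temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce(audio, text, temperature=0.1):
    """Symmetric contrastive (InfoNCE) loss aligning audio embeddings with
    paired text embeddings; matched pairs sit on the diagonal of the
    similarity matrix. `temperature` is an assumed hyperparameter."""
    # Cosine similarity between every audio/text pair in the batch.
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = (a @ t.T) / temperature          # shape (N, N)

    def xent(l):
        # Mean cross-entropy with the diagonal (matched pair) as target.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each audio embedding toward its paired text embedding and pushes it away from the other captions in the batch, which is the usual mechanism for bootstrapping one modality's representation space from another's.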

Streaming on-device detection of device directed speech from voice and touch-based invocation

no code implementations • 9 Oct 2021 Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device.

Computational Efficiency

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

no code implementations • 18 Jun 2021 Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jeffrey Bigham

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work.

Intent Recognition • speech-recognition +1

SEP-28k: A Dataset for Stuttering Event Detection From Podcasts With People Who Stutter

no code implementations • 24 Feb 2021 Colin Lea, Vikramjit Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey P. Bigham

The ability to automatically detect stuttering events in speech could help speech pathologists track an individual's fluency over time or help improve speech recognition systems for people with atypical speech patterns.

Event Detection • speech-recognition +1

Audiovisual Speech Synthesis using Tacotron2

no code implementations • 3 Aug 2020 Ahmed Hussen Abdelaziz, Anushree Prasanna Kumar, Chloe Seivwright, Gabriele Fanelli, Justin Binder, Yannis Stylianou, Sachin Kajarekar

The output acoustic features are used to condition a WaveRNN to reconstruct the speech waveform, and the output facial controllers are used to generate the corresponding video of the talking face.

Face Model • Sentence +1

Multi-task Learning for Speaker Verification and Voice Trigger Detection

no code implementations • 26 Jan 2020 Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle

We train the network in a supervised multi-task learning setup: the speech transcription branch is trained to minimise a phonetic connectionist temporal classification (CTC) loss, while the speaker recognition branch is trained to classify the input sequence with the correct speaker label.

Multi-Task Learning • Speaker Recognition +1
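The two-branch objective sketched in the abstract — a CTC loss on the transcription branch plus a classification loss on the speaker branch — amounts to a weighted sum over a shared encoder's output. Below is a minimal NumPy sketch; the forward-algorithm CTC computation is the standard one, but the toy dimensions, random projections standing in for the two branches, and the `0.5` weighting are illustrative assumptions, not values from the paper.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def ctc_loss(logits, labels, blank=0):
    """Negative log-likelihood of `labels` under the CTC alignment model.
    logits: (T, V) unnormalised per-frame scores; labels: ids without blanks."""
    log_probs = log_softmax(logits, axis=-1)
    ext = [blank]                      # interleave blanks: a,b -> _,a,_,b,_
    for l in labels:
        ext += [l, blank]
    T, S = logits.shape[0], len(ext)
    alpha = np.full((T, S), -1e30)     # forward log-probabilities
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1, s]]
            if s > 0:
                terms.append(alpha[t - 1, s - 1])
            # Skip transition only between distinct non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(terms) + log_probs[t, ext[s]]
    # Valid paths end on the last label or the trailing blank.
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])

def cross_entropy(logits, target):
    """Softmax cross-entropy for the speaker classification branch."""
    return -log_softmax(logits)[target]

# Toy multi-task forward pass (all shapes and weights are assumptions).
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))                # stand-in shared encoder output
phone_logits = frames @ rng.normal(size=(8, 12)) # 12 phone classes, 0 = blank
spk_logits = frames.mean(axis=0) @ rng.normal(size=(8, 5))  # 5 speakers
total = ctc_loss(phone_logits, [3, 1, 4]) + 0.5 * cross_entropy(spk_logits, 2)
```

In training, gradients of `total` would flow back through both branches into the shared encoder, which is what lets a single network serve both voice trigger detection and speaker verification.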
