no code implementations • 19 Apr 2024 • Darshan Prabhu, Sai Ganesh Mirishkar, Pankaj Wasnik
Self-supervised learning (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks.
no code implementations • 20 Mar 2024 • Shivam Ratnakant Mhaskar, Nirmesh J. Shah, Mohammadi Zaki, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah
In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in the source and target language sentence pairs.
Automatic Speech Recognition (ASR) +6
no code implementations • 23 Feb 2024 • Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik, Vineeth Balasubramanian
To effectively utilize the newly proposed augmentation technique, we employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss to achieve collective learning of high-level feature representations from two different views of the input images.
Ranked #1 on Facial Landmark Detection on WFLW
no code implementations • 4 Jun 2023 • Purbayan Kar, Vishal Chudasama, Naoyuki Onoe, Pankaj Wasnik
Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods.
no code implementations • 5 Jun 2022 • Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe
We introduce a new feature extractor to extract latent features from the audio and visual modalities.
Ranked #10 on Emotion Recognition in Conversation on MELD
no code implementations • 5 Dec 2019 • Raghavendra Ramachandra, Martin Stokkenes, Amir Mohammadi, Sushma Venkatesh, Kiran Raja, Pankaj Wasnik, Eric Poiret, Sébastien Marcel, Christoph Busch
One of the unique features of this dataset is that it is collected in four different geographic locations representing a diverse population and ethnicity.