no code implementations • 15 Nov 2021 • Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma
In this work, we argue that the depth map of a scene can act as a proxy for inducing distance information about the different objects in the scene, for the task of audio binauralization.
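The idea can be illustrated with a toy sketch: a depth map assigns each sound source a distance, which can then modulate loudness (and, in a full model, interaural time and level differences) when spatializing mono audio into two channels. All values below (scene size, source position, sample rate, the inverse-distance gain, and the linear panning law) are illustrative assumptions, not the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy scene: a 4x4 depth map in metres and one sounding object.
depth_map = rng.uniform(1.0, 10.0, size=(4, 4))
src_pos = (1, 2)                                   # pixel of the sounding object

# 10 ms of a 440 Hz mono tone at an assumed 16 kHz sample rate.
mono = np.sin(2 * np.pi * 440 * np.arange(0, 0.01, 1 / 16000))

dist = depth_map[src_pos]
gain = 1.0 / dist                                  # inverse-distance attenuation
azimuth = src_pos[1] / 3 - 0.5                     # -0.5 (far left) .. 0.5 (far right)

# Simple linear panning driven by the depth-derived gain and azimuth.
left = mono * gain * (0.5 - azimuth)
right = mono * gain * (0.5 + azimuth)
binaural = np.stack([left, right])                 # shape: (2, num_samples)
```

A learned binauralization network would replace the hand-set gain and panning with predicted filters, but the depth map plays the same role: supplying per-object distance.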
no code implementations • 10 Aug 2021 • Kranti Kumar Parida, Siddharth Srivastava, Neeraj Matiyali, Gaurav Sharma
Binaural audio gives the listener the feeling of being at the recording location and enhances the immersive experience when coupled with AR/VR.
no code implementations • 25 Mar 2021 • Kranti Kumar Parida, Gaurav Sharma
Cross-modal retrieval is generally performed by projecting and aligning the data from two different modalities onto a shared representation space.
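A minimal sketch of this projection-and-alignment scheme, with assumed feature sizes (128-d audio, 256-d visual, 64-d shared space) and random untrained projection matrices: both modalities are mapped into the shared space, L2-normalized, and retrieval ranks gallery items by cosine similarity to the query.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, w):
    """Linear projection followed by L2 normalization onto the shared space."""
    z = x @ w
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Assumed dims; in practice these projections are learned with an alignment loss.
w_audio = rng.standard_normal((128, 64))
w_visual = rng.standard_normal((256, 64))

audio_feats = rng.standard_normal((5, 128))    # gallery of 5 audio clips
visual_query = rng.standard_normal((1, 256))   # one visual query

za = project(audio_feats, w_audio)
zv = project(visual_query, w_visual)

# Cross-modal retrieval: rank audio gallery items against the visual query.
scores = (zv @ za.T).ravel()                   # cosine similarities
ranking = np.argsort(-scores)                  # best match first
```

Training would replace the random matrices with projections optimized so that paired audio-visual samples land close together in the shared space.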
1 code implementation • CVPR 2021 • Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma
We propose a novel multi-modal fusion technique that incorporates material properties explicitly while combining the audio (echoes) and visual modalities to predict scene depth.
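One way to read "incorporating material properties explicitly" is as material-conditioned gating: a material descriptor decides how much to trust the echo-based versus the vision-based features. The sketch below is an assumed stand-in for the paper's fusion module, with toy feature sizes and random weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)

# Assumed per-scene features: echo (audio), visual, and a material descriptor.
echo_feat = rng.standard_normal(32)
visual_feat = rng.standard_normal(32)
material_feat = rng.standard_normal(16)

# Material-conditioned gate over the two modalities (weights assumed random
# here; a real model learns them end-to-end for depth prediction).
w_gate = rng.standard_normal((16, 2))
gate = softmax(material_feat @ w_gate)         # weights over [echo, visual]

fused = gate[0] * echo_feat + gate[1] * visual_feat
```

The gate lets, e.g., highly reflective materials up-weight the echo branch, while visually distinctive surfaces up-weight the visual branch.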
no code implementations • 27 May 2020 • Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri
We use the semantic relatedness of text embeddings as a means for zero-shot learning by aligning audio and video embeddings with the corresponding class label text feature space.
Ranked #6 on GZSL Video Classification on ActivityNet-GZSL(main)
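The zero-shot mechanism described above can be sketched as nearest-neighbour classification in a text-embedding space: audio and video embeddings for a clip are fused and compared against the text embeddings of (unseen) class labels. Dimensions, the averaging fusion, and the random vectors below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Assumed: text embeddings for 4 unseen class labels (e.g. from a word-embedding
# model) and aligned audio/video embeddings for one clip, all in a 50-d space.
class_text = l2norm(rng.standard_normal((4, 50)))
audio_emb = l2norm(rng.standard_normal(50))
video_emb = l2norm(rng.standard_normal(50))

# Fuse the modalities (simple average here; the actual fusion is learned) and
# predict the class whose label text embedding is closest by cosine similarity.
clip_emb = l2norm((audio_emb + video_emb) / 2)
pred = int(np.argmax(class_text @ clip_emb))
```

Because classification reduces to similarity against label text embeddings, classes never seen during training can be recognized by adding their label vectors.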
no code implementations • 19 Oct 2019 • Kranti Kumar Parida, Neeraj Matiyali, Tanaya Guha, Gaurav Sharma
We present an audio-visual multi-modal approach to zero-shot learning (ZSL) for classification and retrieval of videos.
Ranked #5 on GZSL Video Classification on VGGSound-GZSL(main)