no code implementations • 15 Nov 2021 • Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma
In this work, we argue that the depth map of a scene can act as a proxy for inducing distance information about the different objects in the scene, for the task of audio binauralization.
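The idea can be illustrated with a toy sketch: a depth map assigns each sound source a distance, which can then modulate loudness (and, in a full model, interaural time and level differences) when spatializing mono audio into two channels. All values below (scene size, source position, sample rate, the inverse-distance gain, and the linear panning law) are illustrative assumptions, not the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy scene: a 4x4 depth map in metres and one sounding object.
depth_map = rng.uniform(1.0, 10.0, size=(4, 4))
src_pos = (1, 2)                                   # pixel of the sounding object

# 10 ms of a 440 Hz mono tone at an assumed 16 kHz sample rate.
mono = np.sin(2 * np.pi * 440 * np.arange(0, 0.01, 1 / 16000))

dist = depth_map[src_pos]
gain = 1.0 / dist                                  # inverse-distance attenuation
azimuth = src_pos[1] / 3 - 0.5                     # -0.5 (far left) .. 0.5 (far right)

# Simple linear panning driven by the depth-derived gain and azimuth.
left = mono * gain * (0.5 - azimuth)
right = mono * gain * (0.5 + azimuth)
binaural = np.stack([left, right])                 # shape: (2, num_samples)
```

A learned binauralization network would replace the hand-set gain and panning with predicted filters, but the depth map plays the same role: supplying per-object distance.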
no code implementations • 10 Aug 2021 • Kranti Kumar Parida, Siddharth Srivastava, Neeraj Matiyali, Gaurav Sharma
Binaural audio gives the listener the feeling of being at the recording location and enhances the immersive experience when coupled with AR/VR.
no code implementations • 25 Mar 2021 • Kranti Kumar Parida, Gaurav Sharma
Cross-modal retrieval is generally performed by projecting and aligning the data from two different modalities onto a shared representation space.
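A minimal sketch of this projection-and-alignment scheme, with assumed feature sizes (128-d audio, 256-d visual, 64-d shared space) and random untrained projection matrices: both modalities are mapped into the shared space, L2-normalized, and retrieval ranks gallery items by cosine similarity to the query.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, w):
    """Linear projection followed by L2 normalization onto the shared space."""
    z = x @ w
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Assumed dims; in practice these projections are learned with an alignment loss.
w_audio = rng.standard_normal((128, 64))
w_visual = rng.standard_normal((256, 64))

audio_feats = rng.standard_normal((5, 128))    # gallery of 5 audio clips
visual_query = rng.standard_normal((1, 256))   # one visual query

za = project(audio_feats, w_audio)
zv = project(visual_query, w_visual)

# Cross-modal retrieval: rank audio gallery items against the visual query.
scores = (zv @ za.T).ravel()                   # cosine similarities
ranking = np.argsort(-scores)                  # best match first
```

Training would replace the random matrices with projections optimized so that paired audio-visual samples land close together in the shared space.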
1 code implementation • CVPR 2021 • Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma
We propose a novel multi-modal fusion technique that incorporates material properties explicitly while combining the audio (echoes) and visual modalities to predict scene depth.
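One way to read "incorporating material properties explicitly" is as material-conditioned gating: a material descriptor decides how much to trust the echo-based versus the vision-based features. The sketch below is an assumed stand-in for the paper's fusion module, with toy feature sizes and random weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)

# Assumed per-scene features: echo (audio), visual, and a material descriptor.
echo_feat = rng.standard_normal(32)
visual_feat = rng.standard_normal(32)
material_feat = rng.standard_normal(16)

# Material-conditioned gate over the two modalities (weights assumed random
# here; a real model learns them end-to-end for depth prediction).
w_gate = rng.standard_normal((16, 2))
gate = softmax(material_feat @ w_gate)         # weights over [echo, visual]

fused = gate[0] * echo_feat + gate[1] * visual_feat
```

The gate lets, e.g., highly reflective materials up-weight the echo branch, while visually distinctive surfaces up-weight the visual branch.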
no code implementations • 27 May 2020 • Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri
We use the semantic relatedness of text embeddings as a means for zero-shot learning by aligning audio and video embeddings with the corresponding class label text feature space.
Ranked #6 on GZSL Video Classification on ActivityNet-GZSL(main)
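The zero-shot mechanism described above can be sketched as nearest-neighbour classification in a text-embedding space: audio and video embeddings for a clip are fused and compared against the text embeddings of (unseen) class labels. Dimensions, the averaging fusion, and the random vectors below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Assumed: text embeddings for 4 unseen class labels (e.g. from a word-embedding
# model) and aligned audio/video embeddings for one clip, all in a 50-d space.
class_text = l2norm(rng.standard_normal((4, 50)))
audio_emb = l2norm(rng.standard_normal(50))
video_emb = l2norm(rng.standard_normal(50))

# Fuse the modalities (simple average here; the actual fusion is learned) and
# predict the class whose label text embedding is closest by cosine similarity.
clip_emb = l2norm((audio_emb + video_emb) / 2)
pred = int(np.argmax(class_text @ clip_emb))
```

Because classification reduces to similarity against label text embeddings, classes never seen during training can be recognized by adding their label vectors.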
no code implementations • 19 Oct 2019 • Kranti Kumar Parida, Neeraj Matiyali, Tanaya Guha, Gaurav Sharma
We present an audio-visual multi-modal approach to zero-shot learning (ZSL) for classification and retrieval of videos.
Ranked #5 on GZSL Video Classification on VGGSound-GZSL(main)