no code implementations • 10 Jun 2025 • Gonçalo Dias Pais, Valter Piedade, Moitreya Chatterjee, Marcus Greiff, Pedro Miraldo
In this paper, we leverage the implicit surface representation of the foreground scene and model a probability density function in a 3D image projection space to achieve a more targeted sampling of the rays toward regions of interest, resulting in improved rendering.
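Below is a minimal sketch of the general idea of density-guided ray sampling. It draws pixel locations, and hence camera rays, in proportion to a non-negative importance map, so that regions of interest receive more rays. The `weights` map and the function name `sample_pixels_from_pdf` are hypothetical. The paper models its density in a 3D image projection space derived from the implicit foreground surface, which this 2D stand-in does not reproduce.

```python
import numpy as np

def sample_pixels_from_pdf(weights, n_samples, rng=None):
    """Draw (row, col) pixel coordinates with probability proportional to `weights`.

    `weights` is a hypothetical non-negative (H, W) importance map, e.g. a soft
    foreground mask; it stands in for the density the method actually models.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = weights.shape
    pdf = np.asarray(weights, dtype=np.float64).ravel()
    total = pdf.sum()
    if total <= 0:
        raise ValueError("weights must contain at least one positive entry")
    pdf = pdf / total  # normalize into a discrete probability distribution
    # Sample flat pixel indices according to the PDF, then map back to 2D.
    flat_idx = rng.choice(h * w, size=n_samples, p=pdf)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1)
```

Zero-weight pixels are never chosen. Each sampled pixel would then be turned into a ray through the camera model before rendering.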
1 code implementation • 20 May 2025 • Hao Tang, Kevin Ellis, Suhas Lohit, Michael J. Jones, Moitreya Chatterjee
The task of estimating the world model describing the dynamics of a real-world process assumes immense importance for anticipating and preparing for future outcomes.
1 code implementation • CVPR 2025 • Yung-Hsuan Lai, Janek Ebbers, Yu-Chiang Frank Wang, François Germain, Michael Jeffrey Jones, Moitreya Chatterjee
Audio-Visual Video Parsing (AVVP) entails the challenging task of localizing both uni-modal events (i.e., those occurring exclusively in either the visual or acoustic modality of a video) and multi-modal events (i.e., those occurring in both modalities concurrently).
1 code implementation • CVPR 2024 • Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang, Pedro Miraldo, Suhas Lohit, Moitreya Chatterjee
Extensions of Neural Radiance Fields (NeRFs) to model dynamic scenes have enabled their near photo-realistic, free-viewpoint rendering.
Ranked #1 on Novel View Synthesis
no code implementations • 28 Sep 2023 • Manish Sharma, Moitreya Chatterjee, Kuan-Chuan Peng, Suhas Lohit, Michael Jones
We first pretrain these factor matrices on the RGB modality, for which plenty of training data is assumed to exist, and then add only a few trainable parameters for training on the IR modality to avoid over-fitting, while encouraging them to capture cues complementary to those learned from the RGB modality alone.
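The parameter budget behind "pretrain on RGB, then add only a few trainable parameters for IR" can be illustrated with a low-rank factorized layer. This is a minimal sketch that assumes a matrix factorization W = U Vᵀ. The pretrained RGB factors stay frozen, and a small number of extra rank-one terms are the only IR-specific parameters. This LoRA-style additive form and the names `init_ir_factors` and `effective_weight` are illustrative assumptions, not the paper's exact factorization.

```python
import numpy as np

def init_ir_factors(d_out, d_in, extra_rank, rng):
    # Hypothetical IR-specific factors. V_ir starts at zero so the layer
    # initially reproduces the frozen RGB weight exactly.
    U_ir = 0.01 * rng.standard_normal((d_out, extra_rank))
    V_ir = np.zeros((d_in, extra_rank))
    return U_ir, V_ir

def effective_weight(U_rgb, V_rgb, U_ir, V_ir):
    # Frozen (pretrained) RGB factors plus a small trainable IR correction.
    return U_rgb @ V_rgb.T + U_ir @ V_ir.T

def count_params(*arrays):
    return sum(a.size for a in arrays)

rng = np.random.default_rng(0)
d_out, d_in, rgb_rank, extra_rank = 256, 512, 32, 2
U_rgb = rng.standard_normal((d_out, rgb_rank))  # stands in for RGB-pretrained factors
V_rgb = rng.standard_normal((d_in, rgb_rank))
U_ir, V_ir = init_ir_factors(d_out, d_in, extra_rank, rng)

W = effective_weight(U_rgb, V_rgb, U_ir, V_ir)
frozen = count_params(U_rgb, V_rgb)   # parameters kept fixed on IR data
trainable = count_params(U_ir, V_ir)  # parameters updated on IR data
```

Here only 1,536 of roughly 26k factor parameters would be trained on IR data. That small trainable share is the property the abstract relies on to limit over-fitting when IR data is scarce.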
no code implementations • 6 Jun 2023 • Xiulong Liu, Sudipta Paul, Moitreya Chatterjee, Anoop Cherian
Audio-visual navigation of an agent towards locating an audio goal is a challenging task, especially when the audio is sporadic or the environment is noisy.
1 code implementation • 29 Oct 2022 • Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian
In this paper, we propose to use this connection between audio and visual dynamics for solving two challenging tasks simultaneously, namely: (i) separating audio sources from a mixture using visual cues, and (ii) predicting the 3D visual motion of a sounding source using its separated audio.
Audio Source Separation
Visually Guided Sound Source Separation
no code implementations • ICCV 2021 • Moitreya Chatterjee, Narendra Ahuja, Anoop Cherian
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.
1 code implementation • ICCV 2021 • Moitreya Chatterjee, Jonathan Le Roux, Narendra Ahuja, Anoop Cherian
At its core, AVSGS uses a recursive neural network that emits mutually orthogonal sub-graph embeddings of the visual graph using multi-head attention.
Audio Source Separation
Visually Guided Sound Source Separation
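To make "mutually orthogonal sub-graph embeddings" concrete, the sketch below takes a sequence of raw embeddings, one per recursive step, and projects each onto the orthogonal complement of those already emitted (Gram-Schmidt). This swaps in a plain linear-algebra technique purely for illustration. AVSGS may instead encourage orthogonality through its training objective, and the function name `orthogonalize_sequence` is hypothetical.

```python
import numpy as np

def orthogonalize_sequence(raw_embeddings, eps=1e-8):
    """Return unit-norm embeddings that are mutually orthogonal (Gram-Schmidt).

    Each raw embedding is stripped of its components along previously emitted
    embeddings; embeddings with nothing new left (norm < eps) are skipped.
    """
    emitted = []
    for v in raw_embeddings:
        v = np.asarray(v, dtype=np.float64).copy()
        for u in emitted:
            v -= (v @ u) * u  # remove the component along an earlier unit vector
        norm = np.linalg.norm(v)
        if norm < eps:
            continue
        emitted.append(v / norm)
    if not emitted:
        raise ValueError("no linearly independent embeddings were provided")
    return np.stack(emitted)
```

Orthogonality means each emitted embedding carries information not already captured by the earlier ones. This is the intuition for why distinct sub-graphs can be associated with distinct audio sources.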
no code implementations • 1 Jan 2021 • Moitreya Chatterjee, Anoop Cherian, Narendra Ahuja
Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena.
no code implementations • ECCV 2020 • Anoop Cherian, Moitreya Chatterjee, Narendra Ahuja
To tackle this problem, we present Sound2Sight, a deep variational framework that is trained to learn a per-frame stochastic prior conditioned on a joint embedding of audio and past frames.
no code implementations • 8 Jul 2020 • Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian
Given an input video, its associated audio, and a brief caption, the audio-visual scene-aware dialog (AVSD) task requires an agent to engage in a question-answer dialog with a human about the audio-visual content.
no code implementations • ECCV 2018 • Moitreya Chatterjee, Alexander G. Schwing
Paragraph generation from images, which has recently gained popularity, is an important task for video summarization, editing, and support for people with disabilities.
no code implementations • ECCV 2018 • Abhimanyu Dubey, Moitreya Chatterjee, Narendra Ahuja
We propose a novel Convolutional Neural Network (CNN) compression algorithm based on coreset representations of filters.
1 code implementation • NeurIPS 2016 • Arulkumar Subramaniam, Moitreya Chatterjee, Anurag Mittal
A novel inexact matching technique then matches pixels in the first representation with those of the second.
no code implementations • 27 Apr 2015 • Moitreya Chatterjee, Anton Leuski
Conventional multimedia annotation/retrieval systems such as the Normalized Continuous Relevance Model (NormCRM) [16] require fully labeled training data to perform well.