no code implementations • 25 Aug 2021 • Zhaoheng Zheng, Arka Sadhu, Ram Nevatia
We explore object detection with two attributes: color and material.
no code implementations • NAACL 2021 • Arka Sadhu, Kan Chen, Ram Nevatia
Video Question Answering (VidQA) evaluation metrics have been limited to single-word answers or to selecting a phrase from a fixed set of phrases.
1 code implementation • CVPR 2021 • Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi
We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.
no code implementations • 5 Nov 2020 • Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia
The annotated language queries available during training are limited, which restricts the variety of language combinations a model can observe.
1 code implementation • NeurIPS 2021 • Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren
We explore task-free continual learning (CL), in which a model is trained to avoid catastrophic forgetting in the absence of explicit task boundaries or identities.
2 code implementations • EMNLP 2020 • Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren
To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes.
1 code implementation • CVPR 2020 • Arka Sadhu, Kan Chen, Ram Nevatia
We explore the task of Video Object Grounding (VOG), which localizes objects in videos that are referred to by natural language descriptions.
1 code implementation • ICCV 2019 • Arka Sadhu, Kan Chen, Ram Nevatia
A phrase grounding system localizes a particular object in an image referred to by a natural language query.