Search Results for author: Arka Sadhu

Found 8 papers, 5 papers with code

Video Question Answering with Phrases via Semantic Roles

no code implementations NAACL 2021 Arka Sadhu, Kan Chen, Ram Nevatia

Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases.

Question Answering, Video Question Answering

Visual Semantic Role Labeling for Video Understanding

1 code implementation CVPR 2021 Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.

Semantic Role Labeling, Video Recognition

Utilizing Every Image Object for Semi-supervised Phrase Grounding

no code implementations 5 Nov 2020 Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia

The annotated language queries available during training are limited, which also limits the variations of language combinations that a model can see during training.

Phrase Grounding, Referring Expression

Gradient-based Editing of Memory Examples for Online Task-free Continual Learning

1 code implementation NeurIPS 2021 Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren

We explore task-free continual learning (CL), in which a model is trained to avoid catastrophic forgetting in the absence of explicit task boundaries or identities.

Continual Learning

Visually Grounded Continual Learning of Compositional Phrases

2 code implementations EMNLP 2020 Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren

To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes.

Continual Learning, Grounded language learning

Video Object Grounding using Semantic Roles in Language Description

1 code implementation CVPR 2020 Arka Sadhu, Kan Chen, Ram Nevatia

We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions.

Object Position

Cannot find the paper you are looking for? You can Submit a new open access paper.