Natural Language Visual Grounding

16 papers with code • 0 benchmarks • 6 datasets

Natural language visual grounding is the task of localizing the object or image region that a natural language phrase or referring expression describes.

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

askforalfred/alfred CVPR 2020

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.

331 stars • 03 Dec 2019
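
ALFRED's sequence-to-sequence framing maps an instruction plus egocentric observations to a sequence of discrete actions. The sketch below is a minimal, hypothetical PyTorch illustration of that mapping; the module layout, feature sizes, and action vocabulary are assumptions for exposition, not the askforalfred/alfred code.

```python
import torch
import torch.nn as nn

class InstructionToActions(nn.Module):
    """Toy seq2seq: instruction tokens + per-step visual features -> action logits."""
    def __init__(self, vocab_size=1000, n_actions=13, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.lang_enc = nn.LSTM(d, d, batch_first=True)
        self.vis_proj = nn.Linear(512, d)      # assumes precomputed 512-d frame features
        self.decoder = nn.LSTMCell(2 * d, d)
        self.action_head = nn.Linear(d, n_actions)

    def forward(self, instr_tokens, frame_feats):
        # instr_tokens: (B, L) token ids; frame_feats: (B, T, 512) egocentric features
        _, (h, _) = self.lang_enc(self.embed(instr_tokens))
        lang = h[-1]                            # (B, d) instruction summary
        hx = torch.zeros_like(lang)
        cx = torch.zeros_like(lang)
        logits = []
        for t in range(frame_feats.size(1)):    # predict one action per observed frame
            step_in = torch.cat([lang, self.vis_proj(frame_feats[:, t])], dim=-1)
            hx, cx = self.decoder(step_in, (hx, cx))
            logits.append(self.action_head(hx))
        return torch.stack(logits, dim=1)       # (B, T, n_actions)

model = InstructionToActions()
out = model(torch.randint(0, 1000, (2, 12)), torch.randn(2, 5, 512))
print(out.shape)  # torch.Size([2, 5, 13])
```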

Searching for Ambiguous Objects in Videos using Relational Referring Expressions

hazananayurt/viref 3 Aug 2019

Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing, unique object.

10 stars • 03 Aug 2019

Modularized Textual Grounding for Counterfactual Resilience

jacobswan1/MTG-pytorch CVPR 2019

Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.

12 stars • 07 Apr 2019

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

chihyaoma/selfmonitoring-agent ICLR 2019

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.

117 stars • 10 Jan 2019
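
The self-monitoring idea pairs the usual action predictor with an auxiliary head that estimates how much of the instruction has been completed, trained jointly. Below is a rough, hypothetical sketch of such a two-head step in PyTorch; the state size, loss weighting, and progress targets are assumptions, not the chihyaoma/selfmonitoring-agent implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfMonitoringStep(nn.Module):
    """One decision step with an auxiliary progress-estimation head (sketch only)."""
    def __init__(self, d_state=256, n_actions=6):
        super().__init__()
        self.action_head = nn.Linear(d_state, n_actions)  # which navigation action to take
        self.progress_head = nn.Linear(d_state, 1)        # estimated fraction of instruction completed

    def forward(self, state):
        return self.action_head(state), torch.sigmoid(self.progress_head(state))

step = SelfMonitoringStep()
state = torch.randn(4, 256)                               # agent state at the current timestep
action_logits, progress = step(state)

# Joint objective: action cross-entropy plus a progress-regression term.
action_target = torch.randint(0, 6, (4,))
progress_target = torch.tensor([[0.2], [0.5], [0.7], [1.0]])  # e.g. normalized distance toward the goal
loss = F.cross_entropy(action_logits, action_target) + 0.5 * F.mse_loss(progress, progress_target)
print(loss.item())
```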

Robust Change Captioning

Seth-Park/RobustChangeCaptioning ICCV 2019

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.

41 stars • 08 Jan 2019
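
Dual attention lets the captioner look at the "before" and "after" images separately and reason about what changed between them. The snippet below is a minimal, hypothetical sketch of such a dual-attention step; the shared-query scoring and the difference feature are illustrative assumptions, not the Seth-Park/RobustChangeCaptioning code.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Attend over 'before' and 'after' region features, then build a change-aware feature."""
    def __init__(self, d=256):
        super().__init__()
        self.query = nn.Parameter(torch.randn(d))
        self.score = nn.Linear(d, 1)

    def attend(self, feats):
        # feats: (B, N, d) region features -> (B, d) attended summary
        w = torch.softmax(self.score(feats + self.query).squeeze(-1), dim=1)
        return (w.unsqueeze(-1) * feats).sum(dim=1)

    def forward(self, before, after):
        b, a = self.attend(before), self.attend(after)
        return torch.cat([b, a, a - b], dim=-1)   # fed to a caption decoder in a full model

dual = DualAttention()
before = torch.randn(2, 49, 256)                  # e.g. 7x7 grid features of the first image
after = torch.randn(2, 49, 256)
print(dual(before, after).shape)                  # torch.Size([2, 768])
```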

Grounding of Textual Phrases in Images by Reconstruction

akirafukui/vqa-mcb 12 Nov 2015

We propose a novel approach that learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly.

218 stars • 12 Nov 2015
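
The reconstruction objective makes the phrase itself the supervision: attend over candidate regions using the phrase as a query, then require the attended visual feature to reconstruct the phrase, so the attention weights (the grounding) can be learned even without box annotations. A minimal, hypothetical PyTorch sketch of that loop follows; the encoder choices, bag-of-words reconstruction head, and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundingByReconstruction(nn.Module):
    """Attend over region proposals, then reconstruct the phrase from the attended feature."""
    def __init__(self, vocab_size=1000, d=256, d_region=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.phrase_enc = nn.GRU(d, d, batch_first=True)
        self.region_proj = nn.Linear(d_region, d)
        self.recon = nn.Linear(d, vocab_size)      # predict phrase words from the attended region

    def forward(self, phrase_tokens, region_feats):
        # phrase_tokens: (B, L) token ids; region_feats: (B, R, d_region) proposal features
        _, h = self.phrase_enc(self.embed(phrase_tokens))
        q = h[-1]                                   # (B, d) phrase query
        regions = self.region_proj(region_feats)    # (B, R, d)
        attn = torch.softmax((regions @ q.unsqueeze(-1)).squeeze(-1), dim=1)  # (B, R) grounding weights
        attended = (attn.unsqueeze(-1) * regions).sum(dim=1)                  # (B, d)
        return attn, self.recon(attended)           # attention + phrase reconstruction logits

model = GroundingByReconstruction()
tokens = torch.randint(0, 1000, (2, 5))
regions = torch.randn(2, 20, 512)                   # 20 region proposals per image
attn, logits = model(tokens, regions)

# Reconstruction loss: each phrase token should be predictable from the attended region.
loss = F.cross_entropy(logits.unsqueeze(1).expand(-1, tokens.size(1), -1).reshape(-1, 1000),
                       tokens.reshape(-1))
print(attn.shape, loss.item())                       # torch.Size([2, 20]) ...
```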