Natural Language Visual Grounding
16 papers with code • 0 benchmarks • 6 datasets
Latest papers
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.
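The task ALFRED poses is a mapping from (instruction, egocentric frame) pairs to actions. Below is a minimal PyTorch sketch of such a mapper; the module sizes, the tiny CNN, and the 12-way action set are illustrative assumptions, not the ALFRED baseline model.

```python
# Minimal sketch of an instruction-to-action mapper in the ALFRED spirit.
# Illustrative only: sizes and the action set are assumptions.
import torch
import torch.nn as nn

class InstructionFollower(nn.Module):
    def __init__(self, vocab_size=1000, n_actions=12, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lang_enc = nn.LSTM(hidden, hidden, batch_first=True)
        # Tiny CNN over the egocentric frame; a real agent would use a
        # pretrained backbone such as ResNet.
        self.vis_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, hidden),
        )
        self.policy = nn.Linear(2 * hidden, n_actions)

    def forward(self, tokens, frame):
        _, (h, _) = self.lang_enc(self.embed(tokens))        # h: (1, B, hidden)
        lang = h[-1]                                         # (B, hidden)
        vis = self.vis_enc(frame)                            # (B, hidden)
        return self.policy(torch.cat([lang, vis], dim=-1))   # action logits

model = InstructionFollower()
tokens = torch.randint(0, 1000, (2, 10))   # batch of tokenized instructions
frame = torch.rand(2, 3, 224, 224)         # batch of egocentric RGB frames
print(model(tokens, frame).shape)          # torch.Size([2, 12])
```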
Searching for Ambiguous Objects in Videos using Relational Referring Expressions
Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing, unique object.
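As a toy illustration of relational scoring, the sketch below ranks candidate boxes by pairing each with a unique landmark object; the hand-built relation features and the small MLP scorer are assumptions for illustration, not the paper's model.

```python
# Toy sketch: rank candidates for a phrase like "the ball left of the cone"
# by features relating each candidate box to a unique landmark box.
import torch
import torch.nn as nn

def relation_features(cands, landmark):
    """cands: (N, 4) boxes as (cx, cy, w, h); landmark: (4,)."""
    dxy = cands[:, :2] - landmark[:2]        # relative offset
    scale = cands[:, 2:] / landmark[2:]      # relative size
    return torch.cat([dxy, scale], dim=-1)   # (N, 4)

scorer = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

cands = torch.tensor([[0.2, 0.5, 0.1, 0.1],   # left of the landmark
                      [0.8, 0.5, 0.1, 0.1]])  # right of the landmark
landmark = torch.tensor([0.5, 0.5, 0.2, 0.2])

scores = scorer(relation_features(cands, landmark)).squeeze(-1)
print(scores.softmax(dim=0))  # untrained, but shows the candidate ranking
```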
Modularized Textual Grounding for Counterfactual Resilience
Computer vision applications often require a textual grounding module that offers precision, interpretability, and resilience to counterfactual inputs and queries.
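One way to picture a modularized grounder is as separate scorers for the entity and attribute parts of a query, combined per region, so a counterfactual query can be rejected when no region scores well. The decomposition below is a hypothetical sketch, not the paper's architecture.

```python
# Toy sketch of modularized grounding: entity and attribute modules each
# score regions, and their sum is thresholded to reject counterfactual
# queries. Module design and threshold are illustrative assumptions.
import torch
import torch.nn as nn

class ModularGrounder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.entity = nn.Bilinear(dim, dim, 1)     # entity word vs region
        self.attribute = nn.Bilinear(dim, dim, 1)  # attribute word vs region

    def forward(self, ent_emb, attr_emb, regions):
        # regions: (N, dim); ent_emb, attr_emb: (dim,)
        n = regions.size(0)
        e = self.entity(ent_emb.expand(n, -1), regions).squeeze(-1)
        a = self.attribute(attr_emb.expand(n, -1), regions).squeeze(-1)
        return e + a  # per-region grounding score

g = ModularGrounder()
scores = g(torch.rand(64), torch.rand(64), torch.rand(5, 64))
# Untrained model; the 0.0 threshold is purely illustrative.
print("grounded" if scores.max() > 0.0 else "no matching region")
```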
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.
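The self-monitoring idea can be summarized as an auxiliary head that regresses navigation progress alongside the action policy. The sketch below shows one such head with a joint loss; dimensions, the action set, and the unweighted loss sum are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a self-monitoring agent head: alongside action logits,
# an auxiliary regressor estimates how much of the path is completed.
import torch
import torch.nn as nn

class SelfMonitoringHead(nn.Module):
    def __init__(self, state_dim=256, n_actions=6):
        super().__init__()
        self.policy = nn.Linear(state_dim, n_actions)
        self.progress = nn.Sequential(nn.Linear(state_dim, 1), nn.Sigmoid())

    def forward(self, state):
        return self.policy(state), self.progress(state).squeeze(-1)

head = SelfMonitoringHead()
state = torch.rand(4, 256)           # fused language/vision agent state
logits, prog = head(state)

# Joint objective: action cross-entropy plus progress regression.
actions = torch.randint(0, 6, (4,))                # ground-truth actions
target_prog = torch.tensor([0.1, 0.4, 0.7, 1.0])   # fraction of path done
loss = nn.functional.cross_entropy(logits, actions) \
     + nn.functional.mse_loss(prog, target_prog)
print(loss.item())
```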
Robust Change Captioning
We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.
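Conceptually, dual attention attends over the "before" and "after" images separately, guided by their difference, before captioning the change. The following reduction is a hedged sketch of that idea, not the DUDA model; the caption decoder is omitted.

```python
# Toy sketch of dual attention for change captioning: difference-guided
# attention over "before" and "after" spatial features.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.att_before = nn.Linear(dim, 1)
        self.att_after = nn.Linear(dim, 1)

    def attend(self, feats, scorer):
        # feats: (B, L, dim) spatial features; softmax over locations L
        w = torch.softmax(scorer(feats), dim=1)   # (B, L, 1)
        return (w * feats).sum(dim=1)             # (B, dim)

    def forward(self, before, after):
        diff = after - before                     # change-guided signal
        v_b = self.attend(before + diff, self.att_before)
        v_a = self.attend(after + diff, self.att_after)
        return torch.cat([v_b, v_a], dim=-1)      # input to a caption decoder

da = DualAttention()
before, after = torch.rand(2, 49, 128), torch.rand(2, 49, 128)
print(da(before, after).shape)  # torch.Size([2, 256])
```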
Grounding of Textual Phrases in Images by Reconstruction
We propose a novel approach that learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly.
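A compact sketch of this idea: attend over region proposals conditioned on the phrase, then reconstruct the phrase from the attended visual feature, so the reconstruction loss supervises the latent attention. The dimensions, bilinear scorer, and one-word decoder below are simplifying assumptions for illustration.

```python
# Minimal sketch of grounding by reconstruction: phrase-conditioned
# attention over proposals, supervised by reconstructing the phrase.
import torch
import torch.nn as nn

class GroundByReconstruction(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.phrase_enc = nn.LSTM(dim, dim, batch_first=True)
        self.score = nn.Bilinear(dim, dim, 1)   # phrase vs region score
        self.decode = nn.Linear(dim, vocab)     # reconstruct a word

    def forward(self, tokens, regions):
        # tokens: (B, T) phrase; regions: (B, N, dim) proposal features
        _, (h, _) = self.phrase_enc(self.embed(tokens))
        phrase = h[-1]                                       # (B, dim)
        q = phrase.unsqueeze(1).expand_as(regions)
        attn = torch.softmax(self.score(q, regions), dim=1)  # (B, N, 1)
        attended = (attn * regions).sum(dim=1)               # (B, dim)
        return attn.squeeze(-1), self.decode(attended)

model = GroundByReconstruction()
tokens = torch.randint(0, 1000, (2, 5))
regions = torch.rand(2, 8, 128)
attn, logits = model(tokens, regions)
# Reconstruction loss on the phrase's first word (bag-of-words in full).
loss = nn.functional.cross_entropy(logits, tokens[:, 0])
print(attn.shape, loss.item())
```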