
Natural Language Visual Grounding

16 papers with code • 0 benchmarks • 6 datasets



Most implemented papers


ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

askforalfred/alfred • CVPR 2020

We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.


Grounding of Textual Phrases in Images by Reconstruction

akirafukui/vqa-mcb • 12 Nov 2015

We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly.


Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

chihyaoma/selfmonitoring-agent • ICLR 2019

The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments.


Composing Pick-and-Place Tasks By Grounding Language

mees/AIS-Alexa-Robot • 16 Feb 2021

Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction.


Robust Change Captioning

Seth-Park/RobustChangeCaptioning • ICCV 2019

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.


Modularized Textual Grounding for Counterfactual Resilience

jacobswan1/MTG-pytorch • CVPR 2019

Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.


Searching for Ambiguous Objects in Videos using Relational Referring Expressions

hazananayurt/viref • 3 Aug 2019

Especially in ambiguous settings, humans prefer expressions (called relational referring expressions) that describe an object with respect to a distinguishing, unique object.


Learning Cross-modal Context Graph for Visual Grounding

youngfly11/LCMCG-PyTorch • AAAI 2020

To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.


A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions

Alab-NII/onecommon • Findings of the Association for Computational Linguistics 2020

Recent models achieve promising results in visually grounded dialogues.


ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

alfworld/alfworld • 8 Oct 2020

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

