Natural Language Visual Grounding

16 papers with code • 0 benchmarks • 6 datasets

Natural language visual grounding is the task of localizing the visual content that a natural language expression refers to, whether an image region, a segmentation mask, or a temporal moment in a video.

Localizing Moments in Long Video Via Multimodal Guidance

waybarrios/guidance-based-video-grounding ICCV 2023

In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.

13
26 Feb 2023
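
The pruning idea lends itself to a compact illustration. Below is a minimal sketch, assuming hypothetical describability_score and ground_in_window models (neither is the authors' API): temporal windows are scored for whether they are describable at all, the weakest are pruned, and the expensive grounding model runs only on the survivors.

```python
from typing import Callable, List, Tuple

Window = Tuple[float, float]  # (start_sec, end_sec)

def sliding_windows(duration: float, size: float, stride: float) -> List[Window]:
    """Enumerate fixed-size temporal windows over a long video."""
    t, out = 0.0, []
    while t < duration:
        out.append((t, min(t + size, duration)))
        t += stride
    return out

def prune_and_ground(
    duration: float,
    query: str,
    describability_score: Callable[[Window], float],  # assumed guidance model
    ground_in_window: Callable[[Window, str], Tuple[Window, float]],  # assumed grounder
    keep_ratio: float = 0.3,
) -> Window:
    windows = sliding_windows(duration, size=30.0, stride=15.0)
    # Keep only the most "describable" windows; prune the rest.
    ranked = sorted(windows, key=describability_score, reverse=True)
    survivors = ranked[: max(1, int(keep_ratio * len(ranked)))]
    # Ground the query only inside surviving windows; return the best moment.
    candidates = [ground_in_window(w, query) for w in survivors]
    best_moment, _ = max(candidates, key=lambda c: c[1])
    return best_moment

# Toy usage with dummy scorers that favor the middle of a 10-minute video.
moment = prune_and_ground(
    duration=600.0,
    query="the dog catches the frisbee",
    describability_score=lambda w: -abs(w[0] - 300.0),
    ground_in_window=lambda w, q: (w, 1.0 / (1.0 + abs(w[0] - 300.0))),
)
print(moment)
```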

Belief Revision based Caption Re-ranker with Visual Semantic Information

ahmedssabir/belief-revision-score COLING 2022

In this work, we focus on improving the captions generated by image-caption generation systems.

9
16 Sep 2022
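
The re-ranking step can be sketched with the Blok-style belief-revision rule this line of work draws on, P(h|e) = P(h)^α with α = (1 − sim)/(1 + sim); the repo's exact scoring and similarity models may well differ, so treat this as an illustrative assumption.

```python
from typing import List, Tuple

def belief_revision(p_hypothesis: float, similarity: float) -> float:
    """Revise P(h) toward 1 as the caption's visual-semantic similarity grows:
    P(h|e) = P(h) ** alpha, alpha = (1 - sim) / (1 + sim)."""
    alpha = (1.0 - similarity) / (1.0 + similarity)
    return p_hypothesis ** alpha

def rerank(candidates: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """candidates: (caption, LM probability, visual similarity in [0, 1])."""
    scored = [(cap, belief_revision(p, sim)) for cap, p, sim in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy example: the visually consistent caption wins despite a lower LM prior.
print(rerank([
    ("a man riding a horse", 0.20, 0.9),  # revised score ~0.92
    ("a man riding a cow",   0.30, 0.2),  # revised score ~0.45
]))
```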

TubeDETR: Spatio-Temporal Video Grounding with Transformers

antoyang/TubeDETR CVPR 2022

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.

157
30 Mar 2022
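
To make the problem statement concrete, here is a small sketch of a spatio-temporal tube plus the vIoU-style metric commonly used to score such predictions; this is an assumption about the evaluation setup, not code from the TubeDETR repo.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class Tube:
    """A spatio-temporal tube: one box per frame over a temporal span."""
    boxes: Dict[int, Box]  # frame index -> box

def box_iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def v_iou(pred: Tube, gt: Tube) -> float:
    """Spatio-temporal IoU: box IoU summed over the temporal intersection,
    normalized by the temporal union."""
    union = set(pred.boxes) | set(gt.boxes)
    inter = set(pred.boxes) & set(gt.boxes)
    if not union:
        return 0.0
    return sum(box_iou(pred.boxes[f], gt.boxes[f]) for f in inter) / len(union)

# Toy check: a tube compared with itself scores 1.0.
t = Tube(boxes={0: (0, 0, 10, 10), 1: (1, 1, 11, 11)})
print(v_iou(t, t))
```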

CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

mees/calvin 06 Dec 2021

We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that the benchmark leaves significant room for innovative agents that learn to relate human language to their world models.

265
06 Dec 2021

Panoptic Narrative Grounding

bcv-uniandes/png ICCV 2021

This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem.

60
10 Sep 2021
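
A toy sketch of the formulation: every noun phrase in the narrative caption must be grounded to a panoptic segment ("things" and "stuff" alike) rather than to a bounding box. The random embeddings and cosine matching below are hypothetical stand-ins for the paper's learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
phrase_emb = rng.normal(size=(4, 256))   # one embedding per noun phrase
segment_emb = rng.normal(size=(9, 256))  # one embedding per panoptic segment

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Cosine-similarity matching: each phrase is grounded to its best segment,
# whose mask (not shown) becomes the phrase's spatially fine grounding.
sim = l2_normalize(phrase_emb) @ l2_normalize(segment_emb).T  # (phrases, segments)
print(sim.argmax(axis=1))  # segment id assigned to each noun phrase
```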

Composing Pick-and-Place Tasks By Grounding Language

mees/AIS-Alexa-Robot 16 Feb 2021

Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction.

7
16 Feb 2021

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

alfworld/alfworld 08 Oct 2020

ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions.

252
08 Oct 2020
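
The alignment idea can be sketched as a thin translation layer from abstract text actions to grounded controller calls; a policy trained in TextWorld then drives the visual environment through that layer. Everything below is hypothetical scaffolding, not the alfworld package API.

```python
from typing import Callable, Dict

def make_translator(controller: Callable[[str, str], None]) -> Callable[[str], None]:
    """Map abstract TextWorld-style commands onto concrete grounded actions."""
    verbs: Dict[str, str] = {
        "go to": "NavigateTo",   # grounded action names are made up here
        "take": "PickupObject",
        "put": "PutObject",
        "open": "OpenObject",
    }
    def execute(text_action: str) -> None:
        for verb, grounded in verbs.items():
            if text_action.startswith(verb):
                target = text_action[len(verb):].strip()
                controller(grounded, target)  # e.g. issue to a THOR-style simulator
                return
        raise ValueError(f"unmapped text action: {text_action!r}")
    return execute

# Toy usage with a stub controller that just logs the grounded call.
execute = make_translator(lambda act, tgt: print(f"{act}({tgt})"))
execute("go to the fridge")
execute("take apple")
```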

Learning Cross-modal Context Graph for Visual Grounding

youngfly11/LCMCG-PyTorch AAAI 2020

To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.

57
13 Feb 2020
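
A minimal sketch of what graph matching buys here: each phrase gets a unary phrase-region similarity, and phrase-phrase relations add a pairwise consistency term, so the assignment is searched jointly rather than phrase by phrase. Features, edges, and the relation score are hypothetical stand-ins for the paper's learned modules.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(1)
phrase_feat = rng.normal(size=(3, 128))   # 3 phrase nodes
region_feat = rng.normal(size=(5, 128))   # 5 region proposals
phrase_edges = [(0, 1), (1, 2)]           # e.g. "man -holding- cup -on- table"

def norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

node_sim = norm(phrase_feat) @ norm(region_feat).T  # (3, 5) unary scores

def relation_bonus(assign, edges, region_feat):
    """Pairwise term rewarding mutually consistent region choices; feature
    similarity stands in for a learned cross-modal relation score."""
    r = norm(region_feat)
    return sum(float(r[assign[i]] @ r[assign[j]]) for i, j in edges)

# Exhaustive search over joint assignments (fine at toy scale): unary + pairwise.
best = max(
    product(range(5), repeat=3),
    key=lambda a: node_sim[np.arange(3), list(a)].sum()
                  + relation_bonus(a, phrase_edges, region_feat),
)
print(best)  # region chosen for each of the three phrases
```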