About

Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.

Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
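Below is a minimal sketch of the task's input and output. It assumes each candidate region is an axis-aligned box `(x1, y1, x2, y2)` and that some learned phrase-region scoring function is available. That function is stubbed here by a hypothetical lookup table `toy_scores`. Under these assumptions, grounding reduces to picking the highest-scoring box for each phrase:

```python
from typing import Callable, List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) pixel coordinates.
Box = Tuple[float, float, float, float]


def ground_phrases(
    phrases: Sequence[str],
    boxes: Sequence[Box],
    score: Callable[[str, Box], float],
) -> List[Tuple[str, Box]]:
    """Pair every noun phrase with the candidate box it scores highest against."""
    if not boxes:
        raise ValueError("need at least one candidate region")
    return [(p, max(boxes, key=lambda b: score(p, b))) for p in phrases]


# Hypothetical scores standing in for a trained model.
boxes = [(0, 0, 50, 100), (60, 10, 120, 90)]
toy_scores = {("a man", boxes[0]): 0.9, ("a dog", boxes[1]): 0.8}
result = ground_phrases(
    ["a man", "a dog"], boxes, lambda p, b: toy_scores.get((p, b), 0.0)
)
print(result)  # [('a man', (0, 0, 50, 100)), ('a dog', (60, 10, 120, 90))]
```

Broadly, the methods listed below differ in how they learn the phrase-region score and in what supervision they use.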

Greatest papers with code

Grounding of Textual Phrases in Images by Reconstruction

12 Nov 2015 akirafukui/vqa-mcb

We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly.

LANGUAGE MODELLING, NATURAL LANGUAGE VISUAL GROUNDING, PHRASE GROUNDING, VISUAL GROUNDING
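The reconstruction idea can be sketched as follows. This is a simplified illustration, not the paper's model. The phrase encoder and decoder are replaced by a fixed phrase vector and a hypothetical bag-of-words `decoder` (one weight vector per vocabulary word), and all features are toy values. The attention weights `alpha` act as the latent grounding. Training would lower the reconstruction loss, and at test time the phrase is grounded to the region with the largest weight.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(
    phrase_vec: Vec, word_ids: Sequence[int], regions: Sequence[Vec], decoder: Sequence[Vec]
) -> Tuple[float, List[float]]:
    """Attend over regions, then measure how well the attended feature reconstructs the phrase.

    Returns the mean negative log-likelihood of the phrase's word ids under a
    softmax over the (hypothetical) vocabulary, plus the attention weights.
    """
    alpha = softmax([dot(phrase_vec, r) for r in regions])
    dim = len(regions[0])
    attended = [sum(a * r[d] for a, r in zip(alpha, regions)) for d in range(dim)]
    probs = softmax([dot(w, attended) for w in decoder])
    nll = -sum(math.log(probs[i]) for i in word_ids) / len(word_ids)
    return nll, alpha


# Toy example: two regions, a two-word vocabulary, and a phrase made of word 0.
regions = [[1.0, 0.0], [0.0, 1.0]]
phrase_vec = [2.0, 0.0]
decoder = [[3.0, 0.0], [0.0, 3.0]]
loss, alpha = reconstruction_loss(phrase_vec, [0], regions, decoder)
grounded = max(range(len(alpha)), key=alpha.__getitem__)  # region 0
```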

Zero-Shot Grounding of Objects from Natural Language Queries

ICCV 2019 TheShadow29/zsgnet-pytorch

A phrase grounding system localizes a particular object in an image referred to by a natural language query.

OBJECT DETECTION, PHRASE GROUNDING

Contrastive Learning for Weakly Supervised Phrase Grounding

ECCV 2020 BigRedT/info-ground

Given pairs of images and captions, we maximize the compatibility between the attention-weighted regions and the words in the corresponding caption, relative to non-corresponding pairs of images and captions.

LANGUAGE MODELLING, PHRASE GROUNDING
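Here is a rough sketch of such an objective. The InfoNCE-style form is assumed for illustration, and the paper's exact formulation, including its word and region encoders, is not reproduced. Each word attends over an image's region features. The compatibility of an image-caption pair is the sum of word-to-attended-region similarities, and each image's own caption must outscore the other captions in the batch. All vectors are toy values.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compatibility(words: Sequence[Vec], regions: Sequence[Vec]) -> float:
    """Sum over words of the similarity between a word and its attention-weighted region feature."""
    total = 0.0
    for w in words:
        alpha = softmax([dot(w, r) for r in regions])
        attended = [sum(a * r[d] for a, r in zip(alpha, regions)) for d in range(len(w))]
        total += dot(w, attended)
    return total


def contrastive_loss(images: Sequence[Sequence[Vec]], captions: Sequence[Sequence[Vec]]) -> float:
    """Caption k is the positive for image k; the other captions in the batch act as negatives."""
    loss = 0.0
    for k, regions in enumerate(images):
        scores = [compatibility(c, regions) for c in captions]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
        loss += log_z - scores[k]
    return loss / len(images)


images = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
captions = [[[1.0, 0.0]], [[0.0, 1.0]]]
aligned = contrastive_loss(images, captions)
shuffled = contrastive_loss(images, captions[::-1])  # pairs deliberately mismatched
```

Correctly paired data yields a lower loss than the mismatched batch. Minimizing the loss therefore pushes the attention toward regions that explain the caption's words.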

Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment

ICCV 2019 BigRedT/info-ground

We propose a novel end-to-end model that uses caption-to-image retrieval as a 'downstream' task to guide the process of phrase localization.

IMAGE RETRIEVAL, PHRASE GROUNDING, VISUAL GROUNDING
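The retrieval-as-supervision idea can be sketched under simplifying assumptions that are ours, not the paper's. Each phrase is matched to its single best region, and the image-caption score is the mean of those matches. A hinge (margin) ranking loss then asks the paired image to outscore other images for the same caption, with a hypothetical default `margin=0.2`. No phrase-level box labels are used; the per-phrase region choices (`picks`) fall out as a by-product.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def local_match(phrases: Sequence[Vec], regions: Sequence[Vec]) -> Tuple[float, List[int]]:
    """Match each phrase to its best region; return (image-caption score, chosen region indices)."""
    picks: List[int] = []
    total = 0.0
    for p in phrases:
        scores = [dot(p, r) for r in regions]
        best = max(range(len(regions)), key=scores.__getitem__)
        picks.append(best)
        total += scores[best]
    return total / len(phrases), picks


def retrieval_hinge_loss(
    phrases: Sequence[Vec],
    pos_image: Sequence[Vec],
    neg_images: Sequence[Sequence[Vec]],
    margin: float = 0.2,  # hypothetical value
) -> float:
    """Caption-to-image ranking loss: the paired image should beat every negative by `margin`."""
    pos_score, _ = local_match(phrases, pos_image)
    return sum(max(0.0, margin - pos_score + local_match(phrases, neg)[0]) for neg in neg_images)


phrases = [[1.0, 0.0], [0.0, 1.0]]
pos_image = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
neg_image = [[0.5, 0.0], [0.0, 0.0]]
score, picks = local_match(phrases, pos_image)  # picks == [0, 1]
loss = retrieval_hinge_loss(phrases, pos_image, [neg_image])  # 0.0: margin satisfied
```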

Revisiting Image-Language Networks for Open-ended Phrase Detection

17 Nov 2018 BryanPlummer/cite

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

OBJECT DETECTION, PHRASE GROUNDING

Conditional Image-Text Embedding Networks

ECCV 2018 BryanPlummer/cite

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model.

PHRASE GROUNDING
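One way to read "multiple text-conditioned embeddings" is sketched below with hypothetical parameters, not the paper's architecture. `heads` holds K pairs of text and image projections, each defining its own embedding space. A `gate` matrix turns the phrase into softmax weights that decide how much each space contributes to the phrase-region score. In a trained model all matrices would be learned end to end. Here they are hand-set so that embedding 0 looks at feature dimension 0 and embedding 1 at dimension 1.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Mat = Sequence[Sequence[float]]


def matvec(m: Mat, v: Vec) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_score(phrase: Vec, region: Vec, heads: Sequence[Tuple[Mat, Mat]], gate: Mat) -> float:
    """Weighted sum of per-embedding similarities, with weights predicted from the phrase."""
    weights = softmax(matvec(gate, phrase))
    return sum(
        w * dot(matvec(text_proj, phrase), matvec(image_proj, region))
        for w, (text_proj, image_proj) in zip(weights, heads)
    )


heads = [
    ([[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]),  # embedding 0: feature dim 0
    ([[0.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]]),  # embedding 1: feature dim 1
]
gate = [[5.0, 0.0], [0.0, 5.0]]
phrase = [1.0, 0.0]
score_a = conditional_score(phrase, [1.0, 0.0], heads, gate)
score_b = conditional_score(phrase, [0.0, 1.0], heads, gate)
```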

Learning Cross-modal Context Graph for Visual Grounding

AAAI 2020 youngfly11/LCMCG-PyTorch

To address the limitations of prior works, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.

GRAPH MATCHING, LANGUAGE MODELLING, NATURAL LANGUAGE VISUAL GROUNDING, PHRASE GROUNDING, VISUAL GROUNDING
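To illustrate why relations between phrases can matter, here is a generic graph-matching objective solved by brute force. It is not taken from the paper. Each phrase node is assigned a region, and the score adds phrase-region similarities (`node_scores`) to a compatibility term for every relation edge. That term, `pair_score`, is a hypothetical stand-in for a learned relation model. Exhaustive search is exponential in the number of phrases, so it only suits toy inputs like this one.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def match_graph(
    node_scores: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    pair_score: Callable[[int, int, int, int], float],
) -> Tuple[List[int], float]:
    """Return the phrase-to-region assignment with the highest total score.

    node_scores[i][r]: similarity between phrase i and region r.
    edges: (i, j) pairs of phrases linked by a relation.
    pair_score(i, j, ri, rj): compatibility of putting phrases i, j in regions ri, rj.
    """
    num_regions = len(node_scores[0])
    best: Tuple[int, ...] = ()
    best_total = float("-inf")
    for assign in itertools.product(range(num_regions), repeat=len(node_scores)):
        total = sum(node_scores[i][r] for i, r in enumerate(assign))
        total += sum(pair_score(i, j, assign[i], assign[j]) for i, j in edges)
        if total > best_total:
            best, best_total = assign, total
    return list(best), best_total


# Two phrases ("a man" riding "a horse") and three candidate regions.
node_scores = [[0.5, 0.4, 0.0], [0.0, 0.3, 0.35]]
edges = [(0, 1)]


def riding(i: int, j: int, ri: int, rj: int) -> float:
    # Hypothetical relation model: only regions 1 and 2 form a plausible rider-horse pair.
    return 1.0 if (ri, rj) == (1, 2) else 0.0


assignment, total = match_graph(node_scores, edges, riding)
```

Scored phrase by phrase, "a man" would go to region 0, but the relation edge moves it to region 1 so that the pair is consistent.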

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

CVPR 2019 hassanhub/MultiGrounding

After dedicated non-linear mappings are applied to the visual features at each level and to the word and sentence embeddings, we obtain multiple instantiations of our common semantic space, in which any target text is compared with the visual content using cosine similarity.

LANGUAGE MODELLING, PHRASE GROUNDING, SENTENCE EMBEDDINGS
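The comparison step can be sketched as below. The sketch assumes the visual features at each level and the text embedding have already been mapped into the common space, so the learned non-linear mappings are omitted. Each level yields a map of cosine similarities over its locations. Picking the level whose best location scores highest is one simple selection rule, assumed here for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vectors get similarity 0


def multilevel_grounding(text_vec: Vec, levels: Sequence[Sequence[Vec]]) -> Tuple[int, List[float]]:
    """Return the selected level and its per-location cosine-similarity map."""
    heatmaps = [[cosine(text_vec, loc) for loc in level] for level in levels]
    best = max(range(len(levels)), key=lambda k: max(heatmaps[k]))
    return best, heatmaps[best]


levels = [
    [[0.0, 1.0], [1.0, 1.0]],  # level 0: two locations
    [[1.0, 0.0], [0.0, 1.0]],  # level 1: two locations
]
level, heatmap = multilevel_grounding([1.0, 0.0], levels)  # level 1, [1.0, 0.0]
```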

Phrase Grounding by Soft-Label Chain Conditional Random Field

IJCNLP 2019 liujch1998/SoftLabelCCRF

In this paper, we formulate phrase grounding as a sequence labeling task where we treat candidate regions as potential labels, and use neural chain Conditional Random Fields (CRFs) to model dependencies among regions for adjacent mentions.

PHRASE GROUNDING, STRUCTURED PREDICTION
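Decoding in a linear-chain CRF of this kind can be sketched with the standard Viterbi algorithm. Mentions are the sequence positions and candidate regions are the labels. The scores below are toy values. In the paper they are produced by neural networks, and the soft-label training objective is not shown. The example is set up so that the pairwise term moves the first mention away from its individually best region.

```python
from typing import List, Sequence, Tuple


def viterbi(unary: Sequence[Sequence[float]], pairwise: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Highest-scoring region sequence for a chain of mentions.

    unary[t][r]: score of grounding mention t to candidate region r.
    pairwise[a][b]: score of region a at mention t followed by region b at mention t + 1.
    Assumes every mention shares the same candidate regions.
    """
    k = len(unary[0])
    best = list(unary[0])  # best score of any path ending in each region
    backpointers: List[List[int]] = []
    for t in range(1, len(unary)):
        ptr, cur = [], []
        for b in range(k):
            a = max(range(k), key=lambda p: best[p] + pairwise[p][b])
            ptr.append(a)
            cur.append(best[a] + pairwise[a][b] + unary[t][b])
        backpointers.append(ptr)
        best = cur
    last = max(range(k), key=best.__getitem__)
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return path[::-1], best[last]


unary = [[1.0, 0.9], [0.0, 1.0]]  # mention 0 slightly prefers region 0 on its own
pairwise = [[0.0, -2.0], [0.0, 0.5]]
path, score = viterbi(unary, pairwise)  # path == [1, 1], score ≈ 2.4
```

Choosing each mention's best region independently would give `[0, 1]` with a total of 0.0. Joint decoding finds `[1, 1]` because the transition between adjacent mentions is scored.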