Phrase Grounding
36 papers with code • 5 benchmarks • 6 datasets
Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.
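At its core, phrase grounding reduces to matching each noun-phrase representation against a set of candidate image regions. Below is a minimal, hypothetical sketch (the function name, embedding shapes, and cosine-similarity choice are illustrative assumptions, not any specific paper's method):

```python
import numpy as np

def ground_phrases(phrase_embs, region_embs):
    """Assign each noun-phrase embedding to its best-matching image region.

    phrase_embs: (P, D) array, one row per noun phrase in the caption.
    region_embs: (R, D) array, one row per candidate region (e.g. detector boxes).
    Returns a length-P array of region indices. Toy interface for illustration.
    """
    # Cosine similarity between every phrase and every region.
    p = phrase_embs / np.linalg.norm(phrase_embs, axis=1, keepdims=True)
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    sim = p @ r.T                 # (P, R) similarity matrix
    return sim.argmax(axis=1)     # index of the best region per phrase
```

Real systems differ mainly in how the two embedding spaces are learned (supervised boxes, weak supervision, contrastive objectives); the argmax-over-regions matching step is the common core.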
Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
Most implemented papers
Learning Cross-modal Context Graph for Visual Grounding
To address these limitations, this paper proposes a language-guided graph representation that captures the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.
Contrastive Learning for Weakly Supervised Phrase Grounding
Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.
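The idea described above can be sketched as an InfoNCE-style objective: each word attends over image regions, the attention-weighted region summary scores the word-image compatibility, and matched caption-image pairs must outscore mismatched pairs within a batch. This is a toy sketch under assumed shapes, not the paper's exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def caption_image_score(word_embs, region_embs):
    """Attention-weighted compatibility between one caption and one image.

    word_embs: (W, D), region_embs: (R, D). Each word attends over regions;
    the score is the mean similarity between words and their attended regions.
    """
    att = softmax(word_embs @ region_embs.T, axis=1)  # (W, R) word-to-region attention
    attended = att @ region_embs                      # (W, D) per-word region summary
    return float(np.mean(np.sum(word_embs * attended, axis=1)))

def contrastive_loss(captions, images):
    """InfoNCE-style loss: matched (caption_i, image_i) pairs should outscore
    mismatched pairs (caption_i, image_j), j != i, within the batch."""
    scores = np.array([[caption_image_score(c, v) for v in images]
                       for c in captions])           # (N, N) score matrix
    log_probs = np.log(softmax(scores, axis=1).diagonal())
    return float(-log_probs.mean())
```

Because the objective only needs paired images and captions, no box-level annotation is required; grounding emerges from the attention maps as a by-product.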
Neural Parameter Allocation Search
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.
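One natural way to lift a region-phrase score into an image-sentence score is to let each phrase take its best-matching region (max over regions) and then aggregate over phrases. The sketch below illustrates that construction with a dot-product scorer; the specific pooling choices are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def region_phrase_scores(phrase_embs, region_embs):
    """Score every (phrase, region) pair; here a simple dot product."""
    return phrase_embs @ region_embs.T  # (P, R)

def image_sentence_score(phrase_embs, region_embs):
    """Each phrase keeps its best region (max over R), then the per-phrase
    scores are averaged to give one image-sentence compatibility score."""
    s = region_phrase_scores(phrase_embs, region_embs)
    return float(s.max(axis=1).mean())
```

The appeal of this construction is that only image-sentence supervision is needed to train it, yet the intermediate region-phrase scores directly yield groundings at test time.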
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Phrase localization is a task that studies the mapping from textual phrases to regions of an image.
Learning to ground medical text in a 3D human atlas
In this paper, we develop a method for grounding medical text into a physically meaningful and interpretable space corresponding to a human atlas.
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.
Detector-Free Weakly Supervised Grounding by Separation
In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
Our goal is to bridge the visual scene graphs and linguistic dependency trees seamlessly.
Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing
We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision-language processing.