Phrase Grounding

36 papers with code • 5 benchmarks • 6 datasets

Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.

Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
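
For readers new to the task, the sketch below illustrates the typical input/output structure and the standard IoU ≥ 0.5 accuracy used on phrase grounding benchmarks. The class and function names are illustrative only and are not drawn from any of the papers listed here.

```python
# Minimal sketch of the phrase grounding input/output structure and the
# common IoU >= 0.5 accuracy metric; names and the threshold are illustrative,
# not tied to any specific paper on this page.
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class GroundedPhrase:
    phrase: str     # noun phrase from the caption, e.g. "a red umbrella"
    gt_box: Box     # annotated region for the phrase
    pred_box: Box   # region predicted by a grounding model

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def grounding_accuracy(phrases: List[GroundedPhrase], thresh: float = 0.5) -> float:
    """Fraction of phrases whose predicted box overlaps the ground truth by IoU >= thresh."""
    hits = sum(iou(p.pred_box, p.gt_box) >= thresh for p in phrases)
    return hits / max(len(phrases), 1)
```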

Most implemented papers

Learning Cross-modal Context Graph for Visual Grounding

youngfly11/LCMCG-PyTorch AAAI 2020

To address the limitations of prior methods, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.

Contrastive Learning for Weakly Supervised Phrase Grounding

BigRedT/info-ground ECCV 2020

Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.
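
A minimal sketch of the kind of image-caption contrastive objective the blurb describes, assuming word and region features have already been extracted by text and image encoders; the attention, pooling, and negative-sampling details in the actual BigRedT/info-ground code may differ.

```python
# Sketch of a contrastive objective over attention-weighted regions and words,
# in the spirit of the blurb above; not the paper's exact implementation.
import torch
import torch.nn.functional as F

def word_region_scores(word_feats, region_feats):
    """For each caption-image pair, attend over regions per word and score the match.

    word_feats:   (B, T, D) word features per caption
    region_feats: (B, R, D) region features per image
    returns:      (B, B) caption-image compatibility matrix
    """
    B = word_feats.size(0)
    rows = []
    for i in range(B):                      # caption i
        row = []
        for j in range(B):                  # image j
            attn = torch.softmax(word_feats[i] @ region_feats[j].T, dim=-1)  # (T, R)
            attended = attn @ region_feats[j]                                # (T, D)
            # average word-level compatibility with the attended regions
            row.append((word_feats[i] * attended).sum(-1).mean())
        rows.append(torch.stack(row))
    return torch.stack(rows)

def contrastive_loss(word_feats, region_feats):
    """InfoNCE-style loss: the matching pair (diagonal) should score higher
    than non-corresponding image-caption pairs in the batch."""
    logits = word_region_scores(word_feats, region_feats)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)      # caption -> image
                  + F.cross_entropy(logits.T, targets)) # image -> caption
```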

Neural Parameter Allocation Search

bryanplummer/ssn ICLR 2022

We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
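
A toy sketch of the fixed-parameter-budget idea: every layer draws its weights from one shared bank whose size is the budget. The real NPAS/SSN method learns how to allocate and map this bank across layers; the fixed slicing below is purely illustrative.

```python
# Toy illustration of a fixed parameter budget: every linear layer draws its
# weights from one shared bank of `budget` parameters. The actual NPAS/SSN work
# (bryanplummer/ssn) searches how to allocate and share the bank across layers;
# here the mapping is just a fixed slice with wrap-around.
import torch
import torch.nn as nn

class BankLinear(nn.Module):
    def __init__(self, bank: nn.Parameter, offset: int, in_dim: int, out_dim: int):
        super().__init__()
        self.bank, self.offset = bank, offset
        self.in_dim, self.out_dim = in_dim, out_dim

    def forward(self, x):
        n = self.in_dim * self.out_dim
        idx = (self.offset + torch.arange(n, device=self.bank.device)) % self.bank.numel()
        w = self.bank[idx].view(self.out_dim, self.in_dim)  # weights sliced from the bank
        return x @ w.T

class BudgetedMLP(nn.Module):
    def __init__(self, dims, budget: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(budget) * 0.02)  # the entire parameter budget
        offset, layers = 0, []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers.append(BankLinear(self.bank, offset, d_in, d_out))
            offset += d_in * d_out
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:
                x = torch.relu(x)
        return x

# e.g. an MLP that would normally need ~1.1M weights, squeezed into a 100k budget:
# model = BudgetedMLP([512, 1024, 512, 128], budget=100_000)
```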

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

jhuang81/weak-sup-visual-grounding CVPR 2021

Our core innovation is learning a region-phrase score function, from which an image-sentence score function is then constructed.
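
A minimal sketch of one way to compose an image-sentence score from a region-phrase score function, as described above: score every region against every phrase, keep the best region per phrase, then average over phrases. The exact score functions used in the paper may differ.

```python
# Sketch of building an image-sentence score from a region-phrase score
# function; cosine similarity and max-then-mean pooling are assumptions here.
import torch
import torch.nn.functional as F

def region_phrase_scores(region_feats, phrase_feats):
    """Cosine similarity between every region and every phrase.

    region_feats: (R, D), phrase_feats: (P, D) -> (R, P) score matrix
    """
    r = F.normalize(region_feats, dim=-1)
    p = F.normalize(phrase_feats, dim=-1)
    return r @ p.T

def image_sentence_score(region_feats, phrase_feats):
    """Max over regions for each phrase, then mean over phrases."""
    scores = region_phrase_scores(region_feats, phrase_feats)  # (R, P)
    best_per_phrase, _ = scores.max(dim=0)                     # (P,)
    return best_per_phrase.mean()
```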

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding

qinzzz/Multimodal-Alignment-Framework EMNLP 2020

Phrase localization is a task that studies the mapping from textual phrases to regions of an image.

Learning to ground medical text in a 3D human atlas

gorjanradevski/text2atlas CoNLL 2020

In this paper, we develop a method for grounding medical text into a physically meaningful and interpretable space corresponding to a human atlas.

MDETR - Modulated Detection for End-to-End Multi-Modal Understanding

ashkamath/mdetr ICCV 2021

We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.

Detector-Free Weakly Supervised Grounding by Separation

aarbelle/GroundingBySeparation ICCV 2021

In this work, we focus on the task of Detector-Free Weakly Supervised Grounding (DF-WSG), which aims to solve WSG without relying on a pre-trained object detector.

Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

microsoft/hi-ml 21 Apr 2022

We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision-language processing.