Phrase Grounding
36 papers with code • 5 benchmarks • 6 datasets
Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.
Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
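A common way to frame the task is as a matching problem between phrase embeddings and candidate region features: each noun phrase in the caption is assigned the image region whose visual embedding it matches best. The sketch below illustrates this framing with random tensors standing in for real encoders; the similarity-based matcher and the embedding dimensions are illustrative assumptions, not a specific published model.

```python
import torch
import torch.nn.functional as F

def ground_phrases(phrase_emb: torch.Tensor, region_emb: torch.Tensor):
    """Assign each phrase to its best-matching region via cosine similarity.

    phrase_emb: (num_phrases, dim) embeddings of noun phrases from the caption.
    region_emb: (num_regions, dim) embeddings of candidate image regions
                (e.g., from a detector or patch encoder -- assumed given here).
    Returns the chosen region index and the similarity score for each phrase.
    """
    phrase_emb = F.normalize(phrase_emb, dim=-1)
    region_emb = F.normalize(region_emb, dim=-1)
    sim = phrase_emb @ region_emb.T            # (num_phrases, num_regions)
    scores, best_region = sim.max(dim=-1)      # best region per phrase
    return best_region, scores

# Toy example: random features stand in for the outputs of real encoders.
torch.manual_seed(0)
phrases = ["a man", "a red frisbee"]           # noun phrases parsed from the caption
phrase_emb = torch.randn(len(phrases), 256)    # hypothetical text-encoder outputs
region_emb = torch.randn(10, 256)              # hypothetical features for 10 region proposals

best_region, scores = ground_phrases(phrase_emb, region_emb)
for p, r, s in zip(phrases, best_region.tolist(), scores.tolist()):
    print(f"{p!r} -> region {r} (similarity {s:.2f})")
```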
Latest papers with no code
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models
In this work, we use a publicly available Foundation Model, namely the Latent Diffusion Model, to solve this challenging task.
MedRG: Medical Report Grounding with Multi-modal Large Language Model
Medical Report Grounding is pivotal in identifying the most relevant regions in medical images based on a given phrase query, a critical aspect in medical image analysis and radiological diagnosis.
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
Highlighting particularly relevant regions of an image can improve the performance of vision-language models (VLMs) on various vision-language (VL) tasks by guiding the model to attend more closely to these regions of interest.
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring fine-grained phrase-region matching while training only on coarse-grained sentence-image pairs; a common formulation of this setting is sketched below.
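Because only sentence-image pairs are labeled, a typical weakly-supervised formulation scores each pair by soft-attending every phrase over the image regions and trains that pair-level score with a ranking loss against mismatched images; the per-phrase attention weights then serve as the grounding at test time. The sketch below shows one such formulation under assumed encoders and a hinge margin; it is a generic illustration from the weakly-supervised grounding literature, not necessarily this paper's method.

```python
import torch
import torch.nn.functional as F

def sentence_image_score(phrase_emb, region_emb):
    """Score a (sentence, image) pair by soft-attending each phrase over regions.

    Only this pair-level score is supervised in the weakly-supervised setting;
    the per-phrase attention weights act as the phrase-region grounding.
    """
    p = F.normalize(phrase_emb, dim=-1)        # (P, D) phrase embeddings
    r = F.normalize(region_emb, dim=-1)        # (R, D) region embeddings
    sim = p @ r.T                              # (P, R) phrase-region similarities
    attn = sim.softmax(dim=-1)                 # soft phrase-to-region alignment
    return (attn * sim).sum(dim=-1).mean(), attn

def weakly_supervised_loss(phrase_emb, pos_regions, neg_regions, margin=0.2):
    """Hinge loss: the matched sentence-image pair should outscore a mismatched one."""
    pos_score, _ = sentence_image_score(phrase_emb, pos_regions)
    neg_score, _ = sentence_image_score(phrase_emb, neg_regions)
    return F.relu(margin - pos_score + neg_score)

# Toy tensors standing in for encoder outputs (assumed, for illustration only).
torch.manual_seed(0)
phrase_emb = torch.randn(3, 256, requires_grad=True)   # 3 phrases in the caption
pos_regions = torch.randn(12, 256)                      # regions of the paired image
neg_regions = torch.randn(12, 256)                      # regions of a mismatched image
print(weakly_supervised_loss(phrase_emb, pos_regions, neg_regions))
```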
Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection
Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains using only data from a single source domain during training.
Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
In particular, raw radiology reports are refined to highlight key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics.
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
We present Catalog Phrase Grounding (CPG), a model that associates product textual data (title, brand) with the corresponding regions of product images (isolated product region, brand logo region) for e-commerce vision-language applications.
Read, look and detect: Bounding box annotation from image-caption pairs
Various methods have been proposed to detect objects while reducing the cost of data annotation.
ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity
Deep learning has shown great potential in assisting radiologists in reading chest X-ray (CXR) images, but its need for expensive annotations to improve performance hinders widespread clinical application.
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Visual and linguistic pre-training aims to learn vision and language representations together, which can be transferred to visual-linguistic downstream tasks.