Generalized Referring Expression Segmentation
9 papers with code • 1 benchmark • 1 dataset
Generalized Referring Expression Segmentation (GRES), introduced by Liu et al. (CVPR 2023), allows expressions that refer to any number of target objects, including none. GRES takes an image and a referring expression as input, and requires predicting a mask for the target object(s).
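Concretely, a GRES model maps an (image, expression) pair to a binary mask, which may cover multiple objects or be empty. A minimal sketch of this interface, with a toy `segment` stub standing in for a real model (the function and its thresholding logic are hypothetical, not any paper's method):

```python
def segment(image, expression):
    """Toy GRES stub: returns a binary mask the same size as `image`.

    A real model would ground the expression in the image; here we just
    mark pixels brighter than a threshold when an expression is given,
    and return an all-zero (empty-target) mask otherwise.
    """
    if not expression.strip():  # no-target expression -> empty mask
        return [[0] * len(row) for row in image]
    return [[1 if px > 0.5 else 0 for px in row] for row in image]

image = [[0.9, 0.1],
         [0.2, 0.8]]  # 2x2 toy "image"
mask = segment(image, "the bright objects")
# the mask may cover several disjoint pixels at once (multi-target)
```

Unlike classic RES, the output is not guaranteed to be a single connected region, and an all-zero mask is a valid answer.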
Most implemented papers
GRES: Generalized Referring Expression Segmentation
Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object.
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension through the reciprocal benefit of the hierarchical structure.
MAttNet: Modular Attention Network for Referring Expression Comprehension
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.
Vision-Language Transformer and Query Generation for Referring Segmentation
We introduce transformer and multi-head attention to build a network with an encoder-decoder attention mechanism architecture that "queries" the given image with the language expression.
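The "querying" idea can be sketched as standard scaled dot-product cross-attention in which a language embedding acts as the query and pixel features serve as keys and values (a dependency-free toy sketch under those assumptions, not the actual VLT architecture):

```python
import math

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention: one language query
    vector attends over a list of pixel feature vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of value vectors -> language-conditioned image feature
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

lang_query = [1.0, 0.0]                # toy language embedding
pixel_feats = [[1.0, 0.0], [0.0, 1.0]] # toy per-pixel features
out = cross_attention(lang_query, pixel_feats, pixel_feats)
# attention weights the pixel aligned with the query more heavily
```

The attended output is dominated by pixel features that align with the language query, which is the sense in which the expression "queries" the image.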
CRIS: CLIP-Driven Referring Image Segmentation
In addition, we present text-to-pixel contrastive learning to explicitly enforce the text feature to be similar to the related pixel-level features and dissimilar to irrelevant ones.
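The text-to-pixel contrastive idea can be sketched as a per-pixel binary objective on text-pixel similarity: pull the text feature toward pixels inside the ground-truth mask and push it away from the rest (a simplified, dependency-free sketch of the general idea, not CRIS's exact loss):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def text_to_pixel_contrastive(text_feat, pixel_feats, gt_mask):
    """Mean binary cross-entropy on sigmoid(text . pixel):
    label 1 for pixels inside the ground-truth mask, 0 outside."""
    total = 0.0
    for feat, label in zip(pixel_feats, gt_mask):
        p = 1.0 / (1.0 + math.exp(-dot(text_feat, feat)))  # sigmoid similarity
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(pixel_feats)

text = [1.0, 0.0]
pixels = [[2.0, 0.0], [-2.0, 0.0]]  # one matching, one non-matching pixel
aligned = text_to_pixel_contrastive(text, pixels, [1, 0])
flipped = text_to_pixel_contrastive(text, pixels, [0, 1])
# loss is low when similarity agrees with the mask, high when it disagrees
```

Minimizing such a loss drives the text embedding toward target pixels and away from background, which is the "text-to-pixel" coupling the snippet describes.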
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.
GSVA: Generalized Segmentation via Multimodal Large Language Models
Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to expressions that refer to multiple objects, or to empty targets whose referents are absent from the image.
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Referring Expression Segmentation (RES) has attracted rising attention, aiming to identify and segment objects based on natural language expressions.