Generalized Referring Expression Segmentation

9 papers with code • 1 benchmark • 1 dataset

Generalized Referring Expression Segmentation (GRES), introduced by Liu et al. at CVPR 2023, allows an expression to indicate any number of target objects. GRES takes an image and a referring expression as input and requires a mask prediction for the target object(s).
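The input/output contract described above can be illustrated with a small sketch. It assumes that the expected output for a multi-target expression is the union of the per-object binary masks, and that an expression with no matching object corresponds to an all-zero mask. The names `merge_target_masks` and `is_no_target` are hypothetical helpers for illustration and do not come from any GRES codebase.

```python
from typing import List

# A binary mask stored as rows of 0/1 values (illustrative representation).
Mask = List[List[int]]


def merge_target_masks(instance_masks: List[Mask], height: int, width: int) -> Mask:
    """Return the pixelwise union of per-object masks.

    Assumption: zero targets yield an all-zero mask of the given size.
    """
    merged = [[0] * width for _ in range(height)]
    for mask in instance_masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    merged[y][x] = 1
    return merged


def is_no_target(mask: Mask) -> bool:
    """True when the mask contains no foreground pixel."""
    return not any(any(row) for row in mask)


# Two referred objects on a 2x2 image, merged into a single target mask.
two_targets = merge_target_masks([[[1, 0], [0, 0]], [[0, 0], [0, 1]]], 2, 2)
# An expression that matches nothing in the image.
no_target = merge_target_masks([], 2, 2)
```

A single-target expression is then just the special case where `instance_masks` holds exactly one mask.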

Most implemented papers

GRES: Generalized Referring Expression Segmentation

henghuiding/ReLA CVPR 2023

Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object.

CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation

robertluo1/cohd 24 May 2024

By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension with the reciprocal benefit of the hierarchical nature.

MAttNet: Modular Attention Network for Referring Expression Comprehension

lichengunc/MAttNet CVPR 2018

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.

Vision-Language Transformer and Query Generation for Referring Segmentation

henghuiding/Vision-Language-Transformer ICCV 2021

We introduce transformer and multi-head attention to build a network with an encoder-decoder attention mechanism architecture that "queries" the given image with the language expression.

CRIS: CLIP-Driven Referring Image Segmentation

DerrickWang005/CRIS.pytorch CVPR 2022

In addition, we present text-to-pixel contrastive learning to explicitly enforce that the text feature is similar to the related pixel-level features and dissimilar to the irrelevant ones.
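One way to read the text-to-pixel objective above is as a per-pixel binary classification. The sketch below assumes a sigmoid over the dot product between a sentence-level text feature and each pixel feature, trained with binary cross-entropy against the ground-truth mask. It is an illustration under those assumptions, not the CRIS implementation, and `text_to_pixel_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def text_to_pixel_loss(
    text_feat: Sequence[float],
    pixel_feats: List[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Mean binary cross-entropy over pixels.

    Pixels with label 1 (inside the referred region) are pulled toward the
    text feature; pixels with label 0 are pushed away from it.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for feat, label in zip(pixel_feats, labels):
        score = sigmoid(sum(t * p for t, p in zip(text_feat, feat)))
        if label:
            total += -math.log(score + eps)
        else:
            total += -math.log(1.0 - score + eps)
    return total / len(labels)


# Toy example: one pixel aligned with the text feature, one opposed to it.
text = [1.0, 0.0]
pixels = [[5.0, 0.0], [-5.0, 0.0]]
loss_matching = text_to_pixel_loss(text, pixels, [1, 0])    # labels agree with similarity
loss_mismatched = text_to_pixel_loss(text, pixels, [0, 1])  # labels contradict similarity
```

In this toy setting the loss is lower when the labels agree with the feature similarity, which is the behavior the objective is meant to encourage.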

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

yz93/lavt-ris CVPR 2022

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.

GSVA: Generalized Segmentation via Multimodal Large Language Models

leaplabthu/gsva CVPR 2024

Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image.

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

zamling/psalm 21 Mar 2024

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.

Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation

buptlwz/mabp 24 May 2024

Referring Expression Segmentation (RES) has attracted rising attention, aiming to identify and segment objects based on natural language expressions.