Referring Expression Segmentation

68 papers with code • 25 benchmarks • 11 datasets

The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must allow a single object in a discourse or scene (the referent) to be identified; that is, an RE unambiguously identifies the target instance.
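
To make the input and output of the task concrete, here is a minimal toy sketch in Python. It is not any listed model's method: the instance map, the attribute sets, and the helper names `resolve_referent` and `referent_mask` are all hypothetical, and simple word matching stands in for a real vision-language model. It only illustrates that the expression must single out exactly one instance and that the output is a per-pixel binary mask.

```python
from typing import Dict, List, Set

# Toy 4x6 instance map (assumed data): 0 is background, 1 and 2 are object instances.
INSTANCE_MAP: List[List[int]] = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

# Hypothetical attribute words per instance, standing in for learned features.
ATTRIBUTES: Dict[int, Set[str]] = {
    1: {"dog", "left", "small"},
    2: {"dog", "right", "large"},
}


def resolve_referent(expression: str, attributes: Dict[int, Set[str]]) -> int:
    """Return the one instance whose attributes cover every known word of the expression."""
    vocabulary = set().union(*attributes.values())
    # Keep only words that describe some instance; ignore words like "the" or "on".
    words = {w for w in expression.lower().split() if w in vocabulary}
    matches = [i for i, attrs in attributes.items() if words and words <= attrs]
    if len(matches) != 1:
        # A referring expression must be unambiguous: exactly one referent.
        raise ValueError(f"expression matches {len(matches)} instances, expected exactly 1")
    return matches[0]


def referent_mask(instance_map: List[List[int]], referent_id: int) -> List[List[int]]:
    """Label each pixel 1 if it belongs to the referent, else 0."""
    return [[1 if cell == referent_id else 0 for cell in row] for row in instance_map]


if __name__ == "__main__":
    target = resolve_referent("the dog on the left", ATTRIBUTES)
    for row in referent_mask(INSTANCE_MAP, target):
        print(row)
```

In this sketch "the dog on the left" resolves to a single instance, whereas "the dog" alone matches both instances and is rejected as ambiguous.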

Benchmarks

These leaderboards are used to track progress in Referring Expression Segmentation.

Dataset	Best Model
A2D Sentences	SgMg (Video-Swin-B)
RefCoCo val	HIPIE
Refer-YouTube-VOS (2021 public validation)	GLEE-Pro
RefCOCO testA	UNINEXT-H
RefCOCO+ val	HIPIE
J-HMDB	SgMg (Video-Swin-B)
RefCOCO testB	UNINEXT-H
RefCOCO+ testA	UniLSeg-100
RefCOCO+ test B	UniLSeg-100
DAVIS 2017 (val)	UNINEXT-H
RefCoCo val	UNINEXT-H
RefCOCOg-val	UniLSeg-100
RefCOCOg-test	UniLSeg-100
PhraseCut	GLIPv2
ReferIt	PolyFormer-L
Refer-YouTube-VOS	RefVOS-Human REs
RefCOCO	GLEE-Pro
RefCOCO testA	EVP
RefCOCO testB	EVP
CLEVR-Ref+	IEP-Ref (700K prog.)
A2Dre test	RefVos
Referring Expressions for DAVIS 2016 & 2017	MUTR
G-Ref val	MaIL
G-Ref test A	MaIL
G-Ref test B	MaIL

Subtasks

Generalized Referring Expression Segmentation

Latest papers

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

heshuting555/dshmp • 4 Apr 2024

In fact, static cues can sometimes interfere with temporal perception by overshadowing motion cues.

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

zamling/psalm • 21 Mar 2024

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.

117 stars

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

minghanli/univs • 28 Feb 2024

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

119 stars

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

foundationvision/uniref • 25 Dec 2023

We evaluate our unified models on various benchmarks.

219 stars

General Object Foundation Model for Images and Videos at Scale

FoundationVision/GLEE • 14 Dec 2023

We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos.

882 stars

EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

lavreniuk/evp • 13 Dec 2023

Second, we propose a novel image-text alignment module for improved feature extraction of the Stable Diffusion backbone.

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

rubics-xuan/mres • 13 Dec 2023

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.

Universal Segmentation at Arbitrary Granularity with Language Instruction

workforai/UniLSeg • 4 Dec 2023

This paper aims to achieve universal segmentation of arbitrary semantic level.

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

rongyaofang/instructseq • 30 Nov 2023

In this work, we introduce InstructSeq, an instruction-conditioned multi-modal modeling framework that unifies diverse vision tasks through flexible natural language control and handling of both visual and textual data.

NExT-Chat: An LMM for Chat, Detection and Segmentation

next-chatv/next-chat • 8 Nov 2023

The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).

160 stars
