Referring Expression Segmentation

68 papers with code • 25 benchmarks • 11 datasets

The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object, the referent, in a discourse or scene.
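As a rough illustration of the inputs and outputs involved, the sketch below represents one sample as an image size, an expression, and a binary mask, and scores a predicted mask with intersection over union. The `ReferringSample` class, the `mask_iou` helper, and the toy data are assumptions made for illustration; they are not taken from any benchmark or paper listed here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# H x W grid of 0/1 values; 1 marks a pixel that belongs to the referent.
Mask = List[List[int]]


@dataclass
class ReferringSample:
    """Hypothetical container for one sample (not a real dataset schema)."""
    image_size: Tuple[int, int]  # (height, width); pixel values omitted in this toy
    expression: str              # the referring expression
    mask: Mask                   # ground-truth mask of the referent


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    # Two empty masks agree perfectly, so treat that case as IoU 1.0.
    return inter / union if union else 1.0


sample = ReferringSample(
    image_size=(3, 4),
    expression="the cup on the left",
    mask=[[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0]],
)

# A made-up prediction that misses one pixel of the referent.
prediction = [[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]]

score = mask_iou(prediction, sample.mask)
print(f"IoU for '{sample.expression}': {score:.2f}")
```

Here the prediction covers three of the four referent pixels and nothing else, so the intersection is 3 and the union is 4.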

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

heshuting555/dshmp 4 Apr 2024

In fact, static cues can sometimes interfere with temporal perception by overshadowing motion cues.

17

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

zamling/psalm 21 Mar 2024

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.

117

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

minghanli/univs 28 Feb 2024

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.

119

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

foundationvision/uniref 25 Dec 2023

We evaluate our unified models on various benchmarks.

219

General Object Foundation Model for Images and Videos at Scale

FoundationVision/GLEE 14 Dec 2023

We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos.

882

EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

lavreniuk/evp 13 Dec 2023

Second, we propose a novel image-text alignment module for improved feature extraction of the Stable Diffusion backbone.

48

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

rubics-xuan/mres 13 Dec 2023

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES

48

Universal Segmentation at Arbitrary Granularity with Language Instruction

workforai/UniLSeg 4 Dec 2023

This paper aims to achieve universal segmentation of arbitrary semantic level.

59

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

rongyaofang/instructseq 30 Nov 2023

In this work, we introduce InstructSeq, an instruction-conditioned multi-modal modeling framework that unifies diverse vision tasks through flexible natural language control and handling of both visual and textual data.

8

NExT-Chat: An LMM for Chat, Detection and Segmentation

next-chatv/next-chat 8 Nov 2023

The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).

160