Generalized Referring Expression Comprehension

5 papers with code • 1 benchmark • 1 dataset

Generalized Referring Expression Comprehension (GREC) extends referring expression comprehension to expressions that indicate any number of target objects, including several targets or none. GREC takes an image and a referring expression as input and predicts the bounding box(es) of the target object(s).
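Because an expression may refer to zero, one, or many objects, a prediction for one image-expression pair is a set of boxes rather than a single box. The sketch below shows one plausible way to check such a set against the ground truth. It is an illustration under stated assumptions, not the benchmark's official evaluation code: the `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, the greedy matching, and the names `iou` and `sample_is_correct` are all hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels. This is a hypothetical choice.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_is_correct(pred: List[Box], gt: List[Box], thr: float = 0.5) -> bool:
    """True when predictions and targets match one-to-one at IoU >= thr.

    Greedy matching is a simplification; an optimal assignment could differ
    in crowded scenes.
    """
    if len(pred) != len(gt):
        # Also covers no-target expressions: any predicted box is an error.
        return False
    unmatched = list(gt)
    for p in pred:
        # Pick the remaining target that overlaps this prediction the most.
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is None or iou(p, best) < thr:
            return False
        unmatched.remove(best)
    return True
```

With this check, a no-target expression is handled correctly only when the model returns an empty list, and a multi-target expression only when every target is covered by exactly one predicted box.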

Most implemented papers

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

ashkamath/mdetr 26 Apr 2021

We also investigate the utility of our model as an object detector on a given label set when fine-tuned in a few-shot setting.

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

luogen1996/MCN CVPR 2020

In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).

Vision-Language Transformer and Query Generation for Referring Segmentation

henghuiding/Vision-Language-Transformer ICCV 2021

We introduce transformer and multi-head attention to build a network with an encoder-decoder attention mechanism architecture that "queries" the given image with the language expression.

Universal Instance Perception as Object Discovery and Retrieval

MasterBin-IIAU/UNINEXT CVPR 2023

All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks.

GREC: Generalized Referring Expression Comprehension

henghuiding/grefcoco 30 Aug 2023

This dataset encompasses a range of expressions: those referring to multiple targets, those with no specific target, and single-target expressions.