Greatest papers with code

UNITER: UNiversal Image-TExt Representation Learning

ECCV 2020 ChenRocks/UNITER

Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).

LANGUAGE MODELLING, QUESTION ANSWERING, REFERRING EXPRESSION COMPREHENSION, REPRESENTATION LEARNING, TEXT MATCHING, VISUAL COMMONSENSE REASONING, VISUAL ENTAILMENT, VISUAL QUESTION ANSWERING
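
The contrast drawn in the UNITER excerpt above, joint random masking versus conditional masking, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `[MASK]` placeholder, the string stand-ins for image regions, the even choice between modalities, and the masking probability are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a masked word or region


def mask_sequence(items, prob, rng):
    """Replace each item with MASK independently with probability `prob`."""
    return [MASK if rng.random() < prob else item for item in items]


def joint_random_masking(words, regions, prob, rng):
    """Baseline: both modalities may be masked in the same example."""
    return mask_sequence(words, prob, rng), mask_sequence(regions, prob, rng)


def conditional_masking(words, regions, prob, rng):
    """Mask only one modality; the other stays fully observed.

    The 50/50 choice between modalities is an assumption of this sketch.
    """
    if rng.random() < 0.5:
        # Masked language modeling, conditioned on the full image.
        return mask_sequence(words, prob, rng), list(regions)
    # Masked region modeling, conditioned on the full text.
    return list(words), mask_sequence(regions, prob, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    words = ["a", "dog", "on", "the", "grass"]
    regions = ["region_0", "region_1", "region_2"]  # stand-ins for region features
    print(joint_random_masking(words, regions, 0.3, rng))
    print(conditional_masking(words, regions, 0.3, rng))
```

Under conditional masking, at least one of the two returned sequences is always identical to its input, so a masked element can be predicted from a complete view of the other modality.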

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

CVPR 2020 luogen1996/MCN

In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).

REFERRING EXPRESSION COMPREHENSION, REFERRING EXPRESSION SEGMENTATION

A Fast and Accurate One-Stage Approach to Visual Grounding

ICCV 2019 zyang-ur/onestage_grounding

We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight.

REFERRING EXPRESSION COMPREHENSION, VISUAL GROUNDING

Unifying Vision-and-Language Tasks via Text Generation

4 Feb 2021 j-min/VL-T5

On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, and visual commonsense reasoning, most of which have been previously modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models.

CONDITIONAL TEXT GENERATION, IMAGE CAPTIONING, LANGUAGE MODELLING, MULTI-TASK LEARNING, QUESTION ANSWERING, REFERRING EXPRESSION COMPREHENSION, VISUAL COMMONSENSE REASONING, VISUAL QUESTION ANSWERING

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

CVPR 2017 lichengunc/speaker_listener_reinforcer

The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions.

REFERRING EXPRESSION COMPREHENSION

Understanding Synonymous Referring Expressions via Contrastive Features

20 Apr 2021 wenz116/RefContrast

While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.

REFERRING EXPRESSION COMPREHENSION, TRANSFER LEARNING