In this paper, we propose a unified network for TAD, termed Faster-TAD, by re-purposing a Faster-RCNN-like architecture.
SEAL consists of two kinds of annotations, SEAL Tubes and SEAL Clips.
In addition, we present text-to-pixel contrastive learning to explicitly enforce that the text feature is similar to the related pixel-level features and dissimilar to the irrelevant ones.
Ranked #1 on Referring Expression Segmentation on RefCoCo val
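The text-to-pixel contrastive idea mentioned above can be illustrated with a minimal sketch. The snippet does not give the exact loss, so this assumes one common formulation: the dot product between the text feature and each pixel feature is passed through a sigmoid and trained with binary cross-entropy against the ground-truth mask. The function name, argument names, and feature shapes are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def text_to_pixel_contrastive_loss(text_feat, pixel_feats, mask):
    """Mean binary cross-entropy between text-pixel similarity and the mask.

    Assumed formulation, not taken from the source:
      text_feat   -- one D-dimensional text embedding
      pixel_feats -- one D-dimensional embedding per pixel
      mask        -- 1 if the pixel belongs to the referred region, else 0
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for feat, label in zip(pixel_feats, mask):
        p = sigmoid(dot(text_feat, feat))
        if label == 1:
            # Related pixel: pull its feature toward the text feature.
            total += -math.log(p + eps)
        else:
            # Irrelevant pixel: push its feature away from the text feature.
            total += -math.log(1.0 - p + eps)
    return total / len(pixel_feats)


# Toy example with 2-dimensional features and three pixels.
text = [1.0, 0.0]
pixels = [[4.0, 0.0], [-4.0, 0.0], [0.0, 3.0]]
aligned = text_to_pixel_contrastive_loss(text, pixels, [1, 0, 0])
flipped = text_to_pixel_contrastive_loss(text, pixels, [0, 1, 1])
print(aligned, flipped)
```

In the toy example, the loss is lower when the mask marks the pixel whose feature points in the same direction as the text feature, which is the behaviour the contrastive objective is meant to reward.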
Racial bias is an important issue in biometrics, but it has not been thoroughly studied in deep face recognition.