We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size.
no code implementations • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220k human-annotated 2. 5D relationships among 512K objects from 11K images.
The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.
Many objects do not appear frequently enough in complex scenes (e. g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e. g., in product images).
Most existing image retrieval systems use text queries as a way for the user to express what they are looking for.
Training large-scale image captioning (IC) models demands access to a rich and diverse set of training examples, gathered from the wild, often from noisy alt-text data.
We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing.
Ranked #1 on Image Captioning on Localized Narratives
Object detection plays an important role in current solutions to vision and language tasks like image captioning and visual question answering.
Ranked #4 on Visual Question Answering on VizWiz 2018
Leveraging class semantic descriptions and examples of known objects, zero-shot learning makes it possible to train a recognition model for an object class whose examples are not available.
Zero-shot learning (ZSL) methods have been studied in the unrealistic setting where test data are assumed to come from unseen classes only.
Given semantic descriptions of object classes, zero-shot learning aims to accurately recognize objects of the unseen classes, from which no examples are available at the training stage, by associating them to the seen classes, from which labeled examples are provided.
Ranked #1 on Few-Shot Image Classification on AWA - 0-Shot
Moreover, we show how SCA can be instrumental in exploratory analysis of data, where we gain insights about the data by examining patterns hidden in its latent components' local similarity values.