SimVLM is a minimalist pretraining framework that reduces training complexity by exploiting large-scale weak supervision. It is trained end-to-end with a single prefix language modeling (PrefixLM) objective. PrefixLM enables bidirectional attention within the prefix sequence, which makes it applicable to both decoder-only and encoder-decoder sequence-to-sequence language models.
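To illustrate the attention pattern PrefixLM implies, here is a minimal sketch of the mask construction: prefix tokens attend bidirectionally to each other, while the remaining tokens attend causally. This is an illustrative NumPy sketch, not the paper's implementation; the function name `prefix_lm_mask` is assumed.

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """Attention mask for prefix language modeling (PrefixLM).

    mask[i, j] = True means position i may attend to position j.
    Tokens inside the prefix attend bidirectionally to the whole
    prefix; tokens after it attend causally, i.e. to the prefix
    and to earlier target tokens only.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    # Allow full bidirectional attention within the prefix block.
    mask[:prefix_len, :prefix_len] = True
    return mask

# Example: 3 prefix tokens (e.g. image patches), 2 target tokens.
m = prefix_lm_mask(prefix_len=3, total_len=5)
# Row 0 (a prefix token) sees all 3 prefix positions;
# row 3 (first target token) sees the prefix plus itself.
```

With such a mask, a single Transformer can be trained with one objective: predict the suffix given the (bidirectionally encoded) prefix.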
Source: SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
| Task | Papers | Share |
|---|---|---|
| Image Captioning | 2 | 11.11% |
| Visual Question Answering | 2 | 11.11% |
| Visual Question Answering (VQA) | 2 | 11.11% |
| Language Modelling | 2 | 11.11% |
| Action Classification | 1 | 5.56% |
| Image Classification | 1 | 5.56% |
| Retrieval | 1 | 5.56% |
| Video Retrieval | 1 | 5.56% |
| Visual Entailment | 1 | 5.56% |