SimVLM is a minimalist pretraining framework that reduces training complexity by exploiting large-scale weak supervision. It is trained end-to-end with a single prefix language modeling (PrefixLM) objective. PrefixLM enables bidirectional attention within the prefix sequence, which makes it applicable to both decoder-only and encoder-decoder sequence-to-sequence language models.
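To illustrate the attention pattern PrefixLM implies, here is a minimal sketch of the mask construction: prefix tokens attend bidirectionally to each other, while the remaining tokens attend causally. This is an illustrative NumPy sketch, not the paper's implementation; the function name `prefix_lm_mask` is assumed.

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """Attention mask for prefix language modeling (PrefixLM).

    mask[i, j] = True means position i may attend to position j.
    Tokens inside the prefix attend bidirectionally to the whole
    prefix; tokens after it attend causally, i.e. to the prefix
    and to earlier target tokens only.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    # Allow full bidirectional attention within the prefix block.
    mask[:prefix_len, :prefix_len] = True
    return mask

# Example: 3 prefix tokens (e.g. image patches), 2 target tokens.
m = prefix_lm_mask(prefix_len=3, total_len=5)
# Row 0 (a prefix token) sees all 3 prefix positions;
# row 3 (first target token) sees the prefix plus itself.
```

With such a mask, a single Transformer can be trained with one objective: predict the suffix given the (bidirectionally encoded) prefix.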
Source: SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
| Task | Papers | Share |
|---|---|---|
| Image Captioning | 2 | 11.11% |
| Visual Question Answering | 2 | 11.11% |
| Visual Question Answering (VQA) | 2 | 11.11% |
| Language Modelling | 2 | 11.11% |
| Action Classification | 1 | 5.56% |
| Image Classification | 1 | 5.56% |
| Retrieval | 1 | 5.56% |
| Video Retrieval | 1 | 5.56% |
| Visual Entailment | 1 | 5.56% |