Vision and Language Pre-Trained Models

Simple Visual Language Model

Introduced by Wang et al. in SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

SimVLM is a minimalist pretraining framework that reduces training complexity by exploiting large-scale weak supervision. It is trained end-to-end with a single prefix language modeling (PrefixLM) objective. PrefixLM enables bidirectional attention within the prefix sequence while keeping causal attention over the remaining tokens, and it is therefore applicable to both decoder-only and encoder-decoder sequence-to-sequence language models.

Source: SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
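The PrefixLM attention pattern described above can be sketched as a boolean mask: tokens inside the prefix (e.g. image patches plus a text prefix) attend to each other bidirectionally, while the remaining tokens attend causally. This is a minimal illustrative sketch, not SimVLM's actual implementation; the function name and mask convention (True = attention allowed) are assumptions.

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """Illustrative PrefixLM attention mask (True = attention allowed).

    Positions [0, prefix_len) form the prefix and attend bidirectionally
    to one another; positions from prefix_len onward attend causally,
    i.e. only to positions at or before themselves.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    # Allow full bidirectional attention within the prefix block.
    mask[:prefix_len, :prefix_len] = True
    return mask

# Example: a 3-token prefix in a 5-token sequence.
m = prefix_lm_mask(3, 5)
```

With this mask, prefix token 0 can attend forward to prefix token 2, but suffix token 3 still cannot attend to the future token 4, preserving the left-to-right generation objective on the suffix.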
