Representation Learning

VideoBERT adapts the BERT model to learn a joint visual-linguistic representation for video. The learned representation can be used in a range of downstream tasks, including action classification and video captioning.

Source: VideoBERT: A Joint Model for Video and Language Representation Learning
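
To make "joint visual-linguistic representation" more concrete, here is a minimal sketch of how a BERT-style input might combine the two modalities. It assumes each video clip has already been reduced to a feature vector, and that these vectors are quantized into discrete "visual tokens" by nearest-centroid lookup, which is the general approach the paper describes. The toy codebook `CENTROIDS`, the `[vid_k]` token names, and the helpers `quantize` and `build_joint_sequence` are hypothetical and chosen only for illustration; the `[>]` separator follows the sequence layout shown in the paper. This is not the authors' implementation.

```python
import math

# Hypothetical toy codebook: each entry is a centroid in a 2-D feature space.
# A real system would learn a much larger codebook from video features.
CENTROIDS = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]


def quantize(feature, centroids=CENTROIDS):
    """Return the index of the centroid nearest to `feature` (Euclidean distance)."""
    distances = [math.dist(feature, c) for c in centroids]
    return distances.index(min(distances))


def build_joint_sequence(text_tokens, clip_features):
    """Concatenate text tokens and quantized visual tokens into one sequence."""
    visual_tokens = [f"[vid_{quantize(f)}]" for f in clip_features]
    return ["[CLS]"] + list(text_tokens) + ["[>]"] + visual_tokens + ["[SEP]"]


# Illustrative input: three words of narration and two clip feature vectors.
sequence = build_joint_sequence(
    ["cut", "the", "onion"],
    [[0.9, 0.1], [0.1, 0.8]],
)
print(sequence)
```

A sequence of this shape can then be fed to a standard BERT-style masked-token objective, so that the model learns to predict masked words from visual context and masked visual tokens from words.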



