VideoBERT adapts the BERT model to learn a joint visual-linguistic representation for video. It has been applied to tasks including action classification and video captioning.
Source: VideoBERT: A Joint Model for Video and Language Representation Learning

PAPER | DATE |
---|---|
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training | 2021-04-19 |
Visual Grounding Strategies for Text-Only Natural Language Processing | 2021-03-25 |
VideoBERT: A Joint Model for Video and Language Representation Learning | 2019-04-03 |

TASK | PAPERS | SHARE |
---|---|---|
Language Modelling | 3 | 27.27% |
Image Retrieval | 1 | 9.09% |
Question Answering | 1 | 9.09% |
Visual Question Answering | 1 | 9.09% |
Action Classification | 1 | 9.09% |
Quantization | 1 | 9.09% |
Self-Supervised Learning | 1 | 9.09% |
Speech Recognition | 1 | 9.09% |
Video Captioning | 1 | 9.09% |