Search Results for author: Wei-Hong Chuang

Found 1 paper, 1 paper with code

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations • NeurIPS 2021 • Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #3 on Zero-Shot Video Retrieval on YouCook2 (text-to-video Mean Rank metric)

Tasks: Action Classification, Action Recognition In Videos, +9 more
