We first introduce the vanilla video transformer and show that the transformer module can perform spatio-temporal modeling from raw pixels, but at the cost of heavy memory usage.
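To make the memory concern concrete, here is a rough back-of-the-envelope sketch. It assumes joint space-time self-attention over non-overlapping square patches, so every patch token attends to every other token in the clip. The function name, the 16-pixel patch size, and the 4-byte value size are illustrative assumptions, not details taken from the paper.

```python
def joint_attention_cost(frames, height, width, patch=16, bytes_per_value=4):
    """Estimate token count and attention-matrix size for one attention head.

    Assumes (hypothetically) that every frame is cut into non-overlapping
    patch x patch tokens and that all tokens of the clip attend to each other.
    """
    tokens = frames * (height // patch) * (width // patch)
    # A full attention matrix stores one value per token pair.
    attention_bytes = tokens * tokens * bytes_per_value
    return tokens, attention_bytes


tokens, attn_bytes = joint_attention_cost(frames=8, height=224, width=224)
print(f"{tokens} tokens, {attn_bytes / 2**20:.1f} MiB per head per layer")

# Doubling the number of frames doubles the tokens but quadruples the matrix.
tokens_16, attn_bytes_16 = joint_attention_cost(frames=16, height=224, width=224)
print(f"{tokens_16} tokens, {attn_bytes_16 / 2**20:.1f} MiB per head per layer")
```

The quadratic growth in the number of tokens is what makes this naive formulation expensive as clips get longer or resolution increases.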
Ranked #1 on Action Classification on Kinetics-700 (Vid acc@1 metric)
Besides efficiency, Zeus is capable of answering the query at a user-specified target accuracy using a query optimizer that trains the agent based on an accuracy-aware reward function.
Learning robust representations to discriminate cell phenotypes based on microscopy images is important for drug discovery.
In this work, we propose a Motion Band-pass Module (MBPM) for separating the fine-grained information from coarse information in raw video data.
Ranked #1 on Action Recognition on Something-Something V1
This results in a task discrepancy problem for the video encoder: it is trained for action classification but used for temporal action localization (TAL).
Our proposed temporal contrastive learning framework achieves significant improvement over the state-of-the-art results in various downstream video understanding tasks such as action recognition, limited-label action classification, and nearest-neighbor video retrieval on multiple video datasets and backbones.
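For readers unfamiliar with contrastive objectives, the sketch below shows a generic InfoNCE-style loss in plain Python. It is not the framework described in the abstract: how positives and negatives are drawn across time is specific to the paper and not stated here. The function name `info_nce`, the temperature of 0.1, and the toy embeddings are all illustrative assumptions.

```python
import math


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding (illustrative only).

    `candidates` holds one positive embedding (at `positive_index`) and any
    number of negatives. Similarity is cosine similarity scaled by temperature.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    logits = [cosine(anchor, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[positive_index]


# Toy 2-D embeddings: the first candidate is close to the anchor.
anchor = [1.0, 0.0]
candidates = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(info_nce(anchor, candidates, positive_index=0))  # small loss
print(info_nce(anchor, candidates, positive_index=1))  # larger loss
```

The loss is low when the anchor is closer to its positive than to the negatives, which is the property any contrastive video representation method relies on.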
Ranked #1 on Self-supervised Video Retrieval on UCF101
ACTION CLASSIFICATION, CLASSIFICATION, REPRESENTATION LEARNING, SELF-SUPERVISED ACTION RECOGNITION, SELF-SUPERVISED LEARNING, SELF-SUPERVISED VIDEO RETRIEVAL, VIDEO RETRIEVAL, VIDEO UNDERSTANDING
The proposed approach aims to show the usefulness of every layer, termed global-local attention, in a 3D CNN via visual attribution, weakly-supervised action localization, and action recognition.