In this work, we perform a large-scale robustness analysis of existing models for video action recognition.
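A robustness analysis needs some score for how much a model degrades under a perturbation. The section does not say which metric is used. The sketch below assumes two common choices: the absolute accuracy drop, and a relative robustness of one minus the fraction of clean accuracy lost, averaged over perturbation types. The function names and all accuracy values are hypothetical.

```python
def absolute_drop(clean_acc: float, perturbed_acc: float) -> float:
    """Accuracy points lost under a perturbation (assumed metric)."""
    return clean_acc - perturbed_acc


def relative_robustness(clean_acc: float, perturbed_acc: float) -> float:
    """1 minus the fraction of clean accuracy lost (assumed metric).

    1.0 means no degradation; 0.0 means all clean accuracy was lost.
    """
    if clean_acc <= 0:
        raise ValueError("clean accuracy must be positive")
    return 1.0 - (clean_acc - perturbed_acc) / clean_acc


def mean_relative_robustness(clean_acc: float, perturbed: dict[str, float]) -> float:
    """Average relative robustness over several perturbation types."""
    scores = [relative_robustness(clean_acc, acc) for acc in perturbed.values()]
    return sum(scores) / len(scores)


# Hypothetical accuracies for one model under two perturbations
clean = 0.80
perturbed = {"gaussian_noise": 0.60, "motion_blur": 0.70}
print(round(mean_relative_robustness(clean, perturbed), 4))
```

Normalizing by clean accuracy lets models with different baseline accuracies be compared on the same scale. The absolute drop is still worth reporting next to it.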
Given a video and a set of action classes, our method predicts a set of confidence scores for each class independently.
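Predicting a confidence for each class independently is usually done by applying a sigmoid to every class logit separately instead of a softmax over all classes. Multiple actions can then be scored as present in the same video. The sketch below assumes this sigmoid formulation and a fixed 0.5 decision threshold. The logits and class names are hypothetical, and the section does not state how the method actually produces its scores.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def per_class_scores(logits: dict[str, float]) -> dict[str, float]:
    """Map each class logit to an independent confidence in (0, 1).

    Unlike softmax, the scores are not normalized against each other,
    so they need not sum to 1.
    """
    return {cls: sigmoid(z) for cls, z in logits.items()}


def predicted_labels(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Keep every class whose confidence reaches the (assumed) threshold."""
    return {cls for cls, s in scores.items() if s >= threshold}


# Hypothetical logits for one video
logits = {"running": 2.0, "jumping": 1.0, "swimming": -3.0}
scores = per_class_scores(logits)
print(predicted_labels(scores))
```

Here both "running" and "jumping" clear the threshold. A softmax would have forced these two classes to compete for probability mass.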
This technical report presents our approach, "Knights", to the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data.
We present this as a benchmark dataset for noisy learning in video understanding.
We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.
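Attention-based modeling of action dependencies generally means letting each class's feature attend to the features of the other classes, so that co-occurring actions can inform one another. This section does not describe the MLAD layer's internals. The sketch below is therefore a generic scaled dot-product self-attention across per-class features at a single time step, with a residual connection, and not the MLAD layer itself. The shapes, the random projection matrices, and the function name `class_self_attention` are hypothetical.

```python
import numpy as np


def class_self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray,
                         wv: np.ndarray) -> np.ndarray:
    """Generic self-attention across action classes (not MLAD itself).

    x: (C, D) array, one D-dim feature vector per action class.
    wq, wk, wv: (D, D) projections (learned in practice, random here).
    Returns a (C, D) array: each class feature plus an attention-weighted
    mix of all class features (residual connection).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ k.T / np.sqrt(x.shape[1])       # (C, C) class-to-class affinities
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return x + attn @ v


rng = np.random.default_rng(0)
C, D = 4, 8  # hypothetical: 4 action classes, 8-dim features
x = rng.standard_normal((C, D))
wq, wk, wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
out = class_self_attention(x, wq, wk, wv)
print(out.shape)  # (4, 8)
```

A dependency layer for localization would typically also attend across time steps. This sketch covers only the class-to-class part.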
We demonstrate the effectiveness of the proposed model on two large-scale, publicly available datasets: YFCC100M and NUS-WIDE.