X3D: Expanding Architectures for Efficient Video Recognition

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes: space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that a good accuracy-to-complexity trade-off is achieved...
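The stepwise expansion described in the abstract can be illustrated with a small greedy loop. This is only a sketch under stated assumptions, not the authors' procedure: the function `stepwise_expand`, the proxies `toy_accuracy` and `toy_cost`, the weights, and the fixed doubling factor are all hypothetical stand-ins. In the paper, each candidate would be a trained and evaluated video network rather than a closed-form proxy.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def stepwise_expand(
    base: Config,
    factors: Dict[str, float],
    accuracy: Callable[[Config], float],
    cost: Callable[[Config], float],
    steps: int,
) -> Tuple[Config, List[str]]:
    """Greedy forward expansion: each step expands exactly one axis."""
    current = dict(base)
    history: List[str] = []
    for _ in range(steps):
        best_axis, best_score = None, float("-inf")
        for axis, factor in factors.items():
            # Candidate differs from the current config along one axis only.
            candidate = dict(current)
            candidate[axis] = current[axis] * factor
            gain = accuracy(candidate) - accuracy(current)
            extra = cost(candidate) - cost(current)
            if extra <= 0:
                continue  # skip candidates that do not add complexity
            score = gain / extra  # accuracy gained per unit of added cost
            if score > best_score:
                best_axis, best_score = axis, score
        if best_axis is None:
            break
        current[best_axis] *= factors[best_axis]
        history.append(best_axis)
    return current, history


# Hypothetical proxies (assumptions for illustration, not from the paper).
WEIGHTS = {"time": 4.0, "space": 3.0, "width": 2.5, "depth": 1.0}


def toy_accuracy(cfg: Config) -> float:
    # Diminishing returns: each doubling of an axis helps less than the last.
    return sum(WEIGHTS[a] * (1.0 - 1.0 / v) for a, v in cfg.items())


def toy_cost(cfg: Config) -> float:
    # Complexity grows multiplicatively with every axis.
    total = 1.0
    for v in cfg.values():
        total *= v
    return total


base = {"time": 1.0, "space": 1.0, "width": 1.0, "depth": 1.0}
factors = {axis: 2.0 for axis in base}  # assumed: each step doubles one axis
final, order = stepwise_expand(base, factors, toy_accuracy, toy_cost, steps=3)
print(order, final)
```

Because the toy accuracy has diminishing returns per axis, the loop rotates between axes instead of expanding one axis repeatedly, which mirrors the idea of picking, at each step, the single expansion with the best trade-off.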

CVPR 2020

Results from the Paper


TASK                   DATASET       MODEL    METRIC NAME  METRIC VALUE  GLOBAL RANK
Action Classification  Kinetics-400  X3D-XXL  Vid acc@1    80.4          #16
Action Classification  Kinetics-400  X3D-XXL  Vid acc@5    94.6          #11
Action Classification  Kinetics-400  X3D-XL   Vid acc@1    79.1          #27
Action Classification  Kinetics-400  X3D-XL   Vid acc@5    93.9          #20
Action Classification  Kinetics-400  X3D-L    Vid acc@1    77.5          #43
Action Classification  Kinetics-400  X3D-L    Vid acc@5    92.9          #35
Action Classification  Kinetics-400  X3D-M    Vid acc@1    76.0          #53
Action Classification  Kinetics-400  X3D-M    Vid acc@5    92.3          #38
