3D ResNet-RS is an architecture and scaling strategy for 3D ResNets for video recognition. The key additions are:
3D ResNet-D stem: The ResNet-D stem is adapted to 3D inputs by using three consecutive 3D convolutional layers. The first convolutional layer employs a temporal kernel size of 5 while the remaining two convolutional layers employ a temporal kernel size of 1.
3D Squeeze-and-Excitation: Squeeze-and-Excitation (SE) is adapted to spatio-temporal inputs by using 3D global average pooling for the squeeze operation. An SE ratio of 0.25 is applied in each 3D bottleneck block in all experiments.
Self-gating: A self-gating module is used in each 3D bottleneck block after the SE module.
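The 3D Squeeze-and-Excitation step described above can be sketched in NumPy. This is a minimal illustration, not the reference implementation: the function name `squeeze_excite_3d`, the channels-last layout `(T, H, W, C)`, the random weights, and the bias-free ReLU/sigmoid bottleneck (the standard SE formulation) are all assumptions made for the example. Only the 3D global average pooling and the 0.25 ratio come from the description.

```python
import numpy as np

SE_RATIO = 0.25  # reduction ratio stated in the description


def squeeze_excite_3d(x, w_reduce, w_expand):
    """Apply a bias-free SE gate to a (T, H, W, C) feature map (hypothetical helper)."""
    # Squeeze: 3D global average pooling over time, height, and width.
    squeezed = x.mean(axis=(0, 1, 2))                      # shape (C,)
    # Excite: bottleneck projection with ReLU, then sigmoid (assumed, standard SE).
    hidden = np.maximum(squeezed @ w_reduce, 0.0)          # shape (C * ratio,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_expand)))      # shape (C,)
    # Scale: the per-channel gate broadcasts over every spatio-temporal position.
    return x * gate


channels = 8
reduced = int(channels * SE_RATIO)                         # bottleneck width
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6, channels))               # toy clip features
w_reduce = rng.standard_normal((channels, reduced)) * 0.1  # illustrative weights
w_expand = rng.standard_normal((reduced, channels)) * 0.1
y = squeeze_excite_3d(x, w_reduce, w_expand)
```

The only change from the 2D case is the pooling axes: averaging over `(T, H, W)` rather than `(H, W)` yields one descriptor per channel for the whole clip, so the gate re-weights channels uniformly across time as well as space.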
Task | Papers | Share |
---|---|---|
Action Classification | 1 | 20.00% |
Video Recognition | 1 | 20.00% |
Classification | 1 | 20.00% |
General Classification | 1 | 20.00% |
Image Classification | 1 | 20.00% |
Component | Type |
---|---|
Convolutions | |
Convolutional Neural Networks | |
Image Model Blocks | |