End-to-end Video-level Representation Learning for Action Recognition

From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years. However, current methods suffer from the confusion caused by partial observation training, or without end-to-end learning, or restricted to single temporal scale modeling and so on... (read more)

Results in Papers With Code
(↓ scroll down to see all results)