Frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning?
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
#2 best model for Egocentric Activity Recognition on EPIC-Kitchens
Our dataset and experiments may be of interest to the 3D hand pose estimation, 6D object pose, and robotics communities, as well as to action recognition.
We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal binding, i.e., the combination of modalities within a range of temporal offsets.
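The temporal-binding idea above can be sketched in a few lines: instead of fusing modalities only at a single synchronized frame, each modality is sampled at its own offset within a temporal window and the features are then combined. The streams, dimensions, and fusion-by-concatenation below are hypothetical placeholders, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature streams: (num_frames, feat_dim).
T, D = 100, 8
streams = {
    "rgb":   rng.standard_normal((T, D)),
    "flow":  rng.standard_normal((T, D)),
    "audio": rng.standard_normal((T, D)),
}

def temporal_binding_sample(streams, center, max_offset, rng):
    """Sample each modality at an independent offset within a window
    around `center`, then fuse by concatenation -- a toy version of
    binding modalities across a range of temporal offsets rather than
    at one synchronized time step."""
    feats = []
    for name, x in streams.items():
        t = center + rng.integers(-max_offset, max_offset + 1)
        t = int(np.clip(t, 0, len(x) - 1))  # stay inside the clip
        feats.append(x[t])
    return np.concatenate(feats)  # shape: (num_modalities * D,)

fused = temporal_binding_sample(streams, center=50, max_offset=5, rng=rng)
print(fused.shape)  # (24,)
```

A real model would pass the fused vector through learned fusion layers; the point here is only that each modality draws from its own position inside the window.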
#3 best model for Egocentric Activity Recognition on EPIC-Kitchens
Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video.
#3 best model for Egocentric Activity Recognition on EGTEA
The per-frame (per-segment) extracted features are treated as a set of time series, and inter- and intra-time-series relations are employed to represent the video descriptors.
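One minimal way to read "inter- and intra-time-series relations" is: treat each feature dimension as a time series, take pairwise correlations between series (inter) and per-series temporal self-correlation (intra), and concatenate both into a fixed-length descriptor. This is an illustrative sketch under that assumption, not the paper's actual descriptor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-frame features: T frames, each D-dimensional.
# Each of the D feature dimensions is treated as one time series.
T, D = 30, 4
features = rng.standard_normal((T, D))

def video_descriptor(features):
    """Toy descriptor built from time-series relations:
    - inter: pairwise correlations between the D series
      (upper triangle of the correlation matrix),
    - intra: lag-1 autocorrelation of each series on its own."""
    series = features.T                    # (D, T)
    corr = np.corrcoef(series)             # (D, D) inter-series relations
    iu = np.triu_indices(len(series), k=1)
    inter = corr[iu]                       # D*(D-1)/2 values
    intra = np.array(
        [np.corrcoef(s[:-1], s[1:])[0, 1] for s in series]
    )                                      # D values
    return np.concatenate([inter, intra])  # fixed-length descriptor

desc = video_descriptor(features)
print(desc.shape)  # (10,)
```

The descriptor length is independent of the number of frames, which is what makes relation-based pooling usable for variable-length videos.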