Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Ranked #1 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #3 on Egocentric Activity Recognition on EPIC-KITCHENS-55
Our dataset and experiments can be of interest to communities of 3D hand pose estimation, 6D object pose, and robotics as well as action recognition.
We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e., the combination of modalities within a range of temporal offsets.
Ranked #1 on Egocentric Activity Recognition on EPIC-KITCHENS-55
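The temporal-binding idea in the abstract above, combining modalities within a range of temporal offsets, can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, the uniform sampling of offsets, the concatenation-based fusion, and the toy encoder are all assumptions made for illustration.

```python
import random

def sample_binding_window(center, width, modalities, rng):
    """Pick one timestamp per modality inside [center - width/2, center + width/2].

    Each modality gets its own offset, so the inputs need not be synchronous.
    """
    half = width / 2.0
    return {m: center + rng.uniform(-half, half) for m in modalities}

def fuse(features_by_modality, order):
    """Concatenate per-modality feature vectors in a fixed order (assumed fusion)."""
    fused = []
    for m in order:
        fused.extend(features_by_modality[m])
    return fused

def toy_feature(modality, t):
    # Hypothetical stand-in for a per-modality encoder applied at time t.
    return [float(len(modality)), t]

rng = random.Random(0)
modalities = ["rgb", "flow", "audio"]
times = sample_binding_window(center=5.0, width=2.0, modalities=modalities, rng=rng)
fused = fuse({m: toy_feature(m, times[m]) for m in modalities}, modalities)
```

Here `width` plays the role of the range of temporal offsets: a width of zero reduces to synchronous fusion, while larger values let each modality be sampled from a different moment around the same action.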
Our method is ranked first on the public leaderboard of the EPIC-Kitchens egocentric action anticipation challenge 2019.
Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55
Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video.
Ranked #4 on Egocentric Activity Recognition on EGTEA
The per-frame (per-segment) extracted features are treated as a set of time series, and inter- and intra-time-series relations are employed to represent the video descriptors.
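As a rough sketch of what treating per-frame features as time series could look like, the example below turns each feature dimension into one series, then builds a descriptor from relations within each series and between pairs of series. The abstract does not say which relations the authors use; lag-1 autocorrelation (intra) and Pearson correlation (inter) are assumptions chosen for illustration, as are the function names.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def intra_relation(series):
    """Lag-1 autocorrelation: how a series relates to its own shifted copy."""
    return pearson(series[:-1], series[1:])

def video_descriptor(frame_features):
    """Build a descriptor from T frames, each a list of D feature values."""
    # Transpose T x D frames into D time series of length T.
    series = [list(s) for s in zip(*frame_features)]
    intra = [intra_relation(s) for s in series]
    inter = [pearson(a, b) for a, b in combinations(series, 2)]
    return intra + inter

frames = [[1, 2], [2, 4], [3, 6], [4, 8]]
descriptor = video_descriptor(frames)
```

With D feature dimensions, the descriptor has D intra-series entries followed by D(D-1)/2 inter-series entries, so its length does not depend on the number of frames.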