139 papers with code • 1 benchmarks • 20 datasets
Human Activity Recognition is the problem of identifying events performed by humans given a video input. It is formulated as a binary (or multiclass) classification problem of outputting activity class labels. Activity Recognition is an important problem with many societal applications including smart surveillance, video search/retrieval, intelligent robots, and other monitoring systems.
We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.
Ranked #3 on Action Classification on Toyota Smarthome dataset
This paper proposes Adaptive RNNs (AdaRNN) to tackle the TCS problem by building an adaptive model that generalizes well on the unseen test data.
In this paper, we propose substructure-level matching for domain adaptation (SSDA) to better utilize the locality information of activity data for accurate and efficient knowledge transfer.
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)
The ubiquitous availability of wearable sensors is responsible for driving the Internet-of-Things but is also making an impact on sport sciences and precision medicine.
Human activity recognition (HAR) has become a popular topic in research because of its wide application.
In this work, we aim to further advance the state of the art by establishing "PoseTrack", a new large-scale benchmark for video-based human pose estimation and articulated tracking, and bringing together the community of researchers working on visual human analysis.
Ranked #3 on Multi-Person Pose Estimation on PoseTrack2017
In this context, deep learning has emerged in recent years as one of the most effective methods for tackling the supervised classification task, particularly in the field of computer vision.
Ranked #1 on Time Series Classification on FordA (mean average precision metric)
Discriminating Spatial and Temporal Relevance in Deep Taylor Decompositions for Explainable Activity Recognition
However, by exploiting a simple technique that removes motion information, we show that it is not the case that this technique is effective as-is for representing relevance in non-image tasks.