Hierarchical Temporal Convolution Network: Towards Privacy-Centric Activity Recognition

In response to the healthcare issues associated with the ageing population, various ambient assisted living technologies are being developed. To mitigate privacy concerns related to cloud-based data processing, recent methods have shifted towards processing data locally on edge devices. Despite the benefits of this approach, the limited computational resources of edge devices make real-time performance, which is often an imperative requirement, a significant challenge. Moreover, recent computer vision-based methods for recognising activities of daily living among the elderly incur increased computational complexity when capturing the multi-scale temporal context essential for accurate activity recognition. To address this, we propose HT-ConvNet (Hierarchical Temporal Convolution Network), which captures multi-scale temporal information without increasing computational complexity. HT-ConvNet employs exponentially increasing receptive fields across successive convolution layers to enable efficient hierarchical extraction of temporal features. Furthermore, HT-ConvNet uses an adaptive weighting mechanism to emphasise the most important features. Experimental results show that the multi-scale temporal feature extraction and the feature-weighted fusion mechanisms improve accuracy over existing methods without increasing model complexity. The code is publicly available at: https://github.com/Gbouna/HT-ConvNet.
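The following is a minimal sketch of the two ideas named in the abstract; it is not the authors' implementation. It assumes that the exponentially growing receptive field comes from stride-1 temporal convolutions whose dilation doubles at each layer (1, 2, 4, ...), and that the adaptive weighting can be approximated by softmax-normalised scores over the per-layer outputs. The kernel size of 3, the function names, and the fixed scores are all hypothetical choices for illustration; in a trained model the scores would be learned.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated 1D convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, length-preserving 1D dilated convolution (odd kernel size assumed)."""
    k = len(kernel)
    pad = (k - 1) * dilation // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


def hierarchical_features(x, kernel, num_layers):
    """Stack layers with dilation 1, 2, 4, ... and keep every layer's output."""
    outputs = []
    h = list(x)
    for i in range(num_layers):
        h = dilated_conv1d(h, kernel, dilation=2 ** i)
        outputs.append(h)
    return outputs


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(features, scores):
    """Fuse per-scale sequences with softmax weights (scores are assumed learnable)."""
    weights = softmax(scores)
    length = len(features[0])
    return [sum(w * f[t] for w, f in zip(weights, features)) for t in range(length)]


# Four layers with kernel size 3: same number of weights, different temporal reach.
rf_dilated = receptive_field(3, [1, 2, 4, 8])
rf_plain = receptive_field(3, [1, 1, 1, 1])

# An impulse passed through the dilated stack spreads over rf_dilated frames.
impulse = [0.0] * 41
impulse[20] = 1.0
scales = hierarchical_features(impulse, [1.0, 1.0, 1.0], num_layers=4)
fused = weighted_fusion(scales, scores=[0.0, 0.0, 0.0, 0.0])
```

Under these assumptions, each layer has the same number of weights whether or not it is dilated, so the dilated stack covers a much longer temporal window than the undilated one at the same parameter count. Keeping every layer's output gives one feature sequence per temporal scale, which the weighted fusion then combines.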

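| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Skeleton Based Action Recognition | JHMDB (2D poses only) | HT-ConvNet | Accuracy | 86.1 | #1 |
| Skeleton Based Action Recognition | JHMDB (2D poses only) | HT-ConvNet | Average accuracy of 3 splits | 86.1 | #1 |
| Skeleton Based Action Recognition | JHMDB (2D poses only) | HT-ConvNet | No. parameters | 1.75 | #1 |
| Skeleton Based Action Recognition | SHREC 2017 track on 3D Hand Gesture Recognition | HT-ConvNet | 28 gestures accuracy | 94.3 | #2 |
| Skeleton Based Action Recognition | SHREC 2017 track on 3D Hand Gesture Recognition | HT-ConvNet | 14 gestures accuracy | 97.1 | #1 |
| Skeleton Based Action Recognition | SHREC 2017 track on 3D Hand Gesture Recognition | HT-ConvNet | No. Parameters | 1.75 | #1 |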

Methods