Real-Time Multi-View 3D Human Pose Estimation using Semantic Feedback to Smart Edge Sensors

28 Jun 2021 · Simon Bultmann, Sven Behnke

We present a novel method for estimating 3D human poses from a multi-camera setup, employing distributed smart edge sensors coupled with a backend through a semantic feedback loop. 2D joint detection for each camera view is performed locally on a dedicated embedded inference processor; only the semantic skeleton representation is transmitted over the network, and the raw images remain on the sensor board. 3D poses are recovered from the 2D joints on a central backend via triangulation and a body model that incorporates prior knowledge of the human skeleton. A feedback channel from the backend to the individual sensors is implemented on a semantic level: the allocentric 3D pose is backprojected into the sensor views, where it is fused with the local 2D joint detections. The local semantic model on each sensor is thus improved by incorporating global context information. The whole pipeline is capable of real-time operation. We evaluate our method on three public datasets, where we achieve state-of-the-art results and show the benefits of our feedback architecture, as well as in our own setup for multi-person experiments. Using the feedback signal improves the 2D joint detections and, in turn, the estimated 3D poses.
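The two geometric steps named in the abstract, triangulating 3D joints from multi-view 2D detections and backprojecting the recovered pose into each sensor view for feedback fusion, can be sketched as follows. This is a minimal NumPy illustration assuming calibrated pinhole cameras with known 3×4 projection matrices; the function names and the scalar blending weight `alpha` are illustrative assumptions, and the paper's actual fusion operates on joint heatmaps and additionally applies a skeleton-based body model, both of which this sketch omits.

```python
import numpy as np

def triangulate_joint(points_2d, projection_matrices):
    """Triangulate one 3D joint from its 2D detections in multiple
    calibrated views using the linear DLT method.

    points_2d: (N, 2) pixel coordinates, one row per camera.
    projection_matrices: N projection matrices, each of shape (3, 4).
    Returns the 3D joint position as a (3,) array.
    """
    A = []
    for (u, v), P in zip(points_2d, projection_matrices):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: (u * P[2] - P[0]) X = 0 and
        # (v * P[2] - P[1]) X = 0.
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.stack(A)
    # Least-squares solution: right singular vector for the smallest
    # singular value, then dehomogenize.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def backproject(point_3d, P):
    """Project a 3D joint into one camera view (the feedback signal
    sent back to that smart edge sensor)."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]

def fuse_feedback(local_2d, feedback_2d, alpha=0.7):
    """Blend a sensor's local 2D detection with the backprojected
    global pose. `alpha` is an assumed scalar weight; the paper fuses
    at the semantic (heatmap) level rather than with a fixed scalar."""
    return alpha * np.asarray(local_2d) + (1.0 - alpha) * np.asarray(feedback_2d)

if __name__ == "__main__":
    # Toy sanity check with two assumed cameras offset along x.
    K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
    P1 = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.0]])])
    P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.2, -0.1, 3.0])
    obs = np.array([backproject(X_true, P) for P in (P1, P2)])
    print(triangulate_joint(obs, [P1, P2]))  # ~ [0.2, -0.1, 3.0]
```

Note the design choice this division of labor reflects: since only skeleton coordinates (and, for feedback, backprojected joints) cross the network, bandwidth stays low and raw images never leave the sensor board.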

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| 3D Multi-Person Pose Estimation | Campus | SmartEdgeSensor | PCP3D | 97 | #2 |
| 3D Human Pose Estimation | Human3.6M | SmartEdgeSensor (H36M+COCO) | Average MPJPE (mm) | 23.5 | #16 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Multi-View | #1 |
| 3D Human Pose Estimation | Human3.6M | SmartEdgeSensor | Average MPJPE (mm) | 29.8 | #30 |
| | | | Using 2D ground-truth joints | No | #2 |
| | | | Multi-View or Monocular | Multi-View | #1 |
| 3D Multi-Person Pose Estimation | Shelf | SmartEdgeSensor | PCP3D | 97.4 | #10 |
