Video feedback provides a wealth of information about surgical procedures and is the main sensory cue for surgeons.
In this paper, we present the dataset, its annotations, as well as baseline results from several recent person detection and 2D/3D pose estimation methods.
As 2D poses are collected at test time using a SV pose detector, which might generate inaccurate detections, we model its characteristics and incorporate this information during training.
In this paper, we propose an approach for multi-view 3D human pose estimation from RGB-D images and demonstrate the benefits of using the additional depth channel for pose refinement beyond its use for the generation of improved features.
Proposed methods for the operating room (OR) rely either on foreground estimation using a multi-camera system, which is a challenge in real ORs due to color similarities and frequent illumination changes, or on wearable sensors or markers, which are invasive and therefore difficult to introduce in the room.