In this work, we investigate the application of contrastive learning to the domain of medical image analysis.
Surgical robotics holds much promise for improving patient safety and clinician experience in the Operating Room (OR).
We highlight that the primary limitation of current surgical VQA systems is the lack of scene knowledge needed to answer complex queries.
Methods: In this work, we employ self-supervised learning to flexibly leverage diverse surgical datasets, thereby learning task-agnostic representations that can be used for various surgical downstream tasks.
SurgVLP constructs a new contrastive learning objective to align video clip embeddings with the corresponding multiple text embeddings by bringing them together within a joint latent space.
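The idea of aligning one video clip embedding with several text embeddings in a joint latent space can be illustrated with a generic symmetric InfoNCE loss averaged over the text views. This is a minimal sketch under stated assumptions, not the SurgVLP objective itself: the function names (`info_nce`, `multi_text_contrastive_loss`), the temperature value, and the simple averaging over text views are all illustrative choices that do not come from the source.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between row-aligned batches a and b, each of shape (N, D).

    Row i of `a` and row i of `b` form the positive pair; all other rows in the
    batch act as negatives. The temperature default is an assumed value.
    """
    a = l2_normalize(a)
    b = l2_normalize(b)
    logits = a @ b.T / temperature  # (N, N) similarity matrix

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row, then pick the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def multi_text_contrastive_loss(video, texts, temperature=0.07):
    """Average the InfoNCE loss of one video batch against several text batches.

    `video` has shape (N, D); `texts` is a list of (N, D) arrays, one per text
    view of the same clips. Averaging over views is an assumption of this sketch.
    """
    return float(np.mean([info_nce(video, t, temperature) for t in texts]))
```

Minimizing such a loss pulls each clip embedding toward all of its paired text embeddings and pushes it away from the texts of other clips in the batch, which is one common way to realize "bringing them together within a joint latent space".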
1 code implementation • 1 Jul 2022 • Sanat Ramesh, Vinkle Srivastav, Deepak Alapatt, Tong Yu, Aditya Murali, Luca Sestini, Chinedu Innocent Nwoye, Idris Hamoud, Saurav Sharma, Antoine Fleurentin, Georgios Exarchakis, Alexandros Karargyris, Nicolas Padoy
Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL (up to 7.4% on phase recognition and 20% on tool presence detection), as well as over state-of-the-art semi-supervised phase recognition approaches by up to 14%.
Ranked #1 on Semantic Segmentation on Endoscapes
Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, to adapt a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain.
Deep neural networks power most recent successes of artificial intelligence, spanning from self-driving cars to computer-aided diagnosis in radiology and pathology.
Human pose estimation (HPE) is a key building block for developing AI-based context-aware systems inside the operating room (OR).
The objective of the current study was to develop a modified version (Neuro-Endo-Trainer-Online Assessment System (NET-OAS)) by providing a stand-alone system with online evaluation and real-time feedback.
2D/3D human pose estimation is needed to develop novel intelligent tools for the operating room that can analyze and support the clinical activities.
Methods: We propose a comparison of 6 state-of-the-art face detectors on clinical data using Multi-View Operating Room Faces (MVOR-Faces), a dataset of operating room images capturing real surgical activities.
In this paper, we present the dataset, its annotations, as well as baseline results from several recent person detection and 2D/3D pose estimation methods.