Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos

7 Sep 2021 · Chinedu Innocent Nwoye, Tong Yu, Cristians Gonzalez, Barbara Seeliger, Pietro Mascagni, Didier Mutter, Jacques Marescaux, Nicolas Padoy

Out of all existing frameworks for surgical workflow analysis in endoscopic videos, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities. This information, presented as <instrument, verb, target> combinations, is highly challenging to identify accurately. Triplet components can be difficult to recognize individually; the task requires not only recognizing all three components simultaneously, but also correctly establishing the data association between them. To address this task, we introduce a new model, Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels. We first introduce a new form of spatial attention, called Class Activation Guided Attention Mechanism (CAGAM), to capture individual action triplet components in a scene. This technique focuses on the recognition of verbs and targets using activations resulting from instruments. To solve the association problem, our RDV model adds a new form of semantic attention inspired by Transformer networks, called Multi-Head of Mixed Attention (MHMA). This technique uses several cross- and self-attentions to capture relationships between instruments, verbs, and targets. We also introduce CholecT50, a dataset of 50 endoscopic videos in which every frame has been annotated with labels from 100 triplet classes. Our proposed RDV model significantly improves the triplet prediction mean AP by over 9% compared to the state-of-the-art methods on this dataset.
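The abstract mentions cross- and self-attention inspired by Transformer networks. As a rough illustration only, the sketch below shows generic scaled dot-product attention in plain Python, applied once as self-attention and once as cross-attention. It is not the authors' MHMA or CAGAM implementation; the function names and the toy `instrument` and `verb` feature vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Generic scaled dot-product attention over lists of vectors.

    Each query is compared with every key; the resulting weights
    are used to average the value vectors.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Hypothetical toy features (assumed for illustration, not from the paper).
instrument = [[1.0, 0.0], [0.0, 1.0]]
verb = [[1.0, 0.0]]

# Self-attention: queries, keys, and values come from the same source.
self_attended = attention(verb, verb, verb)

# Cross-attention: verb queries attend to instrument keys and values.
cross_attended = attention(verb, instrument, instrument)
```

In this toy case the verb query is most similar to the first instrument vector, so the cross-attended output leans toward that vector.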

Code

camma-public/tripnet (official)

camma-public/attention-tripnet (official)

CAMMA-public/cholect45

CAMMA-public/cholect50

camma-public/rendezvous

See all 8 implementations

Tasks

Action Triplet Recognition

Datasets

Introduced in the Paper:

CholecT50

CholecT45

Used in the Paper:

ImageNet

Cholec80

CholecT40

Results from the Paper

Ranked #1 on Action Triplet Recognition on CholecT50

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Action Triplet Recognition	CholecT50	Rendezvous (TensorFlow v1)	Mean AP	29.9	# 1
Action Triplet Recognition	CholecT50	Attention Tripnet (TensorFlow v1)	Mean AP	23.4	# 3
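
The metric in the table above is mean average precision (mean AP) over triplet classes. The sketch below shows one common way to compute it for multi-label predictions, assuming per-frame binary labels and confidence scores per class. The exact evaluation protocol used for the benchmark is not stated on this page, so the AP definition, the choice to skip classes with no positive frames, and the toy data are all assumptions.

```python
def average_precision(labels, scores):
    """AP for one class: mean precision at each rank holding a positive.

    Returns None when the class has no positive labels.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else None


def mean_ap(label_matrix, score_matrix):
    """Average the per-class AP over classes with at least one positive."""
    n_classes = len(label_matrix[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision(
            [row[c] for row in label_matrix],
            [row[c] for row in score_matrix],
        )
        if ap is not None:  # assumption: skip classes with no positives
            aps.append(ap)
    return sum(aps) / len(aps)


# Hypothetical toy data: 4 frames, 3 triplet classes (rows are frames).
labels = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
scores = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.3],
    [0.4, 0.1, 0.2],
    [0.3, 0.6, 0.5],
]

toy_map = mean_ap(labels, scores)
```

Here the first class has AP 5/6, the second has AP 1, and the third is skipped because it has no positive frames, so `toy_map` is 11/12.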

Methods

Absolute Position Encodings • Class Activation Guided Attention Mechanism • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • MHMA • Multi-Head Attention • Position-Wise Feed-Forward Layer • Rendezvous • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Triplet Attention
