The multi-head of mixed attention (MHMA) combines self- and cross-attention, encouraging high-level learning of interactions between the entities captured in the various attention features. It is built from several attention heads, each of which implements either self- or cross-attention. In self-attention, the key and query are computed from the same features, or from features of the same domain; in cross-attention, the key and query are generated from features of different domains. Modeling MHMA allows a model to identify relationships between features of different domains, which is very useful in tasks involving relationship modeling such as human-object interaction, tool-tissue interaction, man-machine interaction, and human-computer interfaces.
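The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Rendezvous implementation: the alternating assignment of self- and cross-attention to even and odd heads, the random (untrained) projection matrices, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v — the basic attention mechanism."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def mixed_multi_head_attention(x_a, x_b, num_heads=4, d_head=8):
    """Multi-head mixed attention sketch: even-indexed heads perform
    self-attention within domain A; odd-indexed heads perform
    cross-attention with queries from A and keys/values from B."""
    d_model = x_a.shape[-1]
    head_outputs = []
    for h in range(num_heads):
        # Per-head projections (random here; learned in a real model).
        w_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        w_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        w_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        q = x_a @ w_q
        if h % 2 == 0:                  # self-attention head
            k, v = x_a @ w_k, x_a @ w_v
        else:                           # cross-attention head
            k, v = x_b @ w_k, x_b @ w_v
        head_outputs.append(scaled_dot_product_attention(q, k, v))
    # Concatenate heads; a real model would add an output projection.
    return np.concatenate(head_outputs, axis=-1)

# Two feature sets from different domains, e.g. tool and tissue features.
x_tool = rng.standard_normal((5, 16))    # 5 tokens, d_model = 16
x_tissue = rng.standard_normal((7, 16))  # 7 tokens, same d_model
out = mixed_multi_head_attention(x_tool, x_tissue)
print(out.shape)  # (5, 32): 5 query tokens, 4 heads x 8 dims each
```

Note that all heads share the query domain here, so the concatenated output stays aligned with domain A's tokens while odd heads mix in information from domain B.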
Source: Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos
Component | Type
---|---
Multi-Head Attention | Attention Modules (optional)
Scaled Dot-Product Attention | Attention Mechanisms