Embedded Gaussian Affinity is a type of affinity, or self-similarity, function between two points $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ that uses a Gaussian function in an embedding space:
$$ f\left(\mathbf{x_{i}}, \mathbf{x_{j}}\right) = e^{\theta\left(\mathbf{x_{i}}\right)^{T}\phi\left(\mathbf{x_{j}}\right)} $$
Here $\theta\left(\mathbf{x}_{i}\right) = W_{\theta}\mathbf{x}_{i}$ and $\phi\left(\mathbf{x}_{j}\right) = W_{\phi}\mathbf{x}_{j}$ are two embeddings.
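The affinity between a single pair of points can be sketched directly from the definition above. This is a minimal NumPy illustration; the dimensions and the random weight matrices are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, d_k = 8, 4  # input and embedding dimensions (illustrative)
W_theta = rng.standard_normal((d_k, d))
W_phi = rng.standard_normal((d_k, d))

def embedded_gaussian_affinity(x_i, x_j):
    """f(x_i, x_j) = exp(theta(x_i)^T phi(x_j))."""
    theta = W_theta @ x_i  # embedding of x_i
    phi = W_phi @ x_j      # embedding of x_j
    return np.exp(theta @ phi)

x_i = rng.standard_normal(d)
x_j = rng.standard_normal(d)
f_ij = embedded_gaussian_affinity(x_i, x_j)
```

Because of the exponential, the affinity is always strictly positive, which is what makes the later normalization over $j$ behave like a softmax.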
Note that the self-attention module used in the original Transformer model is a special case of non-local operations in the embedded Gaussian version. This can be seen from the fact that, for a given $i$ and with $\mathcal{C}\left(\mathbf{x}\right) = \sum_{\forall{j}}f\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)$, the normalized output $\frac{1}{\mathcal{C}\left(\mathbf{x}\right)}\sum_{\forall{j}}f\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)g\left(\mathbf{x}_{j}\right)$ becomes the softmax computation along the dimension $j$. So we have $\mathbf{y} = \text{softmax}\left(\mathbf{x}^{T}W^{T}_{\theta}W_{\phi}\mathbf{x}\right)g\left(\mathbf{x}\right)$, which is the self-attention form in the Transformer model. This shows how the recent self-attention model relates to the classic computer vision method of non-local means.
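The full non-local operation in the embedded Gaussian version can be sketched as a matrix computation over all positions at once, making the softmax connection explicit. This is a minimal NumPy sketch; the shapes, the value transform `W_g`, and the random weights are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d, d_k = 5, 8, 4  # positions, feature dim, embedding dim (illustrative)
X = rng.standard_normal((N, d))        # rows are the points x_i
W_theta = rng.standard_normal((d, d_k))
W_phi = rng.standard_normal((d, d_k))
W_g = rng.standard_normal((d, d))      # linear value transform g (assumed)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# Pairwise logits theta(x_i)^T phi(x_j) for all i, j: shape (N, N)
logits = (X @ W_theta) @ (X @ W_phi).T
# Dividing exp(logits) by C(x) = sum_j f(x_i, x_j) is exactly a
# softmax over the dimension j
attn = softmax(logits, axis=1)
# Weighted sum of the value embeddings g(x_j): the self-attention form
Y = attn @ (X @ W_g)
```

Each row of `attn` sums to one, so row $i$ of `Y` is the normalized non-local aggregation $\frac{1}{\mathcal{C}(\mathbf{x})}\sum_{j} f(\mathbf{x}_i, \mathbf{x}_j) g(\mathbf{x}_j)$.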
Source: Non-local Neural Networks

| PAPER | DATE |
|---|---|
| PBRnet: Pyramidal Bounding Box Refinement to Improve Object Localization Accuracy | 2020-03-10 |
| Libra R-CNN: Towards Balanced Learning for Object Detection | 2019-04-04 |
| Non-local Neural Networks | 2017-11-21 |
| TASK | PAPERS | SHARE |
|---|---|---|
| Object Detection | 2 | 22.22% |
| Object Localization | 1 | 11.11% |
| Action Classification | 1 | 11.11% |
| Action Recognition | 1 | 11.11% |
| Instance Segmentation | 1 | 11.11% |
| Keypoint Detection | 1 | 11.11% |
| Pose Estimation | 1 | 11.11% |
| Video Classification | 1 | 11.11% |