Affinity Functions

Embedded Gaussian Affinity

Introduced by Wang et al. in Non-local Neural Networks

Embedded Gaussian Affinity is a type of affinity or self-similarity function between two points $\mathbf{x_{i}}$ and $\mathbf{x_{j}}$ that uses a Gaussian function in an embedding space:

$$f\left(\mathbf{x_{i}}, \mathbf{x_{j}}\right) = e^{\theta\left(\mathbf{x_{i}}\right)^{T}\phi\left(\mathbf{x_{j}}\right)}$$

Here $\theta\left(\mathbf{x_{i}}\right) = W_{\theta}\mathbf{x_{i}}$ and $\phi\left(\mathbf{x_{j}}\right) = W_{\phi}\mathbf{x_{j}}$ are two learned linear embeddings.
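A minimal sketch of this affinity in NumPy, assuming illustrative dimensions and random embedding matrices (all names here are hypothetical, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

d, d_emb = 8, 4  # input and embedding dimensions (illustrative sizes)
W_theta = rng.standard_normal((d_emb, d)) * 0.1  # embedding weights for theta
W_phi = rng.standard_normal((d_emb, d)) * 0.1    # embedding weights for phi

x_i = rng.standard_normal(d)
x_j = rng.standard_normal(d)

# Embedded Gaussian affinity: f(x_i, x_j) = exp(theta(x_i)^T phi(x_j))
theta = W_theta @ x_i
phi = W_phi @ x_j
f_ij = np.exp(theta @ phi)  # a positive scalar similarity score
```

Because of the exponential, the affinity is always positive, which makes the later softmax normalization well defined.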

Note that the self-attention module used in the original Transformer model is a special case of non-local operations in the embedded Gaussian version. This follows because, with the normalization factor $\mathcal{C}\left(\mathbf{x}\right) = \sum_{\forall{j}}f\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)$, the non-local operation $\frac{1}{\mathcal{C}\left(\mathbf{x}\right)}\sum_{\forall{j}}f\left(\mathbf{x}_{i}, \mathbf{x}_{j}\right)g\left(\mathbf{x}_{j}\right)$ for a given $i$ becomes a softmax computation along the dimension $j$. We thus have $\mathbf{y} = \text{softmax}\left(\mathbf{x}^{T}W^{T}_{\theta}W_{\phi}\mathbf{x}\right)g\left(\mathbf{x}\right)$, which is the self-attention form used in the Transformer model. This connects the recent self-attention mechanism to the classic computer vision method of non-local means.
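The equivalence above can be sketched numerically: normalizing the exponentiated affinities over $j$ is exactly a row-wise softmax, so the non-local output is an attention-weighted sum of the $g$ embeddings. This is an illustrative NumPy sketch with assumed shapes and random weights, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, d_emb = 5, 8, 4  # number of positions and channel dims (illustrative)

X = rng.standard_normal((n, d))  # one row per position x_j
W_theta = rng.standard_normal((d_emb, d)) * 0.1
W_phi = rng.standard_normal((d_emb, d)) * 0.1
W_g = rng.standard_normal((d_emb, d)) * 0.1

theta = X @ W_theta.T  # (n, d_emb) embeddings theta(x_i)
phi = X @ W_phi.T      # (n, d_emb) embeddings phi(x_j)
g = X @ W_g.T          # (n, d_emb) embeddings g(x_j)

# f(x_i, x_j) = exp(theta_i^T phi_j); dividing by C(x) = sum_j f(x_i, x_j)
# turns each row i into a softmax over the dimension j.
logits = theta @ phi.T                          # (n, n) pairwise scores
weights = np.exp(logits)
weights /= weights.sum(axis=1, keepdims=True)   # softmax along j

Y = weights @ g  # y_i = sum_j softmax_j(theta_i^T phi_j) * g(x_j)
```

Each row of `weights` sums to 1, so `Y` is the self-attention output in the Transformer sense.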

Source: Non-local Neural Networks

Latest Papers

PBRnet: Pyramidal Bounding Box Refinement to Improve Object Localization Accuracy
Li Xiao, Yufan Luo, Chunlong Luo, Lianhe Zhao, Quanshui Fu, Guoqing Yang, Anpeng Huang, Yi Zhao
2020-03-10

Libra R-CNN: Towards Balanced Learning for Object Detection
Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin
2019-04-04

Non-local Neural Networks
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
2017-11-21

Tasks

TASK                    PAPERS  SHARE
Object Detection        2       22.22%
Object Localization     1       11.11%
Action Classification   1       11.11%
Action Recognition      1       11.11%
Instance Segmentation   1       11.11%
Keypoint Detection      1       11.11%
Pose Estimation         1       11.11%
Video Classification    1       11.11%
