Scale Aggregation Block

Introduced by Li et al. in Data-Driven Neuron Allocation for Scale Aggregation Networks

A Scale Aggregation Block concatenates feature maps computed at a wide range of scales. The feature maps for each scale are generated by a stack of downsampling, convolution and upsampling operations. The block is a standard computational module that can replace any given transformation $\mathbf{Y}=\mathbf{T}(\mathbf{X})$, where $\mathbf{X}\in \mathbb{R}^{H\times W\times C}$ and $\mathbf{Y}\in \mathbb{R}^{H\times W\times C_o}$, with $C$ and $C_o$ being the numbers of input and output channels respectively. $\mathbf{T}$ is any operator, such as a convolution layer or a series of convolution layers. Assume there are $L$ scales. Each scale $l$ is generated by sequentially applying a downsampling operator $\mathbf{D}_l$, a transformation $\mathbf{T}_l$ and an upsampling operator $\mathbf{U}_l$:

$$ \mathbf{X}^{'}_l=\mathbf{D}_l(\mathbf{X}), $$

$$ \mathbf{Y}^{'}_l=\mathbf{T}_l(\mathbf{X}^{'}_l), $$

$$ \mathbf{Y}_l=\mathbf{U}_l(\mathbf{Y}^{'}_l), $$

where $\mathbf{X}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C}$, $\mathbf{Y}^{'}_l\in \mathbb{R}^{H_l\times W_l\times C_l}$, and $\mathbf{Y}_l\in \mathbb{R}^{H\times W\times C_l}$. Notably, $\mathbf{T}_l$ has a similar structure to $\mathbf{T}$. Concatenating all $L$ scales gives

$$ \mathbf{Y}^{'}=\Vert_{l=1}^{L}\mathbf{U}_l(\mathbf{T}_l(\mathbf{D}_l(\mathbf{X}))), $$

where $\Vert$ denotes concatenation of feature maps along the channel dimension, and $\mathbf{Y}^{'} \in \mathbb{R}^{H\times W\times \sum_{l=1}^{L} C_l}$ is the final output feature map of the scale aggregation block.

In the reference implementation, the downsampling $\mathbf{D}_l$ with factor $s$ is implemented by a max-pooling layer with an $s\times s$ kernel and a stride of $s$. The upsampling $\mathbf{U}_l$ is implemented by resizing with nearest-neighbor interpolation.
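
The block described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the scale factors and the per-scale channel counts are hypothetical, $\mathbf{T}_l$ is replaced by a plain $1\times 1$ convolution as a stand-in for whatever convolutional operator is being substituted, and $H$ and $W$ are assumed to be divisible by every factor $s$.

```python
import numpy as np

def max_pool(x, s):
    """D_l: s x s max pooling with stride s on an (H, W, C) array.
    Assumes H and W are divisible by s (an assumption of this sketch)."""
    h, w, c = x.shape
    # Group pixels into s x s windows, then take the max inside each window.
    return x.reshape(h // s, s, w // s, s, c).max(axis=(1, 3))

def upsample_nearest(y, s):
    """U_l: nearest-neighbor resize by an integer factor s."""
    return y.repeat(s, axis=0).repeat(s, axis=1)

def pointwise_conv(x, weight):
    """Stand-in for T_l: a 1x1 convolution mapping C channels to C_l.
    x has shape (H_l, W_l, C); weight has shape (C, C_l)."""
    return x @ weight

def scale_aggregation(x, factors, weights):
    """Compute U_l(T_l(D_l(X))) per scale and concatenate along channels."""
    outputs = []
    for s, w in zip(factors, weights):
        x_l = max_pool(x, s)                      # (H/s, W/s, C)
        y_l_small = pointwise_conv(x_l, w)        # (H/s, W/s, C_l)
        outputs.append(upsample_nearest(y_l_small, s))  # (H, W, C_l)
    return np.concatenate(outputs, axis=-1)       # (H, W, sum of C_l)

# Hypothetical configuration: 3 scales with 3, 2 and 1 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))                # H = W = 8, C = 4
factors = [1, 2, 4]
channels = [3, 2, 1]
weights = [rng.standard_normal((4, c)) for c in channels]

y = scale_aggregation(x, factors, weights)
print(y.shape)  # (8, 8, 6)
```

Every scale is brought back to the input resolution before concatenation, so the output keeps the spatial size $H\times W$ and its channel count is the sum of the per-scale $C_l$ (here $3+2+1=6$).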

Source: Data-Driven Neuron Allocation for Scale Aggregation Networks

Tasks

Task	Papers	Share
Pose Estimation	2	22.22%
3D Reconstruction	1	11.11%
Geometric Matching	1	11.11%
Multi-Person Pose Estimation	1	11.11%
One-Shot Learning	1	11.11%
Semantic Segmentation	1	11.11%
Image Classification	1	11.11%
Object Detection	1	11.11%

Components

Component	Type
Convolution	Convolutions
Max Pooling	Pooling Operations

Categories

Image Model Blocks