Spatial Token Mixer

4 papers with code • 0 benchmarks • 0 datasets

Spatial Token Mixer (STM) is the component of a vision backbone that exchanges information among tokens across spatial positions. Self-attention, window attention, and depthwise convolution are common instantiations; within a fixed overall architecture, different STMs can be swapped in as drop-in replacements, which allows their contribution to the backbone's performance to be compared directly.
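
The drop-in idea is easiest to see as code. Below is a minimal PyTorch sketch, not taken from any of the listed papers: a generic "token mixer + channel MLP" block in which the spatial token mixer is passed in as a module, so a depthwise convolution, an attention layer, or any other STM can be substituted without touching the rest of the block. The names ConvTokenMixer and MixerBlock are illustrative, not an official API.

```python
# Minimal sketch (not from any of the listed papers): a depthwise-convolution
# spatial token mixer slotted into a generic "token mixer + channel MLP" block.
import torch
import torch.nn as nn


class ConvTokenMixer(nn.Module):
    """Mixes information across spatial positions with a depthwise convolution."""

    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        self.dwconv = nn.Conv2d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        return self.dwconv(x)


class MixerBlock(nn.Module):
    """Generic block: spatial token mixer + channel MLP, each with a residual."""

    def __init__(self, dim: int, token_mixer: nn.Module, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)
        self.token_mixer = token_mixer          # any STM can be dropped in here
        self.norm2 = nn.BatchNorm2d(dim)
        self.mlp = nn.Sequential(               # channel MLP as 1x1 convolutions
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    block = MixerBlock(dim=64, token_mixer=ConvTokenMixer(64))
    out = block(torch.randn(2, 64, 56, 56))
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```

Because the rest of the block is fixed, comparing STMs reduces to constructing MixerBlock with different token_mixer modules, which is the experimental setup the papers below build on.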

Most implemented papers

WaveMix: A Resource-efficient Neural Network for Image Analysis

pranavphoenix/WaveMix 28 May 2022

The whole architecture is a stack of self-similar and resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability.
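
A rough sketch of the resolution-preserving idea follows, under the assumption that each block applies a level-1 Haar 2D DWT (halving resolution, quadrupling channels), a channel MLP over the sub-bands, and a transposed convolution that restores the input resolution, plus a residual connection. This simplifies the actual WaveMix block; see pranavphoenix/WaveMix for the reference implementation.

```python
# Rough sketch of a resolution-preserving, WaveMix-style block (simplified;
# see pranavphoenix/WaveMix for the actual implementation).
import torch
import torch.nn as nn


def haar_dwt2d(x: torch.Tensor) -> torch.Tensor:
    """Level-1 Haar DWT: (B, C, H, W) -> (B, 4C, H/2, W/2). H and W must be even."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)


class WaveMixStyleBlock(nn.Module):
    def __init__(self, dim: int, mult: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(               # channel MLP on the 4 sub-bands
            nn.Conv2d(4 * dim, mult * dim, 1),
            nn.GELU(),
            nn.Conv2d(mult * dim, dim, 1),
        )
        # Transposed conv restores the input resolution, so blocks can be
        # stacked without changing spatial size.
        self.up = nn.ConvTranspose2d(dim, dim, kernel_size=2, stride=2)
        self.norm = nn.BatchNorm2d(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        y = haar_dwt2d(x)          # (B, 4C, H/2, W/2)
        y = self.mlp(y)            # (B, C, H/2, W/2)
        y = self.up(y)             # (B, C, H, W)
        return self.norm(y) + x    # residual keeps the stack "self-similar"


if __name__ == "__main__":
    block = WaveMixStyleBlock(dim=32)
    out = block(torch.randn(1, 32, 64, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```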

Demystify Transformers & Convolutions in Modern Image Deep Networks

opengvlab/stm-evaluation 10 Nov 2022

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

edwardyehuang/CAR 11 Jan 2023

Extensive experiments and ablation studies conducted on multiple benchmark datasets demonstrate that the proposed CAR can boost the accuracy of all baseline models by up to 2.23% mIoU with superior generalization ability.

UniNeXt: Exploring A Unified Architecture for Vision Recognition

jianlong-yuan/uninext 26 Apr 2023

Interestingly, the ranking of these spatial token mixers also changes under our UniNeXt, suggesting that an excellent spatial token mixer may be stifled by a suboptimal general architecture, which further underscores the importance of studying the general architecture of the vision backbone.