Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention.
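Since the snippet names the hybrid design but not its structure, the following is a minimal sketch of one plausible hybrid 3D block, assuming PyTorch: a depthwise 3D convolution for local features followed by global self-attention over flattened voxel tokens. The name `HybridBlock3D` and all design details are illustrative, not any specific paper's architecture.

```python
# Minimal sketch of a hybrid 3D block: local convolution + global attention.
# All names and design choices here are illustrative assumptions.
import torch
import torch.nn as nn

class HybridBlock3D(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise 3D convolution captures fine structure.
        self.conv = nn.Conv3d(channels, channels, kernel_size=3,
                              padding=1, groups=channels)
        self.norm = nn.LayerNorm(channels)
        # Global branch: multi-head self-attention over all voxel tokens.
        self.attn = nn.MultiheadAttention(channels, num_heads,
                                          batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W)
        x = x + self.conv(x)                   # local features, residual
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, D*H*W, C)
        tokens = self.norm(tokens)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attn_out             # global context, residual
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)
```

Full self-attention over D*H*W voxels is expensive at volumetric resolutions, which is precisely why hybrid designs restrict attention to coarser stages or windows; the sketch above omits such optimizations for brevity.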
The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks.
Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" that achieve state-of-the-art performance in terms of both accuracy and mobile inference speed.
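Since only the mechanism's name appears here, the following is a sketch of efficient additive attention in the spirit of SwiftFormer, assuming PyTorch: each token gets a scalar score from a single learned vector, yielding linear complexity in sequence length. Details such as normalization and projection layout are assumptions and may differ from the authors' implementation.

```python
# Sketch of efficient additive attention (SwiftFormer-style): attention
# weights come from one learned scoring vector, so cost is linear in N.
# Normalization and projection details are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAdditiveAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim))  # learned scoring vector
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C)
        q = F.normalize(self.to_q(x), dim=-1)
        k = self.to_k(x)
        # One scalar score per token, softmax over the sequence.
        scores = (q @ self.w_a) * self.scale         # (B, N)
        attn = scores.softmax(dim=-1).unsqueeze(-1)  # (B, N, 1)
        # Global query: attention-weighted sum of all query tokens.
        g = (attn * q).sum(dim=1, keepdim=True)      # (B, 1, C)
        # Element-wise query-key interaction, then output projection.
        return self.proj(g * k) + q
```

The key design choice is replacing the N x N query-key matrix of standard attention with a single global query vector, which is what makes the operation cheap enough for mobile inference.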
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs.
We compare our results with those of state-of-the-art instance segmentation models.