Experimental evaluations demonstrate that the hybrid network with the locality module achieves higher accuracy than pure transformer-based models and some of the best convolutional networks.
The teacher model extracts local image features, whereas the student model focuses on global features using an attention mechanism.
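The sentence above does not say how the student is trained against the teacher. As a minimal sketch, assuming a standard knowledge-distillation objective (a temperature-softened KL divergence, which is a common choice and not necessarily the one used here), the coupling between the two models could look like the following. The function names and the temperature value are illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened class distributions.

    Assumption: this is the generic distillation loss, scaled by T^2 as is
    conventional; the source sentence does not specify the actual objective.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (e.g. convolutional)
    log_q = log_softmax(student_logits, temperature)  # student (e.g. attention-based)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.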
Medical image segmentation is crucial for the development of computer-aided diagnostic and therapeutic systems, but it still faces numerous challenges.
Convolutional Neural Networks (CNNs) have advanced existing medical systems for automatic disease diagnosis.
In this paper, we explore the robustness of vision transformers against adversarial perturbations and seek to improve their robustness-accuracy trade-off in white-box attack settings.
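To make the white-box setting concrete: the attacker has full access to the model's parameters and can therefore compute the gradient of the loss with respect to the input. The sketch below illustrates this with the fast gradient sign method (FGSM) on a toy logistic-regression classifier. Both the choice of FGSM and the toy model are assumptions made for illustration; the sentence above does not name a specific attack, and a vision transformer would take the place of the linear model in practice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(x, y, w, b):
    """Binary cross-entropy loss and its gradient with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    tiny = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for a logistic model
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, epsilon):
    """One FGSM step: move each input coordinate by epsilon in the
    direction that increases the loss (an L-infinity bounded perturbation)."""
    _, grad = loss_and_input_grad(x, y, w, b)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
```

The perturbation never exceeds `epsilon` in any coordinate, which is why robustness under this attack is usually reported as accuracy at a fixed `epsilon`.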