From this perspective, we introduce Preservational Learning to reconstruct diverse image contexts in order to preserve more information in learned representations.
Compared to the naive voxel-level self-attention implementation, such volume-based operations help to reduce the computational complexity by approximate 98% and 99. 5% on Synapse and ACDC datasets, respectively.
Vision transformers have attracted much attention from computer vision researchers as they are not restricted to the spatial inductive bias of ConvNets.
In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed novel Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules.
Semi-Supervised classification and segmentation methods have been widely investigated in medical image analysis.
In this paper, we show that such process can be integrated into the one-shot segmentation task which is a very challenging but meaningful topic.
Considering the scarcity of medical data, most datasets in medical image analysis are an order of magnitude smaller than those of natural images.
However, when being used for model training, only the final ground-truth label is utilized, while the critical information contained in the raw multi-rater gradings regarding the image being an easy/hard case is discarded.
Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness among Asian people.
In deep learning era, pretrained models play an important role in medical image analysis, in which ImageNet pretraining has been widely adopted as the best way.
Our method achieves new state-of-the-art results using the single model with 36$\times$(6$\times$) fewer parameters and 2. 6$\times$(2. 1$\times$) faster inference speed on facial age (attractiveness) estimation.
Ranked #1 on Attractiveness Estimation on SCUT-FBP
To bridge the gap between global and local streams, we propose a multi-class attentional region module which aims to make the number of attentional regions as small as possible and keep the diversity of these regions as high as possible.
Ranked #2 on Multi-Label Classification on PASCAL VOC 2012
no code implementations • 7 May 2020 • Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Jing Liu, Haiyan Wu, Yuan Xie, Yanyun Qu, Lizhuang Ma, Ziling Huang, Qili Deng, Ju-Chin Chao, Tsung-Shan Yang, Peng-Wen Chen, Po-Min Hsu, Tzu-Yi Liao, Chung-En Sun, Pei-Yuan Wu, Jeonghyeok Do, Jongmin Park, Munchurl Kim, Kareem Metwaly, Xuelu Li, Tiantong Guo, Vishal Monga, Mingzhao Yu, Venkateswararao Cherukuri, Shiue-Yuan Chuang, Tsung-Nan Lin, David Lee, Jerome Chang, Zhan-Han Wang, Yu-Bang Chang, Chang-Hong Lin, Yu Dong, Hong-Yu Zhou, Xiangzhen Kong, Sourya Dipta Das, Saikat Dutta, Xuan Zhao, Bing Ouyang, Dennis Estrada, Meiqi Wang, Tianqi Su, Siyi Chen, Bangyong Sun, Vincent Whannou de Dravo, Zhe Yu, Pratik Narang, Aryan Mehra, Navaneeth Raghunath, Murari Mandal
We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor.
While practitioners have had an intuitive understanding of these observations, we do a comprehensive emperical analysis and demonstrate that: (1) the gains from SSL techniques over a fully-supervised baseline are smaller when trained from a pre-trained model than when trained from random initialization, (2) when the domain of the source data used to train the pre-trained model differs significantly from the domain of the target task, the gains from SSL are significantly higher and (3) some SSL methods are able to advance fully-supervised baselines (like Pseudo-Label).
To be specific, our approach outperforms the previous state-of-the-art model named DeepLab v3 by 1. 5% on the PASCAL VOC 2012 val set and 0. 6% on the test set by replacing the Atrous Spatial Pyramid Pooling (ASPP) module in DeepLab v3 with the proposed Vortex Pooling.
In this paper, we propose Adaptive Feeding (AF) to combine a fast (but less accurate) detector and an accurate (but slow) detector, by adaptively determining whether an image is easy or hard and choosing an appropriate detector for it.
The difficulty of image recognition has gradually increased from general category recognition to fine-grained recognition and to the recognition of some subtle attributes such as temperature and geolocation.