We develop a simple but effective module to explore the full potential of transformers for visual representation by learning fine-grained and coarse-grained features at a token level and dynamically fusing them.
The pretraining tasks include two tasks: masked representation prediction - predict the representations for the masked patches, and masked patch reconstruction - reconstruct the masked patches.
Therefore, a trade-off between effectiveness and efficiency is necessary in practical scenarios.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)
1 code implementation • 16 Sep 2020 • Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi
The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection.