ICCV 2023 • Wenshuo Ma, Yidong Li, Xiaofeng Jia, Wei Xu
Visual Transformers (ViTs) and Convolutional Neural Networks (CNNs) are the two primary backbone structures extensively used in various vision tasks.
ECCV 2020 • Wenshuo Ma, Tingzhong Tian, Hang Xu, Yimin Huang, Zhenguo Li
By carefully analyzing the existing bounding box patterns on the feature hierarchy, we design a flexible and tight hyper-parameter space for anchor configurations.
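To make the idea of a hyper-parameter space for anchor configurations concrete, here is a minimal sketch. It is not the authors' method. The level names (`P3` to `P5`), the numeric ranges, and the names `LevelRange`, `SEARCH_SPACE`, and `sample_anchor_config` are all hypothetical, chosen only for illustration. It assumes each feature level has its own range of anchor scales and log2 aspect ratios, and draws one candidate configuration at random; a real search would use a more deliberate strategy than uniform sampling.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LevelRange:
    """Hypothetical per-level bounds for anchor hyper-parameters."""
    scale: Tuple[float, float]      # min/max of sqrt(width * height), in pixels
    log_ratio: Tuple[float, float]  # min/max of log2(height / width)


# Illustrative values only: tighter ranges per level keep the space small.
SEARCH_SPACE: Dict[str, LevelRange] = {
    "P3": LevelRange(scale=(16.0, 64.0), log_ratio=(-1.0, 1.0)),
    "P4": LevelRange(scale=(32.0, 128.0), log_ratio=(-1.5, 1.5)),
    "P5": LevelRange(scale=(64.0, 256.0), log_ratio=(-2.0, 2.0)),
}


def sample_anchor_config(
    space: Dict[str, LevelRange],
    anchors_per_level: int = 3,
    seed: Optional[int] = None,
) -> Dict[str, List[Tuple[float, float]]]:
    """Draw one candidate configuration: (width, height) anchors per level."""
    rng = random.Random(seed)
    config: Dict[str, List[Tuple[float, float]]] = {}
    for level, bounds in space.items():
        anchors = []
        for _ in range(anchors_per_level):
            scale = rng.uniform(*bounds.scale)
            ratio = 2.0 ** rng.uniform(*bounds.log_ratio)  # height / width
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((width, height))
        config[level] = anchors
    return config


if __name__ == "__main__":
    for level, anchors in sample_anchor_config(SEARCH_SPACE, seed=0).items():
        print(level, [(round(w, 1), round(h, 1)) for w, h in anchors])
```

Each sampled configuration could then be scored by training or evaluating a detector with those anchors; that evaluation step is outside the scope of this sketch.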