In this work, we observe that mutual relation knowledge between samples is also important to improve the discriminative ability of the learned representation of the student model, and propose an effective face recognition distillation method called CoupleFace by additionally introducing the Mutual Relation Distillation (MRD) into existing distillation framework.
Specifically, for pseudo labeling, existing works only focus on the classification score yet fail to guarantee the localization precision of pseudo boxes; For consistency training, the widely adopted random-resize training only considers the label-level consistency but misses the feature-level one, which also plays an important role in ensuring the scale invariance.
Based on the two observations, we propose Rank Mimicking (RM) and Prediction-guided Feature Imitation (PFI) for distilling one-stage detectors, respectively.
Transferring with few data in a general way to thousands of downstream tasks is becoming a trend of the foundation model's application.
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.
We build a family of models which surpass existing MLPs and even state-of-the-art Transformer-based models, e. g., Swin Transformer, while using fewer parameters and FLOPs.
Ranked #10 on Semantic Segmentation on DensePASS
We hope this work will facilitate state-of-the-art Transformer researches in computer vision.
Ranked #39 on Object Detection on COCO minival
Forms are a common type of document in real life and carry rich information through textual contents and the organizational structure.
By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.
In this study, we make a key observation that the local con-text represented by the similarities between the instance and its inter-class neighbors1plays an important role forFR.
Unlike the recently-proposed Transformer model (e. g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer~(PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.
Ranked #20 on Semantic Segmentation on DensePASS
This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.
Ranked #3 on Semantic Segmentation on Trans10K
To estimate the LID of each face image in the verification process, we propose two types of LID Estimation (LIDE) methods, which are reference-based and learning-based estimation methods, respectively.
The table detection and handcrafted features in previous works cannot apply to all forms because of their requirements on formats.
Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.
Ranked #61 on Instance Segmentation on COCO test-dev
However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a high lower bound of congruence loss.
Scene text detection, an essential step of scene text recognition system, is to locate text instances in natural scene images automatically.
Ranked #1 on Scene Text Detection on ICDAR 2017 MLT
Although deeper and larger neural networks have achieved better performance, the complex network structure and increasing computational cost cannot meet the demands of many resource-constrained applications.
Incidental scene text spotting is considered one of the most difficult and valuable challenges in the document analysis community.
Ranked #4 on Scene Text Detection on ICDAR 2015
Very deep neural networks recently achieved great success on general object recognition because of their superb learning capacity.
Ranked #12 on Face Verification on Labeled Faces in the Wild