1 code implementation • 15 Oct 2024 • Yue Cao, Yangzhou Liu, Zhe Chen, Guangchen Shi, Wenhai Wang, Danhuai Zhao, Tong Lu
To address this issue, we propose \modelname, a simple yet effective multi-layer feature fuser that efficiently integrates deep and shallow features from Vision Transformers (ViTs).
Ranked #142 on
Visual Question Answering
on MM-Vet
no code implementations • 26 Jul 2022 • Guangchen Shi, Yirui Wu, Jun Liu, Shaohua Wan, Wenhai Wang, Tong Lu
Second, to resist overfitting issues caused by few training samples, a hyper-class embedding is learned by clustering all category embeddings for initialization and aligned with category embedding of the new class for enhancement, where learned knowledge assists to learn new knowledge, thus alleviating performance dependence on training data scale.