no code implementations • 3 Jul 2024 • Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng
This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation.
no code implementations • 18 Dec 2023 • Shuailei Ma, Chen-Wei Xie, Ying WEI, Siyang Sun, Jiaqi Fan, Xiaoyi Bao, Yuxin Guo, Yun Zheng
In this paper, we conduct a direct analysis of the multi-modal prompts by asking the following questions: $(i)$ How do the learned multi-modal prompts improve the recognition performance?
1 code implementation • NeurIPS 2023 • Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang
Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.
no code implementations • CVPR 2023 • Chen-Wei Xie, Siyang Sun, Xiong Xiong, Yun Zheng, Deli Zhao, Jingren Zhou
This process can be considered as an open-book exam: with the reference set as a cheat sheet, the proposed method doesn't need to memorize all visual concepts in the training data.
1 code implementation • ICCV 2023 • Pandeng Li, Chen-Wei Xie, Liming Zhao, Hongtao Xie, Jiannan Ge, Yun Zheng, Deli Zhao, Yongdong Zhang
In the event-sentence prototype matching phase, we design a temporal prototype generation mechanism to associate intra-frame objects and interact inter-frame temporal relations.
no code implementations • 17 Apr 2018 • Chen-Wei Xie, Hong-Yu Zhou, Jianxin Wu
To be specific, our approach outperforms the previous state-of-the-art model named DeepLab v3 by 1. 5% on the PASCAL VOC 2012 val set and 0. 6% on the test set by replacing the Atrous Spatial Pyramid Pooling (ASPP) module in DeepLab v3 with the proposed Vortex Pooling.
no code implementations • 8 May 2017 • Xiu-Shen Wei, Chen-Lin Zhang, Yao Li, Chen-Wei Xie, Jianxin Wu, Chunhua Shen, Zhi-Hua Zhou
Reusable model design becomes desirable with the rapid expansion of machine learning applications.
2 code implementations • 6 Nov 2016 • Bin-Bin Gao, Chao Xing, Chen-Wei Xie, Jianxin Wu, Xin Geng
However, it is difficult to collect sufficient training images with precise labels in some domains such as apparent age estimation, head pose estimation, multi-label classification and semantic segmentation.
Ranked #1 on Head Pose Estimation on BJUT-3D
no code implementations • 24 May 2016 • Jianxin Wu, Chen-Wei Xie, Jian-Hao Luo
Large receptive field and dense prediction are both important for achieving high accuracy in pixel labeling tasks such as semantic segmentation.
no code implementations • 23 May 2016 • Xiu-Shen Wei, Chen-Wei Xie, Jianxin Wu
Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations.