no code implementations • 30 Aug 2023 • Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu
In this paper, we explore, for the first time, multi-modal contextual knowledge for understanding novel categories in open-vocabulary object detection (OVD).
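The core mechanic of OVD, matching region features against text embeddings of arbitrary (including novel) category names, can be sketched as follows. This is a minimal illustration under assumed shapes and a hypothetical `classify_regions` helper, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of open-vocabulary classification: region features are
# matched against text embeddings of arbitrary (possibly novel) category
# names. Shapes and the temperature value are illustrative assumptions.
def classify_regions(region_features, text_embeddings, temperature=0.01):
    """region_features: (num_regions, d); text_embeddings: (num_classes, d)."""
    region = F.normalize(region_features, dim=-1)
    text = F.normalize(text_embeddings, dim=-1)
    logits = region @ text.t() / temperature   # cosine similarity scores
    return logits.softmax(dim=-1)              # per-region class probabilities

# Hypothetical usage: 5 region proposals, 3 category names (2 base + 1 novel).
probs = classify_regions(torch.randn(5, 512), torch.randn(3, 512))
print(probs.shape)  # torch.Size([5, 3])
```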
1 code implementation • NeurIPS 2023 • Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu
To address the learning inertia problem brought by the frozen detector, we propose a vision-conditioned masked language prediction strategy.
Ranked #1 on Few-Shot Object Detection on ODinW-35
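A minimal sketch of what vision-conditioned masked language prediction can look like: masked text tokens are reconstructed while attending to visual features, so the language head must consult the image. All modules, names, and dimensions here are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

# Sketch of vision-conditioned masked language prediction: masked text
# tokens are predicted from the remaining text tokens cross-attending to
# visual features. Dimensions and modules are assumptions.
class VisionConditionedMLM(nn.Module):
    def __init__(self, vocab_size=30522, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.cross_attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.head = nn.Linear(d, vocab_size)

    def forward(self, token_ids, visual_feats):
        txt = self.embed(token_ids)                        # (B, L, d)
        fused, _ = self.cross_attn(txt, visual_feats, visual_feats)
        return self.head(fused)                            # logits over vocab

model = VisionConditionedMLM()
logits = model(torch.randint(0, 30522, (2, 16)), torch.randn(2, 49, 256))
# Train with cross-entropy at masked positions only (not shown).
```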
1 code implementation • 15 May 2023 • Linhui Xiao, Xiaoshan Yang, Fang Peng, Ming Yan, YaoWei Wang, Changsheng Xu
To leverage vision-and-language pre-trained models for the grounding problem and make sensible use of pseudo-labels, we propose CLIP-VG, a novel method that performs self-paced curriculum adapting of CLIP with pseudo-language labels.
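Self-paced curriculum adapting can be sketched as training rounds that admit a growing fraction of pseudo-labeled samples, most reliable first. `score_reliability` and `train_one_round` below are hypothetical placeholders for the paper's reliability measure and adaptation step.

```python
# Sketch of self-paced curriculum learning over pseudo-labels: each round
# admits a growing fraction of samples, ordered by a reliability score.
# `score_reliability` and `train_one_round` are hypothetical placeholders.
def self_paced_rounds(samples, score_reliability, train_one_round, num_rounds=3):
    ranked = sorted(samples, key=score_reliability, reverse=True)
    for r in range(1, num_rounds + 1):
        k = int(len(ranked) * r / num_rounds)  # admit the easiest k samples
        train_one_round(ranked[:k])

# Hypothetical usage with toy reliability scores and a no-op trainer.
self_paced_rounds(
    samples=[("img1", "pseudo caption", 0.9), ("img2", "pseudo caption", 0.4)],
    score_reliability=lambda s: s[2],
    train_one_round=lambda batch: None,
)
```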
no code implementations • CVPR 2023 • Yuyang Wanyan, Xiaoshan Yang, Chaofan Chen, Changsheng Xu
In meta-training, we design an Active Sample Selection (ASS) module that, based on modality-specific posterior distributions, organizes query samples into different groups according to how much the reliability of their modalities differs.
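As a rough sketch of such grouping, one can use the entropy of each modality-specific posterior as a reliability proxy and separate samples whose modalities disagree most; this is an assumption-laden illustration, not the ASS module itself.

```python
import torch

# Sketch of grouping query samples by modality reliability: the entropy of
# each modality-specific posterior serves as an (assumed) reliability proxy,
# and samples whose modalities differ most in reliability form one group.
def group_by_modality_reliability(post_a, post_b, threshold=0.5):
    """post_a, post_b: (N, C) per-modality class posteriors."""
    ent = lambda p: -(p * p.clamp_min(1e-8).log()).sum(dim=-1)
    gap = (ent(post_a) - ent(post_b)).abs()      # reliability difference
    return gap > threshold                       # True -> "large gap" group

post_a = torch.softmax(torch.randn(4, 5), dim=-1)
post_b = torch.softmax(torch.randn(4, 5), dim=-1)
print(group_by_modality_reliability(post_a, post_b))
```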
no code implementations • 28 Nov 2022 • Fang Peng, Xiaoshan Yang, Linhui Xiao, YaoWei Wang, Changsheng Xu
Although significant progress has been made in few-shot learning, most existing few-shot image classification methods require supervised pre-training on a large number of samples from base classes, which limits their generalization ability in real-world applications.
1 code implementation • CVPR 2022 • Jiabo Ye, Junfeng Tian, Ming Yan, Xiaoshan Yang, Xuwu Wang, Ji Zhang, Liang He, Xin Lin
Moreover, since the backbones are query-agnostic, it is difficult to completely avoid the inconsistency issue by training the visual backbone end-to-end in the visual grounding framework.
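One way to make visual features query-aware, sketched here purely for illustration, is FiLM-style modulation: the language query predicts a per-channel scale and shift applied to backbone features. This is an assumed alternative, not necessarily the paper's mechanism.

```python
import torch
import torch.nn as nn

# Sketch of query-aware visual features: the language query produces
# per-channel scale/shift (FiLM-style) that modulates backbone features,
# so the visual stream is no longer query-agnostic. Purely illustrative.
class QueryModulation(nn.Module):
    def __init__(self, query_dim=256, feat_dim=256):
        super().__init__()
        self.to_scale = nn.Linear(query_dim, feat_dim)
        self.to_shift = nn.Linear(query_dim, feat_dim)

    def forward(self, visual, query):          # visual: (B, N, d), query: (B, dq)
        scale = self.to_scale(query).unsqueeze(1)
        shift = self.to_shift(query).unsqueeze(1)
        return visual * (1 + scale) + shift

mod = QueryModulation()
print(mod(torch.randn(2, 49, 256), torch.randn(2, 256)).shape)  # (2, 49, 256)
```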
no code implementations • CVPR 2022 • Yiming Li, Xiaoshan Yang, Changsheng Xu
Humans can not only see the collection of objects in visual scenes but also identify the relationships between them.
no code implementations • 20 Dec 2021 • Jinfeng Wei, Yunxin Wang, Mengli Guo, Pei Lv, Xiaoshan Yang, Mingliang Xu
Methods based on graph convolutional networks (GCNs) have achieved advanced performance on the skeleton-based action recognition task.
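For reference, a single graph convolution over skeleton joints follows the standard normalized propagation rule H' = D^{-1/2}(A+I)D^{-1/2} H W; the 3-joint chain below is a toy assumption, not any specific skeleton topology from the paper.

```python
import torch
import torch.nn as nn

# Minimal graph convolution over skeleton joints: features propagate along
# the normalized joint adjacency, H' = D^{-1/2}(A+I)D^{-1/2} H W.
class SkeletonGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        a = adj + torch.eye(adj.size(0))             # add self-loops
        d = a.sum(dim=1).rsqrt().diag()              # D^{-1/2}
        self.register_buffer("norm_adj", d @ a @ d)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                            # x: (batch, joints, in_dim)
        return torch.relu(self.norm_adj @ self.proj(x))

adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # 3-joint chain
layer = SkeletonGCNLayer(2, 8, adj)
print(layer(torch.randn(4, 3, 2)).shape)  # torch.Size([4, 3, 8])
```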
no code implementations • CVPR 2021 • Chaofan Chen, Xiaoshan Yang, Changsheng Xu, Xuhui Huang, Zhe Ma
Specifically, we first employ the comparison module to explore the pairwise sample relations to learn rich sample representations in the instance-level graph.
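A comparison step over an instance-level graph can be sketched as pairwise cosine similarities acting as edge weights, with each node aggregating its neighbors to enrich its representation; the function below is an illustrative assumption, not the paper's module.

```python
import torch
import torch.nn.functional as F

# Sketch of a comparison step over an instance-level graph: pairwise
# cosine similarities between sample embeddings act as edge weights, and
# each node aggregates its neighbors to enrich its representation.
def compare_and_aggregate(features):
    """features: (N, d) sample embeddings; returns relation-enriched features."""
    f = F.normalize(features, dim=-1)
    edges = torch.softmax(f @ f.t(), dim=-1)   # pairwise relation weights
    return features + edges @ features         # aggregate related samples

out = compare_and_aggregate(torch.randn(6, 64))
print(out.shape)  # torch.Size([6, 64])
```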
no code implementations • 23 Mar 2021 • Xuan Ma, Xiaoshan Yang, Junyu Gao, Changsheng Xu
However, these data streams are multi-source and heterogeneous, containing complex temporal structures with both local contextual and global temporal aspects, which makes feature learning and joint data utilization challenging.
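A common way to capture both aspects, sketched here as an assumption rather than the paper's architecture, is to pair a 1-D convolution (local context) with self-attention (global temporal structure):

```python
import torch
import torch.nn as nn

# Sketch of combining local context and global temporal structure for a
# sensor stream: a 1-D convolution captures local patterns and self-attention
# captures long-range dependencies. Dimensions are illustrative assumptions.
class LocalGlobalEncoder(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.local = nn.Conv1d(d, d, kernel_size=3, padding=1)
        self.global_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, x):                       # x: (batch, time, d)
        local = self.local(x.transpose(1, 2)).transpose(1, 2)
        global_, _ = self.global_attn(x, x, x)
        return local + global_                  # fuse both temporal views

enc = LocalGlobalEncoder()
print(enc(torch.randn(2, 50, 32)).shape)  # torch.Size([2, 50, 32])
```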
no code implementations • 7 Nov 2020 • Peng Jia, Ruiyu Ning, Ruiqi Sun, Xiaoshan Yang, Dongmei Cai
In recent years, advances in deep neural networks and the growing number of astronomical images have given rise to many data-driven image restoration methods.
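A minimal example of such a data-driven restoration model, with an assumed toy architecture rather than any specific published network, is a small residual CNN that predicts a correction to the degraded image:

```python
import torch
import torch.nn as nn

# Sketch of a data-driven restoration model: a small residual CNN maps a
# degraded astronomical image to a residual correction. Architecture and
# sizes are assumptions for illustration only.
class ResidualDenoiser(nn.Module):
    def __init__(self, channels=1, width=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # predict and add a residual correction

restored = ResidualDenoiser()(torch.randn(1, 1, 64, 64))
print(restored.shape)  # torch.Size([1, 1, 64, 64])
```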
no code implementations • 28 Nov 2019 • Yi Huang, Xiaoshan Yang, Changsheng Xu
(1) It can model longitudinal heterogeneous EHR data by capturing the third-order correlations of different modalities and the irregular temporal impact of historical events.
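Third-order correlations across three modalities can be sketched with an outer-product (tensor) fusion, whose entries couple all three modalities simultaneously; the dimensions below are illustrative assumptions, not the paper's configuration.

```python
import torch

# Sketch of third-order correlation modeling: the outer product of three
# modality embeddings yields a tensor whose entries couple all three
# modalities at once. Dimensions are illustrative assumptions.
def third_order_fusion(a, b, c):
    """a: (B, da), b: (B, db), c: (B, dc) -> (B, da*db*dc) fused features."""
    fused = torch.einsum('bi,bj,bk->bijk', a, b, c)
    return fused.flatten(start_dim=1)

out = third_order_fusion(torch.randn(2, 4), torch.randn(2, 5), torch.randn(2, 6))
print(out.shape)  # torch.Size([2, 120])
```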
no code implementations • 25 May 2019 • Ting-Ting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras
Temporal action localization has recently attracted significant interest in the computer vision community.