no code implementations • 2 Jan 2024 • Xuelin Zhu, Jian Liu, Dongqi Tang, Jiawei Ge, Weijia Liu, Bo Liu, Jiuxin Cao
Identifying labels that did not appear during training, known as multi-label zero-shot learning, is a non-trivial task in computer vision.
no code implementations • 7 Dec 2023 • Xuelin Zhu, Jiuxin Cao, Jian Liu, Dongqi Tang, Furong Xu, Weijia Liu, Jiawei Ge, Bo Liu, Qingpei Guo, Tianyi Zhang
Pre-trained vision-language models have notably accelerated the progress of open-world concept recognition.
no code implementations • 28 Nov 2023 • Jiawei Ge, Xiangmei Chen, Jiuxin Cao, Xuelin Zhu, Bo Liu
However, current VL trackers have not fully exploited the power of VL learning, as they suffer from limitations such as heavily relying on off-the-shelf backbones for feature extraction, ineffective VL fusion designs, and the absence of VL-related loss functions.
no code implementations • 27 Nov 2023 • Jiawei Ge, Shange Tang, Jianqing Fan, Cong Ma, Chi Jin
This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves the minimax optimality for covariate shift under the well-specified setting.
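A minimal sketch of the setting the abstract refers to, using standard notation (the symbols below are conventional, not necessarily the paper's own): under covariate shift, the source and target distributions share the conditional $p(y \mid x)$ but differ in the marginal over $x$, and the well-specified assumption says the true conditional lies in the model class:

$$
p_S(x, y) = p_S(x)\, p(y \mid x), \qquad
p_T(x, y) = p_T(x)\, p(y \mid x), \qquad
p(y \mid x) = p_{\theta^*}(y \mid x) \text{ for some } \theta^*.
$$

The classical MLE in question fits $\theta$ purely on source samples $\{(x_i, y_i)\}_{i=1}^n \sim p_S$, with no reweighting by $p_T(x)/p_S(x)$:

$$
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\theta}(y_i \mid x_i).
$$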
no code implementations • 28 Jun 2023 • Jianqing Fan, Jiawei Ge, Debarghya Mukherjee
Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts.
no code implementations • 2 Mar 2023 • Jiawei Ge, Shange Tang, Jianqing Fan, Chi Jin
Unsupervised pretraining, which learns a useful representation using a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems.
no code implementations • ICCV 2023 • Xuelin Zhu, Jian Liu, Weijia Liu, Jiawei Ge, Bo Liu, Jiuxin Cao
Multi-label image classification refers to assigning a set of labels to an image.
1 code implementation • ACMMM 2022 • Xuelin Zhu, Jiuxin Cao, Jiawei Ge, Weijia Liu, Bo Liu
Specifically, in each layer of TSFormer, a cross-modal attention module is developed to aggregate visual features from the spatial stream into the semantic stream and to update the label semantics via a residual connection.
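The cross-modal attention step described above can be sketched as follows: label-semantic vectors act as queries, spatial visual features act as keys and values, and the attended visual information updates the label semantics through a residual connection. This is a minimal illustrative sketch in plain Python, assuming single-head scaled dot-product attention; the function names, shapes, and the absence of learned projections are simplifying assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (NOT the paper's code) of one cross-modal attention
# update: label-semantic queries attend over spatial visual features, and
# the result is added back via a residual connection.
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_modal_attention(label_semantics, visual_features):
    """label_semantics: list of L query vectors (dim d);
    visual_features: list of N key/value vectors (dim d).
    Returns updated label semantics of the same shape."""
    d = len(label_semantics[0])
    updated = []
    for q in label_semantics:
        # Scaled dot-product scores between one label query and all visual tokens.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in visual_features]
        weights = softmax(scores)
        # Aggregate visual features (spatial stream) into the semantic stream.
        attended = [sum(w * v[j] for w, v in zip(weights, visual_features))
                    for j in range(d)]
        # Residual connection: updated semantics = original + attended visual info.
        updated.append([qj + aj for qj, aj in zip(q, attended)])
    return updated

labels = [[1.0, 0.0], [0.0, 1.0]]                 # 2 label-semantic vectors, d = 2
patches = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]   # 3 spatial visual features
out = cross_modal_attention(labels, patches)
```

In a real implementation the queries, keys, and values would pass through learned linear projections and the layer would include normalization and a feed-forward sub-block; those are omitted here to keep the aggregation-plus-residual structure visible.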