no code implementations • 20 May 2025 • Jiaer Xia, Yuhang Zang, Peng Gao, Yixuan Li, Kaiyang Zhou
Recent research in large language models (LLMs), such as DeepSeek-R1, has shown that reinforcement learning techniques like GRPO can enable pre-trained LLMs to develop reasoning capabilities using simple question-answer pairs.
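For readers unfamiliar with GRPO, the sketch below illustrates the group-relative advantage computation such methods rely on: several answers are sampled per question, scored against the reference answer, and each answer's reward is normalised within its own group. This is an illustrative sketch under those assumptions, not the paper's implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages: normalise each sampled answer's reward
    against the mean/std of its own group (one group of samples per question).

    rewards: (num_questions, group_size) scalar rewards, e.g. 1.0 if the final
    answer matches the ground truth and 0.0 otherwise.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two questions, four sampled answers each, binary correctness rewards:
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
advantages = grpo_advantages(rewards)  # better-than-average answers get positive advantages
```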
1 code implementation • 20 May 2025 • Yu Tong, Zihao Pan, Shuai Yang, Kaiyang Zhou
However, existing generative watermarking methods are mainly designed for diffusion models while watermarking for autoregressive image generation models remains largely underexplored.
1 code implementation • 19 May 2025 • Sifeng Shang, Jiayi Zhou, Chenyu Lin, Minxian Li, Kaiyang Zhou
In this paper, we aim to push the limits of memory-efficient training by minimizing memory usage on model weights, gradients, and optimizer states, within a unified framework.
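To see why weights, gradients, and optimizer states dominate the training memory budget, the back-of-the-envelope accounting below may help; it is purely illustrative and not the method proposed in the paper.

```python
def training_memory_gb(num_params, weight_bytes=4, grad_bytes=4, optim_bytes=8):
    """Rough memory footprint (GB) of the three training states named above:
    fp32 weights (4 B), fp32 gradients (4 B), and Adam-style first/second
    moments (8 B) per parameter. Activations are not counted."""
    return num_params * (weight_bytes + grad_bytes + optim_bytes) / 1024**3

# A 7B-parameter model trained in fp32 with Adam already needs ~104 GB
# for these states alone, which is why each of them is a target for savings.
print(round(training_memory_gb(7e9), 1))
```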
5 code implementations • 30 Jan 2025 • Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink
In addition, the recent advent of large-scale pre-trained multimodal foundation models, such as CLIP, has inspired works that leverage these models to enhance adaptation and generalization performance or adapt them to downstream tasks.
3 code implementations • NeurIPS 2023 • Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu
To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs.
1 code implementation • CVPR 2024 • Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang
In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings.
1 code implementation • 7 Feb 2024 • Shuoyuan Wang, Jindong Wang, Guoqing Wang, Bob Zhang, Kaiyang Zhou, Hongxin Wei
Vision-language models (VLMs) have emerged as formidable tools, showing their strong capability in handling various open-vocabulary tasks in image recognition, text-driven visual content generation, and visual chatbots, to name a few.
1 code implementation • CVPR 2024 • Chong Yin, Siqi Liu, Kaiyang Zhou, Vincent Wai-Sun Wong, Pong C. Yuen
QAP is based on two types of quantitative attributes, namely K-function-based spatial attributes and histogram-based morphological attributes, which are aimed at the quantitative assessment of tissue states.
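As a pointer for readers, a K-function summarises how clustered or dispersed a 2D point pattern (e.g., detected nuclei) is at a given radius; the sketch below shows a naive, edge-correction-free estimate and is not taken from the paper.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K-function estimate for a 2D point pattern:
    K(r) = area / (n * (n - 1)) * #{ordered pairs (i, j), i != j, with d_ij <= r}.
    No edge correction is applied."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    dists[np.diag_indices(n)] = np.inf  # exclude self-pairs
    return np.array([area / (n * (n - 1)) * np.sum(dists <= r) for r in radii])
```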
3 code implementations • CVPR 2023 • Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu
PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
1 code implementation • 12 Oct 2023 • Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu
To bridge this gap, we introduce Octopus, an embodied vision-language programmer that uses executable code generation as a medium to connect planning and manipulation.
1 code implementation • 29 May 2023 • Yuhang Zang, Wei Li, Jun Han, Kaiyang Zhou, Chen Change Loy
Moreover, we present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts, so as to locate, identify, and associate visual objects with language inputs for human-AI interaction.
1 code implementation • 24 May 2023 • Yuhang Zang, Kaiyang Zhou, Chen Huang, Chen Change Loy
This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges, but has rarely been studied in the literature.
1 code implementation • NeurIPS 2023 • Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu
To overcome the problem, we propose a prompt retrieval framework to automate the selection of in-context examples.
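A nearest-neighbour baseline conveys the idea of prompt retrieval: embed the test query and all candidate training examples, then pick the most similar candidates as in-context examples. The paper's retriever may be learned rather than fixed; the snippet below is only a baseline sketch.

```python
import numpy as np

def retrieve_in_context_examples(query_emb, pool_embs, k=8):
    """Return indices of the k pool examples most similar to the query
    under cosine similarity (a simple, non-learned retriever)."""
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    return np.argsort(-(p @ q))[:k]
```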
no code implementations • 25 Oct 2022 • Tingwei Wang, Da Li, Kaiyang Zhou, Tao Xiang, Yi-Zhe Song
Machine learning models are intrinsically vulnerable to domain shift between training and testing data, resulting in poor performance in novel domains.
4 code implementations • 13 Oct 2022 • Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, Ziwei Liu
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature.
1 code implementation • 13 Oct 2022 • Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP.
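In its simplest form, input-space prompt tuning keeps the backbone frozen and learns a handful of context vectors that are prepended to the token embeddings; the generic sketch below illustrates this idea and is not tied to the paper's specific unified prompt design.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable context vectors prepended to (frozen) token embeddings.
    Only self.ctx is trained; the rest of the model stays fixed."""
    def __init__(self, n_ctx=16, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)

    def forward(self, token_embs):  # token_embs: (batch, seq_len, dim)
        ctx = self.ctx.unsqueeze(0).expand(token_embs.size(0), -1, -1)
        return torch.cat([ctx, token_embs], dim=1)
```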
2 code implementations • 15 Sep 2022 • Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu
Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD.
1 code implementation • 22 Jul 2022 • Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu
Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes, followed by prediction of their pairwise relationships.
Ranked #5 on Panoptic Scene Graph Generation on PSG Dataset
1 code implementation • 17 Jul 2022 • Kaiyang Zhou, Adeline Paiement, Majid Mirmehdi
We address the problem of people detection in RGB-D data, where we leverage depth information to develop a region-of-interest (ROI) selection method that provides proposals to two CNNs, one operating on color and the other on depth.
1 code implementation • 9 Jun 2022 • Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu
The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer.
Ranked #1 on Image Classification on OmniBenchmark (using extra training data)
2 code implementations • 11 Apr 2022 • Jingkang Yang, Kaiyang Zhou, Ziwei Liu
In this paper, we take both shift types into account and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift, and we design three benchmarks for it.
Out-of-Distribution Detection
4 code implementations • 22 Mar 2022 • Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
To this end, we propose a novel open-vocabulary detector based on DETR -- hence the name OV-DETR -- which, once trained, can detect any object given its class name or an exemplar image.
Ranked #26 on Open Vocabulary Object Detection on MSCOCO
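One plausible way to make a DETR-style detector conditional on a class name or exemplar is to inject a CLIP embedding into the object queries, as sketched below; the module names and wiring here are assumptions for illustration, not OV-DETR's exact design.

```python
import torch
import torch.nn as nn

class ConditionalQueries(nn.Module):
    """Add a projected CLIP embedding (of a class name or exemplar image)
    to every learnable object query, so the decoder searches for that concept."""
    def __init__(self, num_queries=100, d_model=256, clip_dim=512):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.proj = nn.Linear(clip_dim, d_model)

    def forward(self, clip_emb):  # clip_emb: (batch, clip_dim)
        cond = self.proj(clip_emb).unsqueeze(1)      # (batch, 1, d_model)
        return self.queries.unsqueeze(0) + cond      # (batch, num_queries, d_model)
```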
12 code implementations • CVPR 2022 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets.
Ranked #5 on Prompt Engineering on ImageNet V2
1 code implementation • 9 Mar 2022 • Zhongying Deng, Kaiyang Zhou, Da Li, Junjun He, Yi-Zhe Song, Tao Xiang
In this paper, we address both single-source and multi-source UDA from a completely different perspective, which is to view each instance as a fine domain.
1 code implementation • 6 Nov 2021 • Zhongying Deng, Kaiyang Zhou, Yongxin Yang, Tao Xiang
Importantly, the attention module is supervised by a consistency loss, which is imposed on the distributions of channel attention weights between source and target domains.
4 code implementations • 21 Oct 2021 • Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu
In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD.
18 code implementations • 2 Sep 2021 • Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks.
Ranked #2 on Few-shot Age Estimation on MORPH Album2
no code implementations • ICCV 2021 • Yezhen Wang, Bo Li, Tong Che, Kaiyang Zhou, Ziwei Liu, Dongsheng Li
Confidence calibration is of great importance to the reliability of decisions made by machine learning systems.
2 code implementations • 5 Jul 2021 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang
MixStyle is easy to implement with a few lines of code, does not require modification to training objectives, and can fit a variety of learning paradigms including supervised domain generalization, semi-supervised domain generalization, and unsupervised domain adaptation.
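Since the abstract highlights that MixStyle takes only a few lines of code, here is a minimal sketch of the core operation: per-instance feature statistics are mixed between random pairs of samples in a batch. The official implementation adds details such as stopping gradients through the statistics and applying the operation only during training with some probability.

```python
import torch

def mixstyle(x, alpha=0.1, eps=1e-6):
    """Mix per-instance feature statistics (mean/std over H, W) between
    random pairs of samples. x: (B, C, H, W) intermediate CNN features."""
    B = x.size(0)
    mu = x.mean(dim=[2, 3], keepdim=True)
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    x_norm = (x - mu) / sig
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1)).to(x.device)
    perm = torch.randperm(B)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix
```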
2 code implementations • 1 Jun 2021 • Kaiyang Zhou, Chen Change Loy, Ziwei Liu
We find that the DG methods, which by design are unable to handle unlabeled data, perform poorly with limited labels in SSDG; the SSL methods, especially FixMatch, obtain much better results but still fall far short of the vanilla model trained with full labels.
3 code implementations • ICLR 2021 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang
Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style (e.g., photo vs. sketch images).
Ranked #66 on Domain Generalization on PACS
2 code implementations • 3 Mar 2021 • Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce.
1 code implementation • ECCV 2020 • Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, Tao Xiang
This explicitly increases the diversity of available training domains and leads to a more generalizable model.
Ranked #74 on Domain Generalization on PACS
1 code implementation • 16 Mar 2020 • Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang
Each such classifier is an expert on its own domain and a non-expert on the others.
no code implementations • 12 Mar 2020 • Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, Tao Xiang
This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier.
Ranked #72 on Domain Generalization on PACS
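The objective described above can be written as a two-term loss on perturbed images: keep them classifiable by the label classifier while confusing the domain classifier. The sketch below shows one assumed form of that objective; generator, label_clf, and domain_clf are placeholder modules, and the exact formulation in the paper may differ.

```python
import torch.nn.functional as F

def adversarial_augmentation_loss(x, y, d, generator, label_clf, domain_clf, lmda=0.3):
    """Loss for the perturbation generator: minimise the label-classification
    loss on augmented images while maximising the domain-classification loss."""
    x_aug = x + generator(x)                              # perturb the input
    loss_label = F.cross_entropy(label_clf(x_aug), y)     # stay correctly classified
    loss_domain = F.cross_entropy(domain_clf(x_aug), d)   # fool the domain classifier
    return loss_label - lmda * loss_domain
```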
9 code implementations • 22 Oct 2019 • Kaiyang Zhou, Tao Xiang
Person re-identification (re-ID), which aims to re-identify people across different camera views, has been significantly advanced by deep learning in recent years, particularly with convolutional neural networks (CNNs).
9 code implementations • 15 Oct 2019 • Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang
An effective person re-identification (re-ID) model should learn feature representations that are both discriminative, for distinguishing similar-looking people, and generalisable, for deployment across datasets without any adaptation.
Unsupervised Domain Adaptation • Unsupervised Person Re-Identification
17 code implementations • ICCV 2019 • Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang
As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales.
Ranked #2 on Person Re-Identification on MSMT17-C
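A simplified picture of omni-scale feature learning: several parallel streams with increasing receptive fields are fused by a shared channel-wise gate, so the combination of scales is learned dynamically. The block below is a sketch in that spirit, not the exact OSNet building block.

```python
import torch
import torch.nn as nn

class OmniScaleBlock(nn.Module):
    """Parallel conv streams with growing receptive fields, fused by a shared
    channel-wise gate and wrapped in a residual connection."""
    def __init__(self, channels, num_streams=4):
        super().__init__()
        self.streams = nn.ModuleList()
        for t in range(1, num_streams + 1):
            layers = []
            for _ in range(t):  # stacking t 3x3 convs enlarges the receptive field
                layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
            self.streams.append(nn.Sequential(*layers))
        self.gate = nn.Sequential(  # unified aggregation gate shared across streams
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = 0
        for stream in self.streams:
            feat = stream(x)
            out = out + self.gate(feat) * feat  # weight each scale per channel
        return torch.relu(x + out)
```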
no code implementations • 9 Jul 2018 • Kaiyang Zhou, Tao Xiang, Andrea Cavallaro
Most existing video summarisation methods are based on either supervised or unsupervised learning.
6 code implementations • 29 Dec 2017 • Kaiyang Zhou, Yu Qiao, Tao Xiang
Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos.
Ranked #7 on Unsupervised Video Summarization on TvSum
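Diversity and representativeness can be scored directly on frame features: diversity as the mean pairwise dissimilarity among selected frames, representativeness as how well the selection covers all frames. The sketch below shows one common formulation of these two terms; it is illustrative and not necessarily the paper's exact reward.

```python
import torch

def diversity_representativeness(features, selected):
    """features: (T, D) per-frame features; selected: indices of chosen frames."""
    f = torch.nn.functional.normalize(features, dim=1)
    sel = f[selected]                                   # (K, D)
    K = sel.size(0)
    sim = sel @ sel.t()                                 # pairwise cosine similarity
    diversity = (1 - sim).sum() / (K * (K - 1)) if K > 1 else torch.tensor(0.0)
    dists = torch.cdist(f, sel)                         # distance of every frame to each selected frame
    representativeness = torch.exp(-dists.min(dim=1).values.mean())
    return diversity, representativeness
```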