no code implementations • 11 Oct 2024 • Xiaoyu Yue, Zidong Wang, Zeyu Lu, Shuyang Sun, Meng Wei, Wanli Ouyang, Lei Bai, Luping Zhou
Conventional class-guided diffusion models generally succeed in generating images with correct semantic content, but often struggle with texture details.
Ranked #28 on Image Generation on ImageNet 256x256
no code implementations • 12 Sep 2024 • Runjia Li, Junlin Han, Luke Melas-Kyriazi, Chunyi Sun, Zhaochong An, Zhongrui Gui, Shuyang Sun, Philip Torr, Tomas Jakab
Existing SDS methods often struggle with this generation task due to a limited understanding of part-level semantics in text-to-image diffusion models.
no code implementations • 15 Apr 2024 • Zhongrui Gui, Shuyang Sun, Runjia Li, Jianhao Yuan, Zhaochong An, Karsten Roth, Ameya Prabhu, Philip Torr
We demonstrate that kNN-CLIP can adapt to continually growing vocabularies without the need for retraining or large memory costs.
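A growable retrieval index is the core idea behind adapting to new vocabulary without retraining. The toy class below is an illustrative stand-in for that retrieval mechanism, not kNN-CLIP itself; all names and the cosine-similarity lookup are assumptions:

```python
import numpy as np

class KNNVocabulary:
    """Toy nearest-neighbour classifier over a growable embedding index
    (an illustrative stand-in for the retrieval idea, not kNN-CLIP itself)."""
    def __init__(self):
        self.embeddings, self.labels = [], []

    def add(self, emb, label):
        # Extending the vocabulary is just appending to the index: no retraining.
        self.embeddings.append(emb / np.linalg.norm(emb))
        self.labels.append(label)

    def predict(self, query):
        q = query / np.linalg.norm(query)
        sims = np.stack(self.embeddings) @ q  # cosine similarities
        return self.labels[int(np.argmax(sims))]
```

New concepts cost one `add` call each, which is why memory, not gradient updates, is the only growth cost in this style of method.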
1 code implementation • 28 Feb 2024 • Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao
To alleviate artifacts and improve the quality of synthetic images, we fine-tune a Vision-Language Model (VLM) as an artifact classifier that automatically identifies and categorizes a wide range of artifacts and provides supervision for further optimizing generative models.
no code implementations • 16 Feb 2024 • Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd
Recent advancements in Multi-Modal Large Language Models (MLLMs) have shown promising potential for enhancing explainability as a driving agent by producing control predictions alongside natural-language explanations.
no code implementations • CVPR 2024 • Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets.
1 code implementation • 16 Oct 2023 • Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao
Synthetic training data has gained prominence in numerous learning tasks and scenarios, offering advantages such as dataset augmentation, generalization evaluation, and privacy preservation.
no code implementations • ICCV 2023 • Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr
Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information.
1 code implementation • NeurIPS 2023 • Shuyang Sun, Weijun Wang, Qihang Yu, Andrew Howard, Philip Torr, Liang-Chieh Chen
This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment.
no code implementations • 29 Nov 2022 • Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai
LUMix is simple: it can be implemented in just a few lines of code and applied universally to any deep network, e.g. CNNs and Vision Transformers, with minimal computational cost.
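To make "a few lines of code" concrete, here is a minimal mixup-style sketch in which the label mixing coefficient is perturbed away from the input mixing coefficient. The function name and the uniform-noise scheme are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def lumix_like(x1, y1, x2, y2, alpha=1.0, label_noise=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    # Sample the input mixing coefficient as in standard mixup.
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2  # mix inputs
    # Perturb the label coefficient so it need not match the input mix exactly
    # (illustrative stand-in for LUMix's label adjustment).
    lam_y = np.clip(lam + rng.uniform(-label_noise, label_noise), 0.0, 1.0)
    y = lam_y * y1 + (1.0 - lam_y) * y2  # mix one-hot labels
    return x, y
```

Because the augmentation happens entirely in the data path, it drops into any training loop regardless of the backbone architecture.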
1 code implementation • 14 Oct 2022 • Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.
1 code implementation • CVPR 2022 • Ruifei He, Shuyang Sun, Jihan Yang, Song Bai, Xiaojuan Qi
Large-scale pre-training has been proven to be crucial for various computer vision tasks.
no code implementations • CVPR 2022 • Yi Zhou, Hui Zhang, Hana Lee, Shuyang Sun, Pingjun Li, Yangguang Zhu, ByungIn Yoo, Xiaojuan Qi, Jae-Joon Han
We encode all panoptic entities in a video, including both foreground instances and background semantics, with a unified representation called panoptic slots.
2 code implementations • CVPR 2022 • Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai
The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.
1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin
As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.
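The token-splitting step mentioned here can be sketched in a few lines; this is a generic ViT-style patchification, with the function name chosen for illustration:

```python
import numpy as np

def patchify(img, p):
    # Split an (H, W, C) image into non-overlapping p x p patches and
    # flatten each patch into a token, as in ViT-style tokenisation.
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image dims must be divisible by patch size"
    tokens = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return tokens.reshape(-1, p * p * C)  # (num_tokens, token_dim)
```

For a 224x224x3 image with 16x16 patches this yields 196 tokens of dimension 768, the fixed-length sequence a transformer then processes.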
2 code implementations • 13 Jul 2021 • Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr
To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.
Ranked #332 on Image Classification on ImageNet
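The encode-then-decode attention described above can be sketched with plain matrix operations. This is a simplified, single-head reading of the mechanism under stated assumptions (scaled dot-product attention in both directions, a residual update on the whole representation); the paper's exact parameterisation may differ:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encode_decode(whole, parts):
    # whole: (N, d) tokens of the whole representation; parts: (K, d) part vectors.
    d = whole.shape[1]
    # Encode: part vectors attend over the whole to gather information.
    attn = softmax(parts @ whole.T / np.sqrt(d))            # (K, N)
    parts_new = attn @ whole                                # (K, d)
    # Decode: each whole token retrieves global context from the part vectors.
    attn_back = softmax(whole @ parts_new.T / np.sqrt(d))   # (N, K)
    whole_new = whole + attn_back @ parts_new               # residual update
    return whole_new, parts_new
```

The two attention passes are what let a small set of part vectors act as a global bottleneck between local tokens.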
no code implementations • ICCV 2021 • Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, Victor Adrian Prisacariu, Philip H.S. Torr
Aggregating features from different depths of a network is widely adopted to improve the network capability.
no code implementations • 24 Nov 2020 • Shuyang Sun, Liang Chen, Gregory Slabaugh, Philip Torr
Some image restoration tasks like demosaicing require difficult training samples to learn effective models.
no code implementations • 12 Sep 2020 • Yi Zhou, Shuyang Sun, Chao Zhang, Yikang Li, Wanli Ouyang
By assigning each relationship a single label, current approaches formulate relationship detection as a classification problem.
1 code implementation • ICCV 2019 • Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy
Multi-sensor perception is crucial for ensuring reliability and accuracy in autonomous driving systems, while multi-object tracking (MOT) improves both by tracing the sequential movement of dynamic objects.
Ranked #17 on Multiple Object Tracking on KITTI Test (Online Methods) (MOTA metric)
143 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin
In this paper, we introduce the various features of this toolbox.
5 code implementations • CVPR 2019 • Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin
In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.
Ranked #32 on Object Detection on COCO-O
6 code implementations • NeurIPS 2018 • Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang
The basic principles for designing convolutional neural network (CNN) structures that predict objects at different levels, e.g. image-level, region-level, and pixel-level, are diverging.
1 code implementation • CVPR 2018 • Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang
In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.
Ranked #38 on Action Recognition on UCF101
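A motion feature in the spirit of OFF combines spatial gradients of a feature map with a temporal difference between frames. The sketch below uses finite differences on a single-channel map as a hedged illustration; the paper applies such operators to intermediate CNN features and its exact filters (e.g. Sobel) may differ:

```python
import numpy as np

def off_like(f_t, f_t1):
    # f_t, f_t1: (H, W) feature maps from two consecutive frames.
    # Spatial gradients of the current frame's features...
    fx = np.gradient(f_t, axis=1)
    fy = np.gradient(f_t, axis=0)
    # ...plus the temporal difference between frames. Stacking the three
    # channels yields a motion-sensitive descriptor (illustrative of OFF,
    # not the paper's exact formulation).
    ft = f_t1 - f_t
    return np.stack([fx, fy, ft], axis=0)  # (3, H, W)
```

Since every operation is elementwise or a local difference, the feature is cheap to compute compared with estimating dense optical flow.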
1 code implementation • CVPR 2017 • Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang
Person re-identification (ReID) is an important task in video surveillance and has various applications.