Search Results for author: Shuyang Sun

Found 23 papers, 13 papers with code

kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies

no code implementations • 15 Apr 2024 • Zhongrui Gui, Shuyang Sun, Runjia Li, Jianhao Yuan, Zhaochong An, Karsten Roth, Ameya Prabhu, Philip Torr

Rapid advancements in continual segmentation have yet to bridge the gap of scaling to large continually expanding vocabularies under compute-constrained scenarios.

Panoptic Segmentation Retrieval +2


SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model

no code implementations • 28 Feb 2024 • Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao

To alleviate artifacts and improve the quality of synthetic images, we fine-tune a Vision-Language Model (VLM) as an artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizing generative models.

Image Generation Language Modelling


RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

no code implementations • 16 Feb 2024 • Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd

Recent advancements in Multi-Modal Large Language Models (MLLMs) have shown promising potential in enhancing the explainability of driving agents by producing control predictions along with natural language explanations.

Autonomous Driving Decision Making +4


CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

no code implementations • 12 Dec 2023 • Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li

Mask labels are labor-intensive, which limits the number of categories in segmentation datasets.

Image Segmentation Segmentation +1


Real-Fake: Effective Training Data Synthesis Through Distribution Matching

1 code implementation • 16 Oct 2023 • Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao

Synthetic training data has gained prominence in numerous learning tasks and scenarios, offering advantages such as dataset augmentation, generalization evaluation, and privacy preservation.

Image Classification Out-of-Distribution Generalization


OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

no code implementations • ICCV 2023 • Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr

Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information.

Image Captioning


ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation

1 code implementation • NeurIPS 2023 • Shuyang Sun, Weijun Wang, Qihang Yu, Andrew Howard, Philip Torr, Liang-Chieh Chen

This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment.

Panoptic Segmentation Segmentation


LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations • 29 Nov 2022 • Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple as it can be implemented in just a few lines of code and can be universally applied to any deep network, e.g., CNNs and Vision Transformers, with minimal computational cost.

Data Augmentation


Is synthetic data from generative models ready for image recognition?

1 code implementation • 14 Oct 2022 • Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Text-to-Image Generation Transfer Learning


Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

1 code implementation • CVPR 2022 • Ruifei He, Shuyang Sun, Jihan Yang, Song Bai, Xiaojuan Qi

Large-scale pre-training has been proven to be crucial for various computer vision tasks.

Knowledge Distillation


Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation

no code implementations • CVPR 2022 • Yi Zhou, Hui Zhang, Hana Lee, Shuyang Sun, Pingjun Li, Yangguang Zhu, ByungIn Yoo, Xiaojuan Qi, Jae-Joon Han

We encode all panoptic entities in a video, including both foreground instances and background semantics, with a unified representation called panoptic slots.

Object Representation Learning +1


TransMix: Attend to Mix for Vision Transformers

2 code implementations • CVPR 2022 • Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.

Instance Segmentation object-detection +3


Vision Transformer with Progressive Sampling

1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.

Image Classification


Visual Parser: Representing Part-whole Hierarchies with Transformers

2 code implementations • 13 Jul 2021 • Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr

To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.

Ranked #313 on Image Classification on ImageNet

Image Classification Instance Segmentation +3


Aggregation With Feature Detection

no code implementations • ICCV 2021 • Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, Victor Adrian Prisacariu, Philip H.S. Torr

Aggregating features from different depths of a network is widely adopted to improve the network capability.

Instance Segmentation object-detection +2


Learning to Sample the Most Useful Training Patches from Images

no code implementations • 24 Nov 2020 • Shuyang Sun, Liang Chen, Gregory Slabaugh, Philip Torr

Some image restoration tasks like demosaicing require difficult training samples to learn effective models.

Demosaicking


Exploring the Hierarchy in Relation Labels for Scene Graph Generation

no code implementations • 12 Sep 2020 • Yi Zhou, Shuyang Sun, Chao Zhang, Yikang Li, Wanli Ouyang

By assigning each relationship a single label, current approaches formulate the relationship detection as a classification problem.

Graph Generation Relation +2


Robust Multi-Modality Multi-Object Tracking

1 code implementation • ICCV 2019 • Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy

Multi-sensor perception is crucial to ensuring the reliability and accuracy of autonomous driving systems, while multi-object tracking (MOT) improves on that by tracing the sequential movement of dynamic objects.

Ranked #10 on Multiple Object Tracking on KITTI Tracking test

Autonomous Driving Multi-Object Tracking +2


MMDetection: Open MMLab Detection Toolbox and Benchmark

144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In this paper, we introduce the various features of this toolbox.

Benchmarking Instance Segmentation +2


Hybrid Task Cascade for Instance Segmentation

5 code implementations • CVPR 2019 • Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.

Ranked #32 on Object Detection on COCO-O

Instance Segmentation object-detection +4


FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

6 code implementations • NeurIPS 2018 • Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang

The basic principles in designing convolutional neural network (CNN) structures for predicting objects on different levels, e.g., image-level, region-level, and pixel-level, are diverging.

Image Classification


Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation • CVPR 2018 • Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

Ranked #36 on Action Recognition on UCF101

Action Recognition In Videos Optical Flow Estimation +1


Spindle Net: Person Re-Identification With Human Body Region Guided Feature Decomposition and Fusion

1 code implementation • CVPR 2017 • Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang

Person re-identification (ReID) is an important task in video surveillance and has various applications.

Person Re-Identification

