Search Results for author: Shuyang Sun

Found 21 papers, 13 papers with code

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

no code implementations16 Feb 2024 Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd

Recent advancements in Multi-Modal Large Language Models (MLLMs) have shown promising potential for enhancing explainability in driving agents by producing control predictions alongside natural language explanations.

Autonomous Driving Decision Making +4

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

1 code implementation16 Oct 2023 Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao

Synthetic training data has gained prominence in numerous learning tasks and scenarios, offering advantages such as dataset augmentation, generalization evaluation, and privacy preservation.

Image Classification Out-of-Distribution Generalization

OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

no code implementations ICCV 2023 Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr

Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information.

Image Captioning

ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation

1 code implementation NeurIPS 2023 Shuyang Sun, Weijun Wang, Qihang Yu, Andrew Howard, Philip Torr, Liang-Chieh Chen

This paper presents a new mechanism to facilitate the training of mask transformers for efficient panoptic segmentation, democratizing its deployment.

Panoptic Segmentation Segmentation

LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations29 Nov 2022 Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple: it can be implemented in just a few lines of code and can be universally applied to any deep network, e.g. CNNs and Vision Transformers, with minimal computational cost.
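The excerpt above notes that a mixup variant with label uncertainty fits in a few lines of code. The sketch below is an illustrative mixup-style augmentation in which the label mixing coefficient is jittered to model label uncertainty; the noise model and function names are assumptions for illustration, not the paper's exact LUMix formulation.

```python
import numpy as np

def lumix_style_mix(x1, y1, x2, y2, alpha=1.0, noise=0.1, rng=None):
    """Mixup-style augmentation with a perturbed label coefficient.

    Illustrative sketch only: images are mixed as in standard mixup,
    while the label coefficient is jittered to reflect that the mixed
    image may not contain content in exact proportion to lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)               # image mixing coefficient
    x = lam * x1 + (1.0 - lam) * x2            # pixel-space interpolation
    # Perturb the label coefficient to model label uncertainty.
    lam_y = np.clip(lam + rng.uniform(-noise, noise), 0.0, 1.0)
    y = lam_y * y1 + (1.0 - lam_y) * y2        # soft label
    return x, y
```

Because the perturbed coefficient is clipped to [0, 1], the mixed soft label remains a valid probability vector.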

Data Augmentation

Is synthetic data from generative models ready for image recognition?

1 code implementation14 Oct 2022 Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Transfer Learning

Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation

no code implementations CVPR 2022 Yi Zhou, Hui Zhang, Hana Lee, Shuyang Sun, Pingjun Li, Yangguang Zhu, ByungIn Yoo, Xiaojuan Qi, Jae-Joon Han

We encode all panoptic entities in a video, including both foreground instances and background semantics, with a unified representation called panoptic slots.

Object Representation Learning +1

TransMix: Attend to Mix for Vision Transformers

2 code implementations CVPR 2022 Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.
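The sentence above describes the core idea of attention-weighted label mixing: after mixing two images, each source's label weight is set by the attention mass its pixels receive rather than by its pixel-area ratio. The sketch below illustrates that idea under assumed inputs (a per-pixel attention map and a binary mix mask); the exact formulation in TransMix may differ.

```python
import numpy as np

def transmix_style_labels(attn, mask, y1, y2):
    """Derive mixed-label weights from an attention map (illustrative).

    attn: non-negative per-pixel attention map (H, W).
    mask: binary map (H, W), 1 where image 1's pixels survive the mix.
    Returns the attention-weighted soft label and the label coefficient.
    """
    attn = attn / attn.sum()                   # normalise to a distribution
    lam = float((attn * mask).sum())           # attention mass on image 1's pixels
    return lam * y1 + (1.0 - lam) * y2, lam
```

With a uniform attention map this reduces to ordinary area-based mixing; a non-uniform map shifts label confidence toward the image that attracts more attention.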

Instance Segmentation object-detection +3

Vision Transformer with Progressive Sampling

1 code implementation ICCV 2021 Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.
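The excerpt describes the plain ViT tokenisation scheme: an image is split into fixed-size patches, each flattened into a token. A minimal sketch of that step (function name and patch size are illustrative assumptions):

```python
import numpy as np

def patchify(img, patch=4):
    """Split an (H, W, C) image into fixed-size flattened patch tokens,
    as in a plain ViT. H and W must be divisible by `patch`.

    Returns an (N, patch*patch*C) array of tokens, N = (H/patch)*(W/patch).
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    tokens = (img.reshape(h // patch, patch, w // patch, patch, c)
                 .transpose(0, 2, 1, 3, 4)     # group patch rows/cols together
                 .reshape(-1, patch * patch * c))
    return tokens
```

Each token would then be linearly projected and fed, with position embeddings, to the transformer encoder.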

Image Classification

Visual Parser: Representing Part-whole Hierarchies with Transformers

2 code implementations13 Jul 2021 Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr

To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.

Image Classification Instance Segmentation +3

Learning to Sample the Most Useful Training Patches from Images

no code implementations24 Nov 2020 Shuyang Sun, Liang Chen, Gregory Slabaugh, Philip Torr

Some image restoration tasks like demosaicing require difficult training samples to learn effective models.

Demosaicking

Exploring the Hierarchy in Relation Labels for Scene Graph Generation

no code implementations12 Sep 2020 Yi Zhou, Shuyang Sun, Chao Zhang, Yikang Li, Wanli Ouyang

By assigning each relationship a single label, current approaches formulate the relationship detection as a classification problem.

Graph Generation Relation +2

Robust Multi-Modality Multi-Object Tracking

1 code implementation ICCV 2019 Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy

Multi-sensor perception is crucial to ensuring reliability and accuracy in autonomous driving systems, while multi-object tracking (MOT) improves that by tracing the sequential movement of dynamic objects.


Autonomous Driving Multi-Object Tracking +2

Hybrid Task Cascade for Instance Segmentation

5 code implementations CVPR 2019 Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.

Instance Segmentation object-detection +5

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

6 code implementations NeurIPS 2018 Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang

The basic principles in designing convolutional neural network (CNN) structures for predicting objects at different levels, e.g., image-level, region-level, and pixel-level, are diverging.

Image Classification

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation CVPR 2018 Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.
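The excerpt describes a motion representation derived from feature maps rather than raw pixels. The sketch below follows the optical-flow intuition (brightness-constancy terms applied to features): it stacks the spatial gradients of the current feature map with the temporal difference to the next one. This is an illustrative approximation; the exact operators used in OFF may differ.

```python
import numpy as np

def off_style_features(feat_t, feat_t1):
    """Optical-flow-guided-feature-style motion cue (illustrative sketch).

    feat_t, feat_t1: (H, W) feature maps from consecutive frames.
    Returns a (3, H, W) stack: dF/dx, dF/dy, and the temporal difference.
    """
    gy, gx = np.gradient(feat_t)               # spatial derivatives dF/dy, dF/dx
    dt = feat_t1 - feat_t                      # temporal difference between frames
    return np.stack([gx, gy, dt], axis=0)      # compact motion representation
```

Because it only needs gradients and a frame difference, such a representation is cheap to compute compared with estimating dense optical flow.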

Action Recognition In Videos Optical Flow Estimation +1
