Search Results for author: Sitong Wu

Found 15 papers, 5 papers with code

RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

no code implementations · 6 Jun 2024 · Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia

To address these limitations, the RoboCoder framework integrates Large Language Models (LLMs) with a dynamic learning system that uses real-time environmental feedback to continuously update and refine action codes.

Ensemble Quadratic Assignment Network for Graph Matching

no code implementations · 11 Mar 2024 · Haoru Tan, Chuang Wang, Sitong Wu, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu

In this paper, we propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.

3D Shape Classification Graph Matching +1

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

no code implementations · CVPR 2024 · Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia

We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.

RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension

1 code implementation · 3 Aug 2023 · Qiang Zhou, Chaohui Yu, Shaofeng Zhang, Sitong Wu, Zhibing Wang, Fan Wang

To this end, we propose to extract features corresponding to regional objects as soft prompts for LLM, which provides a straightforward and scalable approach and eliminates the need for LLM fine-tuning.

Image Comprehension

AxWin Transformer: A Context-Aware Vision Transformer Backbone with Axial Windows

no code implementations · 2 May 2023 · Fangjian Lin, Yizhe Ma, Sitong Wu, Long Yu, Shengwei Tian

Recently, Transformers have shown good performance in several vision tasks due to their powerful modeling capabilities.

PRSeg: A Lightweight Patch Rotate MLP Decoder for Semantic Segmentation

no code implementations · 1 May 2023 · Yizhe Ma, Fangjian Lin, Sitong Wu, Shengwei Tian, Long Yu

We expect that our PRSeg can promote the development of MLP-based decoders in semantic segmentation.

Decoder Segmentation +1

UniNeXt: Exploring A Unified Architecture for Vision Recognition

1 code implementation · 26 Apr 2023 · Fangjian Lin, Jianlong Yuan, Sitong Wu, Fan Wang, Zhibin Wang

Interestingly, the ranking of these spatial token mixers also changes under our UniNeXt, suggesting that an excellent spatial token mixer may be stifled due to a suboptimal general architecture, which further shows the importance of the study on the general architecture of vision backbone.

Spatial Token Mixer

Semantic Diffusion Network for Semantic Segmentation

no code implementations · NeurIPS 2022 · Haoru Tan, Sitong Wu, Jimin Pi

We then propose a novel learnable approach called semantic diffusion network (SDN) to approximate the diffusion process, which contains a parameterized semantic difference convolution operator followed by a feature fusion module.

Decoder Segmentation +1

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation · 10 Nov 2022 · Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

Image Deep Networks Spatial Token Mixer

CATrans: Context and Affinity Transformer for Few-Shot Segmentation

no code implementations · 27 Apr 2022 · Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo

In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer (CATrans) in a hierarchical architecture.

Relation Transfer Learning

Feature Selective Transformer for Semantic Image Segmentation

no code implementations · 26 Mar 2022 · Fangjian Lin, Tianyi Wu, Sitong Wu, Shengwei Tian, Guodong Guo

In this work, we focus on fusing multi-scale features from Transformer-based backbones for semantic segmentation, and propose a Feature Selective Transformer (FeSeFormer), which aggregates features from all scales (or levels) for each query feature.

feature selection Image Segmentation +2

StructToken: Rethinking Semantic Segmentation with Structural Prior

no code implementations · 23 Mar 2022 · Fangjian Lin, Zhanhao Liang, Sitong Wu, Junjun He, Kai Chen, Shengwei Tian

In previous deep-learning-based methods, semantic segmentation has been regarded as a static or dynamic per-pixel classification task, i.e., classifying each pixel representation into a specific category.

Decision Making Segmentation +1

Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention

2 code implementations · 28 Dec 2021 · Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo

To reduce the quadratic computational complexity of global self-attention, various methods constrain the range of attention to a local region to improve efficiency.

Instance Segmentation object-detection +2

Proxy Graph Matching with Proximal Matching Networks

no code implementations · AAAI 2021 · Haoru Tan, Chuang Wang, Sitong Wu, Tie-Qiang Wang, Xu-Yao Zhang, Cheng-Lin Liu

It consists of three parts: a graph neural network to generate a high-level local feature, an attention-based module to normalize the rotational transform, and a global feature matching module based on proximal optimization.

Graph Matching Graph Neural Network

Fully Transformer Networks for Semantic Image Segmentation

1 code implementation · 8 Jun 2021 · Sitong Wu, Tianyi Wu, Fangjian Lin, Shengwei Tian, Guodong Guo

Transformers have shown impressive performance in various natural language processing and computer vision tasks, owing to their ability to model long-range dependencies.

Decoder Face Parsing +3
