no code implementations • 6 Jun 2024 • Jingyao Li, Pengguang Chen, Sitong Wu, Chuanyang Zheng, Hong Xu, Jiaya Jia
To address these limitations, the RoboCoder framework integrates Large Language Models (LLMs) with a dynamic learning system that uses real-time environmental feedback to continuously update and refine action codes.
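The abstract above describes a generate-execute-refine loop driven by environmental feedback. A minimal illustrative sketch of such a loop is given below; `generate_action_code`, `execute`, and `refine_action_code` are hypothetical placeholders, not the RoboCoder API.

```python
# Illustrative LLM-in-the-loop refinement cycle (hypothetical interfaces,
# not the RoboCoder implementation).

def refine_until_success(task, llm, env, max_rounds=5):
    """Generate action code, execute it, and refine it from feedback."""
    code = llm.generate_action_code(task)            # initial proposal
    for _ in range(max_rounds):
        feedback = env.execute(code)                 # real-time environmental feedback
        if feedback.success:
            return code
        # Feed the failure signal back to the LLM to update the action code.
        code = llm.refine_action_code(task, code, feedback.message)
    return code
```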
no code implementations • 11 Mar 2024 • Haoru Tan, Chuang Wang, Sitong Wu, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu
In this paper, we propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
no code implementations • CVPR 2024 • Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia
We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.
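One way such a sample-wise affinity-consistency term can be written is sketched below, assuming batch-aligned image and text embeddings; this illustrates the general idea rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def affinity_consistency_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    """Encourage image-image and text-text affinity structures to agree.

    img_emb, txt_emb: (B, D) embeddings of the same B image-text pairs.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    aff_img = img_emb @ img_emb.t()   # (B, B) sample-wise affinities among images
    aff_txt = txt_emb @ txt_emb.t()   # (B, B) sample-wise affinities among texts
    return F.mse_loss(aff_img, aff_txt)
```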
1 code implementation • 3 Aug 2023 • Qiang Zhou, Chaohui Yu, Shaofeng Zhang, Sitong Wu, Zhibing Wang, Fan Wang
To this end, we propose to extract features corresponding to regional objects as soft prompts for LLM, which provides a straightforward and scalable approach and eliminates the need for LLM fine-tuning.
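A minimal sketch of the soft-prompt idea, assuming regional features are linearly projected into the LLM embedding space and prepended to the text tokens; module and parameter names are illustrative, not the released code.

```python
import torch
import torch.nn as nn

class RegionSoftPrompt(nn.Module):
    """Project regional visual features into the LLM embedding space and
    prepend them as soft prompts, keeping the LLM itself frozen."""

    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)   # the only trainable bridge

    def forward(self, region_feats, token_embeds):
        # region_feats: (B, R, vis_dim) features of R regional objects
        # token_embeds: (B, T, llm_dim) embeddings of the text tokens
        prompts = self.proj(region_feats)          # (B, R, llm_dim) soft prompts
        return torch.cat([prompts, token_embeds], dim=1)
```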
no code implementations • 2 May 2023 • Fangjian Lin, Yizhe Ma, Sitong Wu, Long Yu, Shengwei Tian
Recently, Transformers have shown strong performance on several vision tasks due to their powerful modeling capabilities.
no code implementations • 1 May 2023 • Yizhe Ma, Fangjian Lin, Sitong Wu, Shengwei Tian, Long Yu
We expect that our PRSeg can promote the development of MLP-based decoders for semantic segmentation.
1 code implementation • 26 Apr 2023 • Fangjian Lin, Jianlong Yuan, Sitong Wu, Fan Wang, Zhibin Wang
Interestingly, the ranking of these spatial token mixers also changes under our UniNeXt, suggesting that an excellent spatial token mixer may be stifled by a suboptimal general architecture, which further underscores the importance of studying the general architecture of vision backbones.
no code implementations • NeurIPS 2022 • Haoru Tan, Sitong Wu, Jimin Pi
We then propose a novel learnable approach called semantic diffusion network (SDN) to approximate the diffusion process, which contains a parameterized semantic difference convolution operator followed by a feature fusion module.
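A rough sketch of a difference convolution followed by feature fusion is shown below; it follows the generic central-difference idea and is not necessarily the exact operator used in SDN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferenceConvBlock(nn.Module):
    """Difference convolution (responds to local feature changes rather than
    absolute values) followed by a simple fusion with the input features."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        vanilla = self.conv(x)
        # Subtract the center response so the output measures neighbor-center
        # differences, emphasizing semantic boundaries.
        center = F.conv2d(x, self.conv.weight.sum(dim=(2, 3), keepdim=True))
        diff = vanilla - center
        return self.fuse(torch.cat([x, diff], dim=1))
```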
1 code implementation • 10 Nov 2022 • Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai
Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.
no code implementations • 27 Apr 2022 • Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo
In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer (CATrans) in a hierarchical architecture.
no code implementations • 26 Mar 2022 • Fangjian Lin, Tianyi Wu, Sitong Wu, Shengwei Tian, Guodong Guo
In this work, we focus on fusing multi-scale features from Transformer-based backbones for semantic segmentation, and propose a Feature Selective Transformer (FeSeFormer), which aggregates features from all scales (or levels) for each query feature.
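A minimal sketch of cross-scale aggregation, where each query feature attends to tokens gathered from every pyramid level; this is a generic illustration, not the FeSeFormer architecture itself.

```python
import torch
import torch.nn as nn

class CrossScaleAggregation(nn.Module):
    """Cross-attention from query features to tokens from all scales."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feat, multi_scale_feats):
        # query_feat: (B, Nq, C); multi_scale_feats: list of (B, C, Hi, Wi)
        tokens = torch.cat(
            [f.flatten(2).transpose(1, 2) for f in multi_scale_feats], dim=1
        )  # (B, sum(Hi*Wi), C): tokens gathered from every scale
        out, _ = self.attn(query_feat, tokens, tokens)
        return out
```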
no code implementations • 23 Mar 2022 • Fangjian Lin, Zhanhao Liang, Sitong Wu, Junjun He, Kai Chen, Shengwei Tian
In previous deep-learning-based methods, semantic segmentation has been regarded as a static or dynamic per-pixel classification task, i.e., classifying each pixel representation into a specific category.
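For reference, the conventional per-pixel formulation amounts to a dense classification head over the feature map; a minimal sketch, with arbitrary channel and class counts:

```python
import torch
import torch.nn as nn

# A 1x1 convolution maps each pixel's representation to class logits
# (256 channels and 19 classes are arbitrary example values).
per_pixel_head = nn.Conv2d(in_channels=256, out_channels=19, kernel_size=1)

features = torch.randn(2, 256, 64, 64)   # (B, C, H, W) pixel representations
logits = per_pixel_head(features)        # (B, 19, H, W): one prediction per pixel
```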
2 code implementations • 28 Dec 2021 • Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo
To reduce the quadratic computational complexity caused by global self-attention, various methods constrain the range of attention to a local region to improve efficiency.
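A minimal sketch of local-window attention, the common way such methods restrict the attention range; the window size and dimensions below are illustrative.

```python
import torch
import torch.nn as nn

def window_attention(x, attn, window=7):
    """Self-attention restricted to non-overlapping local windows, so the cost
    grows linearly with image size instead of quadratically.

    x: (B, H, W, C) with H and W divisible by `window`;
    attn: an nn.MultiheadAttention module created with batch_first=True.
    """
    B, H, W, C = x.shape
    # Partition into (window x window) patches, each flattened into a sequence.
    x = x.view(B, H // window, window, W // window, window, C)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)
    x, _ = attn(x, x, x)              # attention only within each local window
    # Reverse the partition back to the original spatial layout.
    x = x.view(B, H // window, W // window, window, window, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

attn = nn.MultiheadAttention(embed_dim=96, num_heads=3, batch_first=True)
out = window_attention(torch.randn(2, 56, 56, 96), attn, window=7)
```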
no code implementations • AAAI 2021 • Haoru Tan, Chuang Wang, Sitong Wu, Tie-Qiang Wang, Xu-Yao Zhang, Cheng-Lin Liu
It consists of three parts: a graph neural network to generate a high-level local feature, an attention-based module to normalize the rotational transform, and a global feature matching module based on proximal optimization.
1 code implementation • 8 Jun 2021 • Sitong Wu, Tianyi Wu, Fangjian Lin, Shengwei Tian, Guodong Guo
Transformers have shown impressive performance in various natural language processing and computer vision tasks, due to the capability of modeling long-range dependencies.