no code implementations • 20 Oct 2023 • Siyu Zhang, Yeming Chen, Sirui Cheng, Yaoru Sun, Jun Yang, Lizhi Bai
It parses the entire image as a fine-to-coarse hierarchical structure of constituent visual patterns, and captures multiscale features by progressively merging adjacent superpixels as graph nodes.
1 code implementation • 24 Sep 2023 • Haoran Wang, Zeshen Tang, Leya Yang, Yaoru Sun, Fang Wang, Siyu Zhang, Yeming Chen
Here, we propose a goal-conditioned HRL framework named Guided Cooperation via Model-based Rollout (GCMR), aiming to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics.
Hierarchical Reinforcement Learning reinforcement-learning +1
no code implementations • 18 Aug 2023 • Yeming Chen, Siyu Zhang, Yaoru Sun, Weijian Liang, Haoran Wang
In this work, we propose an efficient computation framework for multimodal alignment by introducing a novel visual semantic module to further improve the performance of the VL tasks.
no code implementations • 26 Jul 2023 • Siyu Zhang, Yeming Chen, Yaoru Sun, Fang Wang, Haibo Shi, Haoran Wang
Visual question answering (VQA) has been intensively studied as a multimodal task that requires effort in bridging vision and language to infer answers correctly.
1 code implementation • 26 Mar 2023 • Guorun Wang, Jun Yang, Yaoru Sun
Adapters are to freeze the model and give it a new weight matrix on the side, which can significantly reduce the time and memory of training, but the cost is that the evaluation and testing will increase the time and memory consumption.
no code implementations • 23 Feb 2023 • Jun Yang, Lizhi Bai, Yaoru Sun, Chunqi Tian, Maoyu Mao, Guorun Wang
For the Depth branch, we propose a Pixel Difference Convolution (PDC) to consider local and detailed geometric information in Depth data via aggregating both intensity and gradient information.
Ranked #13 on Semantic Segmentation on SUN-RGBD (using extra training data)
no code implementations • 13 Oct 2022 • Lizhi Bai, Jun Yang, Chunqi Tian, Yaoru Sun, Maoyu Mao, Yanjun Xu, Weirong Xu
A two-branch network built with DCA and EDCA, called Differential Convolutional Network (DCANet), is proposed to fuse local and global information of two-modal data.
Ranked #13 on Semantic Segmentation on SUN-RGBD (using extra training data)