Search Results for author: Weixian Lei

Found 6 papers, 5 papers with code

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

1 code implementation • 26 Nov 2024 • Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou

In this work, we develop a vision-language-action model for the digital world, namely ShowUI, which features the following innovations: (i) UI-Guided Visual Token Selection, which reduces computational costs by formulating screenshots as a UI connected graph, adaptively identifying redundant relationships and using them as the criteria for token selection during self-attention blocks; (ii) Interleaved Vision-Language-Action Streaming, which flexibly unifies diverse needs within GUI tasks, enabling effective management of visual-action history in navigation or pairing multi-turn query-action sequences per screenshot to enhance training efficiency; (iii) Small-scale High-quality GUI Instruction-following Datasets, built through careful data curation and a resampling strategy that addresses significant data type imbalances.

Instruction Following, Natural Language Visual Grounding

ViT-Lens: Towards Omni-modal Representations

1 code implementation • CVPR 2024 • Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou

In this paper, we present ViT-Lens-2, which facilitates efficient omni-modal representation learning by perceiving novel modalities with a pretrained ViT and aligning them to a pre-defined space.

EEG, Image Generation, +2

ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights

1 code implementation • 20 Aug 2023 • Weixian Lei, Yixiao Ge, Jianfeng Zhang, Dylan Sun, Kun Yi, Ying Shan, Mike Zheng Shou

A well-trained lens with a ViT backbone has the potential to serve as one of these foundation models, supervising the learning of subsequent modalities.

3D Classification, Question Answering, +4

PCCT: Progressive Class-Center Triplet Loss for Imbalanced Medical Image Classification

No code implementations • 11 Jul 2022 • Kanghao Chen, Weixian Lei, Rong Zhang, Shen Zhao, Wei-Shi Zheng, Ruixuan Wang

In the class-center triplet loss, the positive and negative samples in each triplet are replaced by their corresponding class centers, which pulls data representations of the same class closer to their class center.

Image Classification, Medical Image Classification, +1
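
The class-center triplet idea in the PCCT entry above can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: centers are plain per-class means of the given features, distance is Euclidean, the negative is the nearest center of any other class, and the margin defaults to 1.0. The function names `class_centers` and `center_triplet_loss` are hypothetical.

```python
import math
from collections import defaultdict


def class_centers(features, labels):
    """Return the mean feature vector of each class (assumed center definition)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def center_triplet_loss(features, labels, margin=1.0):
    """Triplet loss with class centers standing in for positive/negative samples.

    For each anchor, the positive is its own class center and the negative is
    the nearest center of a different class (an illustrative choice).
    Requires at least two distinct classes.
    """
    centers = class_centers(features, labels)
    total = 0.0
    for x, y in zip(features, labels):
        d_pos = math.dist(x, centers[y])
        d_neg = min(math.dist(x, c) for k, c in centers.items() if k != y)
        # Hinge: penalize anchors not closer to their own center by `margin`.
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(features)


# Two well-separated classes versus two interleaved classes.
separated = center_triplet_loss([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1])
interleaved = center_triplet_loss([[0, 0], [2, 0], [1, 0], [3, 0]], [0, 0, 1, 1])
print(separated, interleaved)
```

In this toy setup the well-separated classes incur zero loss, while the interleaved classes incur a positive loss, because some anchors sit closer to the other class's center than to their own.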
