Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning

1 code implementation · 14 Dec 2024 · Hai-Ming Xu, Qi Chen, Lei Wang, Lingqiao Liu

Additionally, we demonstrate that our attention map-based grounding technique significantly outperforms direct localization predictions from MiniCPM-Llama3-V 2.5, highlighting the potential of using attention maps from pretrained MLLMs and paving the way for future innovations in this domain.
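As a rough illustration of grounding from an attention map (not the authors' implementation), the sketch below assumes one attention score per image patch, flattened row by row over a `grid_h × grid_w` patch grid, and maps the most-attended patch back to a pixel coordinate on the screenshot. The helper name `attention_to_point` and the argmax-plus-patch-centre rule are assumptions for illustration.

```python
from typing import Sequence, Tuple


def attention_to_point(attn: Sequence[float], grid_w: int, grid_h: int,
                       img_w: int, img_h: int) -> Tuple[float, float]:
    """Map a flattened (row-major) patch attention map to a pixel coordinate.

    Hypothetical helper: assumes `attn` holds one score per image patch,
    ordered row by row over a grid_h x grid_w patch grid.
    """
    if len(attn) != grid_w * grid_h:
        raise ValueError("attention length must equal grid_w * grid_h")
    # Index of the most-attended patch.
    best = max(range(len(attn)), key=lambda i: attn[i])
    row, col = divmod(best, grid_w)
    # Centre of that patch, rescaled from patch units to image pixels.
    patch_w = img_w / grid_w
    patch_h = img_h / grid_h
    return ((col + 0.5) * patch_w, (row + 0.5) * patch_h)
```

For a 2 × 2 grid over a 100 × 100 screenshot with scores `[0.1, 0.2, 0.6, 0.1]`, the third patch (row 1, column 0) wins, giving `(25.0, 75.0)`. A weighted centroid over the top-scoring patches would be a smoother alternative to the plain argmax.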


Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models

No code implementations · 9 Sep 2023 · Hai-Ming Xu, Lingqiao Liu, Hao Chen, Ehsan Abbasnejad, Rafael Felix

As an effective way to alleviate the burden of data annotation, semi-supervised learning (SSL) provides an attractive solution due to its ability to leverage both labeled and unlabeled data to build a predictive model.

Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

No code implementations · CVPR 2023 · Shuo Wang, Xinhai Zhao, Hai-Ming Xu, Zehui Chen, Dameng Yu, Jiahao Chang, Zhen Yang, Feng Zhao

Based on the covariate shift assumption, we find that the domain gap is mainly attributable to the feature distribution of BEV, which is determined by the quality of both the depth estimation and the 2D image feature representation.

3D Object Detection, Depth Estimation

ProtoTransfer: Cross-Modal Prototype Transfer for Point Cloud Segmentation

No code implementations · ICCV 2023 · Pin Tang, Hai-Ming Xu, Chao Ma

Transferring knowledge from multiple modalities (i.e., LiDAR points and images) to a single LiDAR modality can exploit the complementary information provided by modality fusion while retaining single-modality inference speed, which makes it a promising direction for point cloud semantic segmentation in autonomous driving.

Autonomous Driving, Point Cloud Segmentation

Semi-supervised Semantic Segmentation with Prototype-based Consistency Regularization

1 code implementation · 10 Oct 2022 · Hai-Ming Xu, Lingqiao Liu, Qiuchen Bian, Zhen Yang

Semi-supervised semantic segmentation requires the model to effectively propagate the label information from limited annotated images to unlabeled ones.

Diversity, Semi-Supervised Semantic Segmentation

Dual Decision Improves Open-Set Panoptic Segmentation

No code implementations · 6 Jul 2022 · Hai-Ming Xu, Hao Chen, Lingqiao Liu, Yufei Yin

Then we distinguish the "unknown things" from the background by using the additional object prediction head.

Panoptic Segmentation
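The dual-decision idea can be sketched at the pixel level as follows. This is a minimal sketch under assumed inputs, not the paper's method: a known-class head scores the known classes plus `background`, a separate object head gives an objectness score, and a pixel the known-class head calls background is relabelled `unknown` when its objectness exceeds a threshold. The function name, label strings, and the 0.5 default threshold are hypothetical.

```python
from typing import List, Sequence

BACKGROUND = "background"  # assumed label of the "no known class" output
UNKNOWN = "unknown"        # label assigned to unknown things


def dual_decision(class_scores: Sequence[Sequence[float]],
                  class_names: Sequence[str],
                  objectness: Sequence[float],
                  threshold: float = 0.5) -> List[str]:
    """Label each pixel with a known class, background, or unknown.

    Hypothetical helper: `class_scores[i]` holds the known-class head's
    scores for pixel i (one per entry of `class_names`, including
    BACKGROUND), and `objectness[i]` is the object head's score.
    """
    if len(class_scores) != len(objectness):
        raise ValueError("need one objectness score per pixel")
    labels = []
    for scores, obj in zip(class_scores, objectness):
        # First decision: the known-class head picks its best label.
        best = max(range(len(scores)), key=lambda k: scores[k])
        label = class_names[best]
        # Second decision: background pixels that still look like an
        # object are treated as unknown things.
        if label == BACKGROUND and obj > threshold:
            label = UNKNOWN
        labels.append(label)
    return labels
```

With class names `["car", "person", "background"]`, a pixel whose best score is `background` but whose objectness is 0.9 comes out as `unknown`, while the same pixel with objectness 0.1 stays `background`.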

Semi-supervised Learning via Conditional Rotation Angle Estimation

No code implementations · 9 Jan 2020 · Hai-Ming Xu, Lingqiao Liu, Dong Gong

Our insight is that the prediction target in semi-supervised learning (SemSL) can be modeled as the latent factor in the predictor for the self-supervised learning (SlfSL) target.

Self-Supervised Learning
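One way to read this insight, assuming the common four-way rotation target (0°, 90°, 180°, 270°) and treating the class as a latent variable, is that the rotation predictor mixes class-conditional predictions by the class posterior: p(r | x) = Σ_c p(c | x) p(r | x, c). The sketch below only computes that mixture and its negative log-likelihood on plain Python lists; the function names are hypothetical, and it does not reproduce the paper's architecture or training procedure.

```python
import math
from typing import List, Sequence


def marginal_rotation_probs(class_probs: Sequence[float],
                            rot_given_class: Sequence[Sequence[float]]) -> List[float]:
    """Compute p(r | x) = sum_c p(c | x) * p(r | x, c).

    `class_probs[c]` is p(c | x); `rot_given_class[c][r]` is p(r | x, c)
    for rotation index r (e.g. 0..3 for 0, 90, 180, 270 degrees).
    """
    if len(class_probs) != len(rot_given_class):
        raise ValueError("need one rotation distribution per class")
    n_rot = len(rot_given_class[0])
    out = [0.0] * n_rot
    for p_c, rot in zip(class_probs, rot_given_class):
        if len(rot) != n_rot:
            raise ValueError("all rotation distributions must have the same length")
        for r, p_r in enumerate(rot):
            out[r] += p_c * p_r  # the class acts as a latent mixing variable
    return out


def rotation_nll(class_probs: Sequence[float],
                 rot_given_class: Sequence[Sequence[float]],
                 target_rotation: int,
                 eps: float = 1e-12) -> float:
    """Negative log-likelihood of the true rotation under the mixture."""
    p = marginal_rotation_probs(class_probs, rot_given_class)[target_rotation]
    return -math.log(max(p, eps))  # eps guards against log(0)
```

In a trainable model, gradients of this loss would also flow into `class_probs`, so the rotation task on unlabeled images could shape the class posterior that the semi-supervised classifier relies on.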

