1 code implementation • 17 Feb 2025 • Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang
Abundant recent methods aim to solve this problem with token pruning, which first defines an importance criterion for tokens and then prunes the unimportant vision tokens during inference.
no code implementations • 17 Feb 2025 • Zichen Wen, Yifeng Gao, Weijia Li, Conghui He, Linfeng Zhang
Multimodal large language models (MLLMs) have shown remarkable performance for cross-modal understanding and generation, yet still suffer from severe inference costs.
1 code implementation • 22 Dec 2024 • Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, ZiHao Wang, Conghui He, Weijia Li
In this work, we introduce a novel task for cross-view geo-localization with natural language descriptions, which aims to retrieve corresponding satellite images or OSM database based on scene text.
no code implementations • 10 Dec 2024 • Weijia Li, Guang Hu, Yangmengfei Xu
We then implemented the PJP model in several well-known domains and compared it with the JP model in the experiments.
no code implementations • 13 Oct 2024 • Junyan Ye, Baichuan Zhou, Zilong Huang, Junan Zhang, Tianyi Bai, Hengrui Kang, Jun He, Honglin Lin, ZiHao Wang, Tong Wu, Zhizheng Wu, Yiping Chen, Dahua Lin, Conghui He, Weijia Li
With the rapid development of AI-generated content, the future internet may be inundated with synthetic data, making the discrimination of authentic and credible multimodal data increasingly challenging.
1 code implementation • 30 Aug 2024 • Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li
Moreover, existing urban benchmarks have been limited to evaluating LMMs with basic region-level urban tasks under singular views, leading to incomplete evaluations of LMMs' abilities in urban environments.
no code implementations • 27 Aug 2024 • Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He
In this work, we propose CrossViewDiff, a cross-view diffusion model for satellite-to-street view synthesis.
no code implementations • 18 Aug 2024 • Weijia Li, Jinhua Yu, Dairong Chen, Yi Lin, Runmin Dong, Xiang Zhang, Conghui He, Haohuan Fu
In this work, we propose a geometry-aware semi-supervised framework for fine-grained building function recognition, utilizing geometric relationships among multi-source data to enhance pseudo-label accuracy in semi-supervised learning, broadening its applicability to various building function categorization systems.
1 code implementation • 10 Aug 2024 • Junyan Ye, Zhutao Lv, Weijia Li, Jinhua Yu, Haote Yang, Huaping Zhong, Conghui He
In the existing retrieval of street view panorama images and satellite images, we introduce BEV and satellite image retrieval branches for collaborative retrieval.
no code implementations • 3 Aug 2024 • Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, Conghui He
In this paper, we introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images, utilizing a diffusion model and the Bird's-Eye View (BEV) paradigm.
no code implementations • 16 Jul 2024 • Lihan Tong, Yun Liu, Weijia Li, Liyuan Chen, ErKang Chen
Our work not only advances the field of image dehazing but also offers insights into the design of attention mechanisms for broader applications in computer vision.
no code implementations • 28 Jun 2024 • Lihan Tong, Weijia Li, Qingxia Yang, Liyuan Chen, Peng Chen
We present Ksformer, utilizing Multi-scale Key-select Routing Attention (MKRA) for intelligent selection of key areas through multi-channel, multi-scale windows with a top-k operator, and Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features, outperforming other dehazing methods in tests.
no code implementations • 24 Jun 2024 • Shilei Cao, Yan Liu, Juepeng Zheng, Weijia Li, Runmin Dong, Haohuan Fu
We demonstrate the effectiveness of AMROD on four CTTA object detection tasks, where AMROD outperforms existing methods, especially achieving a 3. 2 mAP improvement and a 20% increase in efficiency on the Cityscapes-to-Cityscapes-C CTTA task.
no code implementations • 12 Jun 2024 • Lixian Zhang, Yi Zhao, Runmin Dong, Jinxiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu
A$^{2}$-MAE integrates an anchor-aware masking strategy and a geographic encoding module to comprehensively exploit the properties of RS images.
no code implementations • 9 May 2024 • Lihan Tong, Yun Liu, Tian Ye, Weijia Li, Liyuan Chen, ErKang Chen
The objective of single image dehazing is to restore hazy images and produce clear, high-quality visuals.
1 code implementation • CVPR 2024 • Weijia Li, Haote Yang, Zhenghao Hu, Juepeng Zheng, Gui-Song Xia, Conghui He
3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and availability for large-scale applications.
1 code implementation • CVPR 2024 • Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li
This paper aims at achieving fine-grained building attribute segmentation in a cross-view scenario, i. e., using satellite and street-view image pairs.
2 code implementations • 29 Mar 2024 • Chao Pang, Xingxing Weng, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He
VHM is built on a large-scale remote sensing image-text dataset with rich-content captions (VersaD), and an honest instruction dataset comprising both factual and deceptive questions (HnstD).
1 code implementation • CVPR 2024 • Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu
Specifically, we inject the priors into the denoising model to improve the utilization of reference information in unchanged areas and regulate the reconstruction of semantically relevant content in changed areas.
1 code implementation • 21 Dec 2023 • Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou
Despite CLIP being the foundation model in numerous vision-language applications, the CLIP suffers from a severe text spotting bias.
no code implementations • 20 Oct 2023 • Juepeng Zheng, Shuai Yuan, Weijia Li, Haohuan Fu, Le Yu
); (2) traditional machine learning methods (such as random forest, decision tree, etc.
2 code implementations • 24 Aug 2023 • Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He
A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.
2 code implementations • CVPR 2023 • Weijia Li, Saihui Hou, Chunjie Zhang, Chunshui Cao, Xu Liu, Yongzhen Huang, Yao Zhao
For the cloth-changing problem, video-based ReID is rarely studied due to the lack of a suitable cloth-changing benchmark, and gait recognition is often researched under controlled conditions.
no code implementations • ICCV 2023 • Runmin Dong, Lichao Mou, Mengxuan Chen, Weijia Li, Xin-Yi Tong, Shuai Yuan, Lixian Zhang, Juepeng Zheng, Xiaoxiang Zhu, Haohuan Fu
Moreover, we propose the Class Center Contrast method to jointly utilize the labeled and unlabeled data.
no code implementations • 11 Dec 2022 • Yiqi Lin, Huabin Zheng, Huaping Zhong, Jinjing Zhu, Weijia Li, Conghui He, Lin Wang
To address these issues, we build a task-specific self-supervised pre-training framework from a data selection perspective based on a simple hypothesis that pre-training on the unlabeled samples with similar distribution to the target task can bring substantial performance gains.
no code implementations • CVPR 2023 • Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Gui-Song Xia, Dahua Lin
More precisely, the OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City.
1 code implementation • 17 May 2022 • Dinghao Yang, Bin Wang, Weijia Li, Yiqi Lin, Conghui He
Although avoiding the extensive labors of trimap annotation, existing methods still suffer from two limitations: (1) For the single image with multiple objects, it is essential to provide extra interaction information to help determining the matting target; (2) For transparent objects, the accurate regression of alpha matte from RGB image is much more difficult compared with the opaque ones.
1 code implementation • 28 Apr 2022 • Jinwang Wang, Lingxuan Meng, Weijia Li, Wen Yang, Lei Yu, Gui-Song Xia
In this paper, we propose an offset vector learning scheme, which turns the building footprint extraction problem in off-nadir images into an instance-level joint prediction problem of the building roof and its corresponding "roof to footprint" offset vector.
1 code implementation • ICCV 2021 • Zhuoming Liu, Hao Ding, Huaping Zhong, Weijia Li, Jifeng Dai, Conghui He
To obtain the Influence of the unlabeled sample in the active learning scenario, we design the Untrained Unlabeled sample Influence Calculation(UUIC) to estimate the unlabeled sample's expected gradient with which we calculate its Influence.
no code implementations • ICCV 2021 • Weijia Li, Lingxuan Meng, Jinwang Wang, Conghui He, Gui-Song Xia, Dahua Lin
3D building reconstruction from monocular remote sensing imagery is an important research problem and an economic solution to large-scale city modeling, compared with reconstruction from LiDAR data and multi-view imagery.
1 code implementation • 26 Aug 2020 • Juepeng Zheng, Haohuan Fu, Weijia Li, Wenzhao Wu, Yi Zhao, Runmin Dong, Le Yu
In this paper, we propose a novel domain adaptive oil palm tree detection method, i. e., a Multi-level Attention Domain Adaptation Network (MADAN) to reap cross-regional oil palm tree counting and detection.