Search Results for author: Weijia Li

Found 31 papers, 15 papers with code

Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More

1 code implementation 17 Feb 2025 Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang

Many recent methods tackle the computational overhead of vision tokens with token pruning: they first define an importance criterion for tokens and then prune the unimportant vision tokens during inference.
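The importance-based pruning step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the per-token importance scores are taken as given (how they are computed, for example from attention weights, is left as an assumption), and `prune_vision_tokens` is a hypothetical helper that keeps the top-scoring fraction of vision tokens in their original order.

```python
import numpy as np

def prune_vision_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the highest-scoring vision tokens, preserving their original order.

    tokens: (N, D) array of vision-token embeddings.
    scores: (N,) importance score per token (the criterion itself is assumed).
    keep_ratio: fraction of tokens to keep, in (0, 1].
    """
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    # argpartition on negated scores puts the n_keep largest scores first (unordered).
    top = np.argpartition(-scores, n_keep - 1)[:n_keep]
    # Sort the surviving indices so the kept tokens stay in their original positions.
    return tokens[np.sort(top)]
```

Keeping the original order matters because the language model still sees the surviving tokens with their positional structure intact.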

Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?

no code implementations 17 Feb 2025 Zichen Wen, Yifeng Gao, Weijia Li, Conghui He, Linfeng Zhang

Multimodal large language models (MLLMs) have shown remarkable performance in cross-modal understanding and generation, yet they still incur severe inference costs.

Where am I? Cross-View Geo-localization with Natural Language Descriptions

1 code implementation 22 Dec 2024 Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, ZiHao Wang, Conghui He, Weijia Li

In this work, we introduce a novel task, cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OpenStreetMap (OSM) data from a textual description of the scene.

geo-localization Image Retrieval
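Text-to-satellite retrieval of this kind typically reduces to ranking gallery embeddings by their similarity to a query embedding. The sketch below assumes the description and the candidate satellite images have already been embedded into a shared space by some encoder; `rank_by_cosine` is a hypothetical helper, and the paper's actual model is not reproduced here.

```python
import numpy as np

def rank_by_cosine(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted from most to least similar to the query.

    query:   (D,) embedding of a natural-language scene description.
    gallery: (M, D) embeddings of candidate satellite images or OSM tiles.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # (M,) cosine similarities
    # Negate so that argsort (ascending) yields the most similar items first.
    return np.argsort(-sims, kind="stable")
```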

Beyond Static Assumptions: the Predictive Justified Perspective Model for Epistemic Planning

no code implementations 10 Dec 2024 Weijia Li, Guang Hu, Yangmengfei Xu

We implemented the Predictive Justified Perspective (PJP) model in several well-known domains and compared it experimentally with the Justified Perspective (JP) model.

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

no code implementations 13 Oct 2024 Junyan Ye, Baichuan Zhou, Zilong Huang, Junan Zhang, Tianyi Bai, Hengrui Kang, Jun He, Honglin Lin, ZiHao Wang, Tong Wu, Zhizheng Wu, Yiping Chen, Dahua Lin, Conghui He, Weijia Li

With the rapid development of AI-generated content, the future internet may be inundated with synthetic data, making it increasingly challenging to tell authentic, credible multimodal data apart from synthetic data.

Multiple-choice

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

1 code implementation 30 Aug 2024 Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li

Existing urban benchmarks are limited to evaluating LMMs on basic region-level urban tasks from a single view, which leads to incomplete evaluations of LMMs' abilities in urban environments.

Attribute geo-localization

Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning

no code implementations 18 Aug 2024 Weijia Li, Jinhua Yu, Dairong Chen, Yi Lin, Runmin Dong, Xiang Zhang, Conghui He, Haohuan Fu

In this work, we propose a geometry-aware semi-supervised framework for fine-grained building function recognition. It uses geometric relationships among multi-source data to improve pseudo-label accuracy in semi-supervised learning, which broadens its applicability to various building function categorization systems.

Pseudo Label
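For context, the most common baseline for pseudo-label selection in semi-supervised learning is a confidence threshold. The sketch below shows that baseline, with an optional boolean mask standing in for a geometric consistency check between data sources. The threshold value, the mask, and the helper name `select_pseudo_labels` are all assumptions; the paper's geometry-aware procedure is not reproduced.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9, geometry_ok=None):
    """Pick confident pseudo-labels from predictions on unlabeled images.

    probs: (N, C) class probabilities for N unlabeled street-view images.
    threshold: minimum max-probability to accept a pseudo-label (assumed value).
    geometry_ok: optional (N,) boolean mask; a hypothetical stand-in for a
        geometric consistency check between data sources.
    Returns (indices, labels) of the accepted samples.
    """
    probs = np.asarray(probs)
    labels = probs.argmax(axis=1)
    keep = probs.max(axis=1) >= threshold
    if geometry_ok is not None:
        # Reject confident predictions that fail the (hypothetical) geometric check.
        keep &= np.asarray(geometry_ok, dtype=bool)
    idx = np.flatnonzero(keep)
    return idx, labels[idx]
```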

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

1 code implementation 10 Aug 2024 Junyan Ye, Zhutao Lv, Weijia Li, Jinhua Yu, Haote Yang, Huaping Zhong, Conghui He

On top of the existing retrieval between street-view panorama images and satellite images, we introduce a retrieval branch between BEV and satellite images for collaborative retrieval.

geo-localization Image Retrieval
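One simple way to combine two retrieval branches is a weighted sum of their similarity matrices before ranking. This fusion rule, its weight, and the helper name `co_retrieval_scores` are assumptions for illustration, not the network's actual design.

```python
import numpy as np

def co_retrieval_scores(sim_pano_sat, sim_bev_sat, weight=0.5):
    """Fuse two retrieval branches into a single score matrix.

    sim_pano_sat: (Q, G) similarities between street-view panoramas and satellite images.
    sim_bev_sat:  (Q, G) similarities between BEV images (derived from the same
                  panoramas) and satellite images.
    weight: hypothetical mixing weight given to the BEV branch.
    """
    pano = np.asarray(sim_pano_sat, dtype=float)
    bev = np.asarray(sim_bev_sat, dtype=float)
    return (1.0 - weight) * pano + weight * bev
```

Ranking each query row of the fused matrix (for example with `argmax(axis=1)` for top-1) lets either branch overturn a close call made by the other.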

SkyDiffusion: Ground-to-Aerial Image Synthesis with Diffusion Models and BEV Paradigm

no code implementations 3 Aug 2024 Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, Conghui He

In this paper, we introduce SkyDiffusion, a novel cross-view generation method for synthesizing aerial images from street view images, utilizing a diffusion model and the Bird's-Eye View (BEV) paradigm.

Image Generation SSIM

Haze-Aware Attention Network for Single-Image Dehazing

no code implementations 16 Jul 2024 Lihan Tong, Yun Liu, Weijia Li, Liyuan Chen, ErKang Chen

Our work not only advances the field of image dehazing but also offers insights into the design of attention mechanisms for broader applications in computer vision.

Image Dehazing Image Restoration

Vision Transformer with Key-select Routing Attention for Single Image Dehazing

no code implementations 28 Jun 2024 Lihan Tong, Weijia Li, Qingxia Yang, Liyuan Chen, Peng Chen

We present Ksformer, which uses Multi-scale Key-select Routing Attention (MKRA) to select key areas through multi-channel, multi-scale windows with a top-k operator, and a Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features. In tests, Ksformer outperforms other dehazing methods.

Image Dehazing Single Image Dehazing
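The top-k selection that MKRA relies on can be illustrated with plain single-head top-k sparse attention, in which each query attends only to its k highest-scoring keys. This sketch deliberately omits MKRA's multi-channel, multi-scale windows; `topk_attention` is a hypothetical name, and ties at the k-th score may keep slightly more than k keys.

```python
import numpy as np

def topk_attention(q, k, v, top_k):
    """Single-head attention where each query attends only to its top_k keys.

    q: (Nq, D), k: (Nk, D), v: (Nk, Dv); requires 1 <= top_k <= Nk.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (Nq, Nk) scaled dot products
    # Per-row threshold: the top_k-th largest score; everything below is masked out.
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    masked -= masked.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(masked)                           # exp(-inf) == 0 for dropped keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `top_k` equal to the number of keys, this reduces to ordinary softmax attention.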

Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments

no code implementations 24 Jun 2024 Shilei Cao, Yan Liu, Juepeng Zheng, Weijia Li, Runmin Dong, Haohuan Fu

We demonstrate the effectiveness of AMROD on four continual test-time adaptation (CTTA) object detection tasks, where it outperforms existing methods; notably, it achieves a 3.2 mAP improvement and a 20% increase in efficiency on the Cityscapes-to-Cityscapes-C CTTA task.

Contrastive Learning Object

Parallel Cross Strip Attention Network for Single Image Dehazing

no code implementations 9 May 2024 Lihan Tong, Yun Liu, Tian Ye, Weijia Li, Liyuan Chen, ErKang Chen

Single image dehazing aims to restore a clear, high-quality image from a single hazy input.

Image Dehazing Single Image Dehazing

3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

1 code implementation CVPR 2024 Weijia Li, Haote Yang, Zhenghao Hu, Juepeng Zheng, Gui-Song Xia, Conghui He

3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and availability for large-scale applications.

3D Reconstruction

SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

1 code implementation CVPR 2024 Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li

This paper aims to achieve fine-grained building attribute segmentation in a cross-view scenario, i.e., using satellite and street-view image pairs.

Attribute Semantic Segmentation

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

2 code implementations 29 Mar 2024 Chao Pang, Xingxing Weng, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

VHM is built on a large-scale remote sensing image-text dataset with rich-content captions (VersaD), and an honest instruction dataset comprising both factual and deceptive questions (HnstD).

Hallucination Image Captioning

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

1 code implementation CVPR 2024 Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu

Specifically, we inject the change priors into the denoising model to improve the use of reference information in unchanged areas and to regulate the reconstruction of semantically relevant content in changed areas.

Denoising Reference-based Super-Resolution

Parrot Captions Teach CLIP to Spot Text

1 code implementation 21 Dec 2023 Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou

Although CLIP is the foundation model for numerous vision-language applications, it suffers from a severe text-spotting bias.

Representation Learning text similarity

VIGC: Visual Instruction Generation and Correction

2 code implementations 24 Aug 2023 Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He

A practical solution to the shortage of high-quality vision-language instruction data is to use available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.

Hallucination Image Captioning

An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions

2 code implementations CVPR 2023 Weijia Li, Saihui Hou, Chunjie Zhang, Chunshui Cao, Xu Liu, Yongzhen Huang, Yao Zhao

For the cloth-changing problem, video-based ReID is rarely studied due to the lack of a suitable cloth-changing benchmark, and gait recognition is often researched under controlled conditions.

16k Gait Recognition

SEPT: Towards Scalable and Efficient Visual Pre-Training

no code implementations 11 Dec 2022 Yiqi Lin, Huabin Zheng, Huaping Zhong, Jinjing Zhu, Weijia Li, Conghui He, Lin Wang

We build a task-specific self-supervised pre-training framework from a data-selection perspective, based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to that of the target task brings substantial performance gains.

Retrieval
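The data-selection hypothesis above can be sketched as a similarity ranking: score each unlabeled sample by how close its feature lies to the target-task data and keep the closest ones. Cosine similarity to the mean target feature is an assumed, deliberately simple criterion; the features are assumed to come from some pretrained encoder, and `select_similar_samples` is hypothetical.

```python
import numpy as np

def select_similar_samples(unlabeled_feats, target_feats, budget):
    """Return indices of the `budget` unlabeled samples closest to the target data.

    unlabeled_feats: (N, D) features of the unlabeled pool.
    target_feats:    (T, D) features of target-task samples.
    """
    u = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    centroid = target_feats.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    sims = u @ centroid  # cosine similarity of each unlabeled sample to the target centroid
    return np.argsort(-sims, kind="stable")[:budget]
```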

OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images

no code implementations CVPR 2023 Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Gui-Song Xia, Dahua Lin

OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images: over 100K well-aligned, pixel-wise annotated images collected from 25K geo-locations in New York City.

Instance Segmentation Segmentation

Exploring the Interactive Guidance for Unified and Effective Image Matting

1 code implementation 17 May 2022 Dinghao Yang, Bin Wang, Weijia Li, Yiqi Lin, Conghui He

Although they avoid the extensive labor of trimap annotation, existing methods still suffer from two limitations: (1) for a single image with multiple objects, extra interaction information is needed to determine the matting target; (2) for transparent objects, accurately regressing the alpha matte from an RGB image is much more difficult than for opaque ones.

Foreground Segmentation Image Matting
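Alpha matting is conventionally framed around the compositing equation I = alpha * F + (1 - alpha) * B, in which each observed pixel I blends a foreground F and a background B according to the matte value alpha in [0, 1]. Matting estimates alpha from I; the forward direction is simple to write down, as in this minimal sketch (the helper `composite` is hypothetical). Transparent objects cover large regions with fractional alpha, which is one intuition for why regressing their matte is harder.

```python
import numpy as np

def composite(alpha, fg, bg):
    """Blend a foreground over a background with a per-pixel alpha matte.

    alpha: (H, W) values in [0, 1]; fg, bg: (H, W, 3) RGB images.
    Implements I = alpha * F + (1 - alpha) * B, the model that matting inverts.
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]  # add a channel axis to broadcast over RGB
    return a * fg + (1.0 - a) * bg
```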

Learning to Extract Building Footprints from Off-Nadir Aerial Images

1 code implementation 28 Apr 2022 Jinwang Wang, Lingxuan Meng, Weijia Li, Wen Yang, Lei Yu, Gui-Song Xia

In this paper, we propose an offset vector learning scheme, which turns the building footprint extraction problem in off-nadir images into an instance-level joint prediction problem of the building roof and its corresponding "roof to footprint" offset vector.
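Under the offset-vector formulation described above, once the roof outline and its offset vector are predicted, the footprint follows by translation. A minimal sketch, assuming the roof is given as a polygon of pixel coordinates and that the footprint is a pure translation of the roof; `roof_to_footprint` is hypothetical.

```python
import numpy as np

def roof_to_footprint(roof_polygon, offset):
    """Shift a roof polygon by its predicted "roof to footprint" offset vector.

    roof_polygon: (V, 2) array of (x, y) pixel coordinates of the roof outline.
    offset: (2,) vector (dx, dy) pointing from roof to footprint.
    Assumes the footprint has the same shape as the roof (pure translation).
    """
    return np.asarray(roof_polygon, dtype=float) + np.asarray(offset, dtype=float)
```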

Influence Selection for Active Learning

1 code implementation ICCV 2021 Zhuoming Liu, Hao Ding, Huaping Zhong, Weijia Li, Jifeng Dai, Conghui He

To obtain the influence of an unlabeled sample in the active learning scenario, we design Untrained Unlabeled sample Influence Calculation (UUIC), which estimates the unlabeled sample's expected gradient and uses it to calculate the sample's influence.

Active Learning Diversity

3D Building Reconstruction From Monocular Remote Sensing Images

no code implementations ICCV 2021 Weijia Li, Lingxuan Meng, Jinwang Wang, Conghui He, Gui-Song Xia, Dahua Lin

3D building reconstruction from monocular remote sensing imagery is an important research problem and an economic solution to large-scale city modeling, compared with reconstruction from LiDAR data and multi-view imagery.

3D Reconstruction Model Optimization

Cross-regional oil palm tree counting and detection via multi-level attention domain adaptation network

1 code implementation 26 Aug 2020 Juepeng Zheng, Haohuan Fu, Weijia Li, Wenzhao Wu, Yi Zhao, Runmin Dong, Le Yu

In this paper, we propose a novel domain-adaptive oil palm tree detection method, the Multi-level Attention Domain Adaptation Network (MADAN), for cross-regional oil palm tree counting and detection.

Domain Adaptation
