Search Results for author: Guoqing Wang

Found 27 papers, 12 papers with code

New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration

no code implementations • 27 Feb 2025 • Xuzheng Yang, Junzhuo Liu, Peng Wang, Guoqing Wang, Yang Yang, Heng Tao Shen

To address fine-grained compositional REC, we propose novel methods based on a Specialist-MLLM collaboration framework, leveraging their complementary strengths: Specialist Models handle simpler tasks efficiently, while MLLMs are better suited for complex reasoning.

Image Comprehension, Referring Expression +1

Towards Real-time Video Compressive Sensing on Mobile Devices

1 code implementation • 14 Aug 2024 • Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan

Fast-evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications.

Compressive Sensing, Knowledge Distillation +1

Diffusion Models as Optimizers for Efficient Planning in Offline RL

1 code implementation • 23 Jul 2024 • Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, HengTao Shen

To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks.

D4RL, Decision Making +3

VEON: Vocabulary-Enhanced Occupancy Prediction

no code implementations • 17 Jul 2024 • Jilai Zheng, Pin Tang, Zhongdao Wang, Guoqing Wang, Xiangxuan Ren, Bailan Feng, Chao Ma

Hence, instead of building our model from scratch, we try to blend 2D foundation models, specifically a depth model MiDaS and a semantic model CLIP, to lift the semantics to 3D space, thus fulfilling 3D occupancy.

Prediction

DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing

1 code implementation • 11 Jul 2024 • Minghang Zhou, Tianyu Li, Chaofan Qiao, Dongyu Xie, Guoqing Wang, Ningjuan Ruan, Lin Mei, Yang Yang

Inspired by the efficiency and lower complexity of Mamba in long sequence tasks, we propose Disparity-guided Multispectral Mamba (DMM), a multispectral oriented object detection framework comprised of a Disparity-guided Cross-modal Fusion Mamba (DCFM) module, a Multi-scale Target-aware Attention (MTA) module, and a Target-Prior Aware (TPA) auxiliary task.

Computational Efficiency, Mamba +3

Q-SNNs: Quantized Spiking Neural Networks

no code implementations • 19 Jun 2024 • Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence.

Quantization

OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

no code implementations • 23 Apr 2024 • Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem.

3D Semantic Occupancy Prediction, Autonomous Driving +1

Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network

1 code implementation • 22 Apr 2024 • Qiwen Deng, Yangcen Liu, Wen Li, Guoqing Wang

Particularly, an SRM filter is utilized to extract high-frequency details, which are combined with spatial features as input to the BSD.

Optical Flow Estimation

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

no code implementations • CVPR 2024 • Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied.

Autonomous Driving

Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting

1 code implementation • 10 Apr 2024 • Hao Lu, Jiaqi Tang, Xinli Xu, Xu Cao, Yunpeng Zhang, Guoqing Wang, Dalong Du, Hao Chen, Yingcong Chen

Finally, for MC3D-Det joint training, an elaborate dataset merge strategy is designed to resolve inconsistent camera numbers and camera parameters.

3D Object Detection, Autonomous Driving +1

Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

no code implementations • 15 Mar 2024 • Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

Aligning these distributions between corresponding regions from different tasks imparts higher flexibility and capacity to capture intra-region structures, accommodating a broader range of tasks.

Depth Estimation, Semantic Segmentation +1

Open-Vocabulary Calibration for Fine-tuned CLIP

1 code implementation • 7 Feb 2024 • Shuoyuan Wang, Jindong Wang, Guoqing Wang, Bob Zhang, Kaiyang Zhou, Hongxin Wei

Vision-language models (VLMs) have emerged as formidable tools, showing their strong capability in handling various open-vocabulary tasks in image recognition, text-driven visual content generation, and visual chatbots, to name a few.

parameter-efficient fine-tuning

JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement

no code implementations • 20 Dec 2023 • Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Malu Zhang, Chongyi Li, Heng Tao Shen

By treating Retinex- and semantic-based priors as the condition, JoReS-Diff presents a unique perspective for establishing a diffusion model for LLIE and similar image enhancement tasks.

Low-Light Image Enhancement, Semantic Segmentation

Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation

no code implementations • 24 Oct 2023 • Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang

Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction.

Autonomous Driving, Scene Understanding

Faster Video Moment Retrieval with Point-Level Supervision

no code implementations • 23 May 2023 • Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen

Existing VMR methods suffer from two defects: (1) large numbers of expensive temporal annotations are required to obtain satisfactory performance; (2) complicated cross-modal interaction modules are deployed, which leads to high computational cost and low efficiency in the retrieval process.

Moment Retrieval, Natural Language Queries +1

Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement

1 code implementation • CVPR 2023 • Yuhui Wu, Chen Pan, Guoqing Wang, Yang Yang, Jiwei Wei, Chongyi Li, Heng Tao Shen

To address this issue, we propose a novel semantic-aware knowledge-guided framework (SKF) that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.

Low-Light Image Enhancement, Semantic Segmentation

ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding

1 code implementation • 23 Mar 2023 • Ziyang Lu, Yunqiang Pei, Guoqing Wang, Yang Yang, Zheng Wang, Heng Tao Shen

Despite their effectiveness, existing methods suffer from low recognition accuracy when multiple adjacent objects have similar appearances. To address this issue, this work introduces human-robot interaction as a cue to facilitate the development of 3D visual grounding.

3D visual grounding

Thunder: Thumbnail based Fast Lightweight Image Denoising Network

no code implementations • 24 May 2022 • Yifeng Zhou, Xing Xu, Shuaicheng Liu, Guoqing Wang, Huimin Lu, Heng Tao Shen

To achieve promising results on removing noise from real-world images, most existing denoising networks are formulated with complex network structures, making them impractical for deployment.

Image Denoising, SSIM

Learning content and context with language bias for Visual Question Answering

1 code implementation • 21 Dec 2020 • Chao Yang, Su Feng, Dongsheng Li, HuaWei Shen, Guoqing Wang, Bin Jiang

Many works concentrate on reducing language bias, which makes models answer questions while ignoring visual content and language context.

Question Answering, Visual Question Answering

ERL-Net: Entangled Representation Learning for Single Image De-Raining

no code implementations • ICCV 2019 • Guoqing Wang, Changming Sun, Arcot Sowmya

In this paper, we hypothesize that there exists an inherent mapping from the low-quality embedding to a latent optimal one, with which the generator (decoder) can produce much better results.

Decoder, Image Restoration +3
