Search Results for author: Haokui Zhang

Found 28 papers, 17 papers with code

REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation

no code implementations • 21 Dec 2024 • Xizhe Xue, Guoting Wei, Hao Chen, Haokui Zhang, Feng Lin, Chunhua Shen, Xiao Xiang Zhu

The rapid evolution of Vision Language Models (VLMs) has catalyzed significant advancements in artificial intelligence, expanding research across various disciplines, including Earth Observation (EO).

Earth Observation, regression

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

no code implementations • 30 Oct 2024 • Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, HengTao Shen

A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix.

parameter-efficient fine-tuning

Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

1 code implementation • 15 Oct 2024 • Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue

Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in visual language models (VLMs) have pushed this enthusiasm to new heights.

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

no code implementations • 15 Sep 2024 • Dawei Yan, Pengcheng Li, Yang Li, Hao Chen, QingGuo Chen, Weihua Luo, Wei Dong, Qingsen Yan, Haokui Zhang, Chunhua Shen

In contrast, we propose Text Guided LLaVA (TG-LLaVA) in this paper, which optimizes VLMs by guiding the vision encoder with text, offering a new and orthogonal optimization direction.

Language Modelling

3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification

1 code implementation • 25 Aug 2024 • Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li

To overcome these challenges, we propose a 3D relational ConvNet named 3D-RCNet, which inherits both strengths of ConvNet and ViT, resulting in high performance in HSI classification.

Computational Efficiency, Hyperspectral Image Classification

OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

no code implementations • 22 Aug 2024 • Guoting Wei, Xia Yuan, Yu Liu, Zhenhao Shang, Kelu Yao, Chao Li, Qingsen Yan, Chunxia Zhao, Haokui Zhang, Rong Xiao

Then, we propose Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention fusion encoder and a multi-level text-guided Fusion Decoder.

Decoder, object-detection, +1

Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification

1 code implementation • 22 Sep 2023 • Xizhe Xue, Haokui Zhang, Zongwen Bai, Ying Li

In this paper, we propose a novel Attention-Gated Tuning (AGT) strategy and a triplet-structured transformer model, Tri-Former, to address this issue.

Computational Efficiency, Hyperspectral Image Classification, +2

Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning

1 code implementation • 1 Jun 2023 • Shengqin Jiang, Yaoyu Fang, Haokui Zhang, Qingshan Liu, Yuankai Qi, Yang Yang, Peng Wang

Rehearsal-based video incremental learning often employs knowledge distillation to mitigate catastrophic forgetting of previously learned data.

Incremental Learning, Knowledge Distillation, +1

NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction

1 code implementation • CVPR 2023 • Yun Yi, Haokui Zhang, Wenze Hu, Nannan Wang, Xiaoyu Wang

In this paper, we propose a neural architecture representation model that can be used to estimate these attributes holistically.

Representation Learning

Fcaformer: Forward Cross Attention in Hybrid Vision Transformer

2 code implementations • ICCV 2023 • Haokui Zhang, Wenze Hu, Xiaoyu Wang

Currently, one main research line in designing a more efficient vision transformer is reducing the computational cost of self attention modules by adopting sparse attention or using local attention windows.

Image Classification, Knowledge Distillation

ParCNetV2: Oversized Kernel with Enhanced Attention

1 code implementation • ICCV 2023 • Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang

Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention.

ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

3 code implementations • 8 Mar 2022 • Haokui Zhang, Wenze Hu, Xiaoyu Wang

Experiment results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision transformer based models in common vision tasks and datasets, while having fewer parameters and faster inference speed.

Image Classification, object-detection, +3

Grafting Transformer on Automatically Designed Convolutional Neural Network for Hyperspectral Image Classification

1 code implementation • 21 Oct 2021 • Xizhe Xue, Haokui Zhang, Bei Fang, Zongwen Bai, Ying Li

Compared with search spaces proposed in previous works, the proposed hybrid search space is more aligned with the characteristic of HSI data, that is, HSIs have a relatively low spatial resolution and an extremely high spectral resolution.

Classification, Hyperspectral Image Classification, +1

Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search

no code implementations • 30 Jul 2021 • Haokui Zhang, Buzhou Tang, Wenze Hu, Xiaoyu Wang

Specifically, based on transformer, we propose a new network structure to compress the feature into a low dimensional space, and an inhomogeneous neighborhood relationship preserving (INRP) loss that aims to maintain high search accuracy.

Feature Compression, Information Retrieval, +2

Pseudo-LiDAR Based Road Detection

no code implementations • 28 Jul 2021 • Libo Sun, Haokui Zhang, Wei Yin

Specifically, we exploit pseudo-LiDAR using depth estimation, and propose a feature fusion network where RGB and learned depth information are fused for improved road detection.

Depth Estimation, Self-Driving Cars

3D-ANAS: 3D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification

1 code implementation • 12 Jan 2021 • Haokui Zhang, Chengrong Gong, Yunpeng Bai, Zongwen Bai, Ying Li

Correspondingly, different models need to be designed for different datasets, which further increases the workload of designing architectures; 2) the mainstream framework is a patch-to-pixel framework.

Classification, General Classification, +3

Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration

1 code implementation • 24 Dec 2020 • Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen

For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.

Image Denoising, Image Restoration, +2

Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning

2 code implementations • 7 Dec 2020 • Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen

In contrast to previous approaches, we impose no restrictions on the source data sets: they do not have to be collected by the same sensors as the target data sets.

Classification, General Classification, +1

Hyperspectral Image Classification with Spatial Consistence Using Fully Convolutional Spatial Propagation Network

no code implementations • 4 Aug 2020 • Yenan Jiang, Ying Li, Shanrong Zou, Haokui Zhang, Yunpeng Bai

However, the existing CNN-based models operate at the patch level, in which each pixel is classified separately using a patch of the image around it.

Classification, General Classification, +1

Gradient Information Guided Deraining with A Novel Network and Adversarial Training

no code implementations • 9 Oct 2019 • Yinglong Wang, Haokui Zhang, Yu Liu, Qinfeng Shi, Bing Zeng

However, the existing methods usually generalize poorly: almost all of them perform satisfactorily on a specific type of rain streak but may perform relatively poorly on other types.

Rain Removal

RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks

no code implementations • 24 Nov 2018 • Haokui Zhang, Ying Li, Peng Wang, Yu Liu, Chunhua Shen

Different from RGB videos, depth data in RGB-D videos provide key complementary information for tristimulus visual data which potentially could achieve accuracy improvement for action recognition.

Action Recognition, Temporal Action Localization
