no code implementations • 21 Dec 2024 • Xizhe Xue, Guoting Wei, Hao Chen, Haokui Zhang, Feng Lin, Chunhua Shen, Xiao Xiang Zhu
The rapid evolution of Vision Language Models (VLMs) has catalyzed significant advancements in artificial intelligence, expanding research across various disciplines, including Earth Observation (EO).
no code implementations • 30 Oct 2024 • Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, HengTao Shen
A common strategy for Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViTs) involves adapting the model to downstream tasks by learning a low-rank adaptation matrix.
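The low-rank adaptation idea mentioned in this entry can be illustrated with a minimal sketch. It is not the method of the listed paper; it shows only the generic scheme, in which a frozen weight matrix W is augmented with a trainable product B A of rank r. All names here (`lora_forward`, `trainable_params`, the sizes, and the zero initialisation of B) are assumptions chosen for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(w_frozen, lora_b, lora_a, x, scale=1.0):
    """Compute (W + scale * B A) x without building the full update matrix.

    w_frozen: d_out x d_in pre-trained weights, kept fixed.
    lora_a:   r x d_in trainable down-projection (hypothetical name).
    lora_b:   d_out x r trainable up-projection (hypothetical name).
    """
    base = matvec(w_frozen, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Number of parameters in the low-rank factors A and B."""
    return rank * (d_in + d_out)


rng = random.Random(0)
d_out, d_in, rank = 8, 6, 2  # toy sizes, assumed for the example

w_frozen = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
# Assumption: B starts at zero so the adapted model initially matches the frozen one.
lora_b = [[0.0] * rank for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

y_adapted = lora_forward(w_frozen, lora_b, lora_a, x)
y_frozen = matvec(w_frozen, x)
```

With these toy sizes the two factors hold 2 * (6 + 8) = 28 trainable values, compared with 8 * 6 = 48 for a full update of W; the gap widens as the layer grows and the rank stays small.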
1 code implementation • 15 Oct 2024 • Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue
Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in visual language models (VLMs) have pushed this enthusiasm to new heights.
no code implementations • 15 Sep 2024 • Dawei Yan, Pengcheng Li, Yang Li, Hao Chen, QingGuo Chen, Weihua Luo, Wei Dong, Qingsen Yan, Haokui Zhang, Chunhua Shen
In contrast, we propose Text Guided LLaVA (TG-LLaVA) in this paper, which optimizes VLMs by guiding the vision encoder with text, offering a new and orthogonal optimization direction.
1 code implementation • 25 Aug 2024 • Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li
To overcome these challenges, we propose a 3D relational ConvNet named 3D-RCNet, which inherits the strengths of both ConvNet and ViT, resulting in high performance in HSI classification.
no code implementations • 22 Aug 2024 • Guoting Wei, Xia Yuan, Yu Liu, Zhenhao Shang, Kelu Yao, Chao Li, Qingsen Yan, Chunxia Zhao, Haokui Zhang, Rong Xiao
Then, we propose Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention fusion encoder and a multi-level text-guided fusion decoder.
1 code implementation • 22 Sep 2023 • Xizhe Xue, Haokui Zhang, Zongwen Bai, Ying Li
In this paper, we propose a novel Attention-Gated Tuning (AGT) strategy and a triplet-structured transformer model, Tri-Former, to address this issue.
Computational Efficiency, Hyperspectral Image Classification (+2 more)
1 code implementation • NeurIPS 2023 • Yun Yi, Haokui Zhang, Rong Xiao, Nannan Wang, Xiaoyu Wang
It can learn efficient representations from both cell-structured networks and entire networks.
1 code implementation • 1 Jun 2023 • Shengqin Jiang, Yaoyu Fang, Haokui Zhang, Qingshan Liu, Yuankai Qi, Yang Yang, Peng Wang
Rehearsal-based video incremental learning often employs knowledge distillation to mitigate catastrophic forgetting of previously learned data.
1 code implementation • CVPR 2023 • Yun Yi, Haokui Zhang, Wenze Hu, Nannan Wang, Xiaoyu Wang
In this paper, we propose a neural architecture representation model that can be used to estimate these attributes holistically.
2 code implementations • ICCV 2023 • Haokui Zhang, Wenze Hu, Xiaoyu Wang
Currently, one main line of research in designing more efficient vision transformers is reducing the computational cost of self-attention modules by adopting sparse attention or using local attention windows.
1 code implementation • ICCV 2023 • Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang
Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention.
no code implementations • 8 Oct 2022 • Tao Yang, Haokui Zhang, Wenze Hu, Changwen Chen, Xiaoyu Wang
Transformer models have made tremendous progress in various fields in recent years.
3 code implementations • 8 Mar 2022 • Haokui Zhang, Wenze Hu, Xiaoyu Wang
Experimental results show that the proposed ParC-Net achieves better performance than popular lightweight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed.
Ranked #817 on Image Classification on ImageNet
1 code implementation • 21 Oct 2021 • Xizhe Xue, Haokui Zhang, Bei Fang, Zongwen Bai, Ying Li
Compared with search spaces proposed in previous works, the proposed hybrid search space is better aligned with the characteristics of HSI data, namely that HSIs have a relatively low spatial resolution and an extremely high spectral resolution.
no code implementations • 30 Jul 2021 • Haokui Zhang, Buzhou Tang, Wenze Hu, Xiaoyu Wang
Specifically, based on the transformer, we propose a new network structure that compresses features into a low-dimensional space, and an inhomogeneous neighborhood relationship preserving (INRP) loss that aims to maintain high search accuracy.
no code implementations • 28 Jul 2021 • Libo Sun, Haokui Zhang, Wei Yin
Specifically, we exploit pseudo-LiDAR using depth estimation, and propose a feature fusion network where RGB and learned depth information are fused for improved road detection.
1 code implementation • 12 Jan 2021 • Haokui Zhang, Chengrong Gong, Yunpeng Bai, Zongwen Bai, Ying Li
Correspondingly, different models need to be designed for different datasets, which further increases the workload of designing architectures; 2) the mainstream framework is a patch-to-pixel framework.
1 code implementation • 24 Dec 2020 • Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen
For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.
2 code implementations • 7 Dec 2020 • Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen
In contrast to previous approaches, we do not impose restrictions on the source data sets: they do not have to be collected by the same sensors as the target data sets.
no code implementations • 4 Aug 2020 • Yenan Jiang, Ying Li, Shanrong Zou, Haokui Zhang, Yunpeng Bai
However, existing CNN-based models operate at the patch level, in which each pixel is classified separately using an image patch around it.
1 code implementation • 11 Feb 2020 • Haokui Zhang, Yu Liu, Bei Fang, Ying Li, Lingqiao Liu, Ian Reid
Hyperspectral image (HSI) classification has been improved by convolutional neural networks (CNNs) in recent years.
no code implementations • 9 Oct 2019 • Yinglong Wang, Haokui Zhang, Yu Liu, Qinfeng Shi, Bing Zeng
However, existing methods usually generalize poorly: almost all of them perform satisfactorily when removing one specific type of rain streak, but may perform relatively poorly on other types.
no code implementations • 28 Sep 2019 • Yu Liu, Lingqiao Liu, Haokui Zhang, Hamid Rezatofighi, Ian Reid
This paper tackles the problem of video object segmentation.
1 code implementation • CVPR 2020 • Haokui Zhang, Ying Li, Hao Chen, Chunhua Shen
We also present an analysis of the architectures found by NAS.
2 code implementations • ICCV 2019 • Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan
The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion.
Ranked #5 on Monocular Depth Estimation on Mid-Air Dataset
no code implementations • 24 Nov 2018 • Haokui Zhang, Ying Li, Peng Wang, Yu Liu, Chunhua Shen
Unlike RGB videos, RGB-D videos include depth data, which provide key information complementary to the tristimulus visual data and could potentially improve the accuracy of action recognition.
1 code implementation • Remote Sensing 2017 • Ying Li, Haokui Zhang, Qiang Shen
Recent research has shown that using spectral–spatial information can considerably improve the performance of hyperspectral image (HSI) classification.