Search Results for author: Guodong Guo

Found 74 papers, 22 papers with code

GINet: Graph Interaction Network for Scene Parsing

1 code implementation ECCV 2020 Tianyi Wu, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, Guodong Guo

GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph.

Scene Parsing

Fully Transformer Networks for Semantic Image Segmentation

1 code implementation 8 Jun 2021 Sitong Wu, Tianyi Wu, Fangjian Lin, Shengwei Tian, Guodong Guo

Transformers have shown impressive performance in various natural language processing and computer vision tasks, due to the capability of modeling long-range dependencies.

Face Parsing Image Segmentation +2

Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention

2 code implementations 28 Dec 2021 Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo

To reduce the quadratic computation complexity caused by the global self-attention, various methods constrain the range of attention within a local region to improve its efficiency.

Instance Segmentation object-detection +2

Coarse-to-Fine Cascaded Networks with Smooth Predicting for Video Facial Expression Recognition

1 code implementation 24 Mar 2022 Fanglei Xue, Zichang Tan, Yu Zhu, Zhongsong Ma, Guodong Guo

To be specific, the universal features denote the general characteristic of facial emotions within a period and the unique features denote the specific characteristic at this moment.

Facial Expression Recognition Facial Expression Recognition (FER)

Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking

1 code implementation 21 Jan 2021 Nan Jiang, Kuiran Wang, Xiaoke Peng, Xuehui Yu, Qiang Wang, Junliang Xing, Guorong Li, Jian Zhao, Guodong Guo, Zhenjun Han

The release of such a large-scale dataset could be a useful initial step in research on tracking UAVs.

Nested Collaborative Learning for Long-Tailed Visual Recognition

1 code implementation CVPR 2022 Jun Li, Zichang Tan, Jun Wan, Zhen Lei, Guodong Guo

NCL consists of two core components, namely Nested Individual Learning (NIL) and Nested Balanced Online Distillation (NBOD), which focus on the individual supervised learning for each single expert and the knowledge transferring among multiple experts, respectively.

Image Classification Long-tail Learning

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

1 code implementation 13 Oct 2022 Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.

Quantization

SKFlow: Learning Optical Flow with Super Kernels

1 code implementation 29 May 2022 Shangkun Sun, Yuanqi Chen, Yu Zhu, Guodong Guo, Ge Li

In this paper, we propose the Super Kernel Flow Network (SKFlow), a CNN architecture to ameliorate the impacts of occlusions on optical flow estimation.

Optical Flow Estimation

How is Gaze Influenced by Image Transformations? Dataset and Model

1 code implementation 16 May 2019 Zhaohui Che, Ali Borji, Guangtao Zhai, Xiongkuo Min, Guodong Guo, Patrick Le Callet

Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time consuming and expensive.

Data Augmentation Generative Adversarial Network +1

EAN: Event Adaptive Network for Enhanced Action Recognition

1 code implementation 22 Jul 2021 Yuan Tian, Yichao Yan, Guangtao Zhai, Guodong Guo, Zhiyong Gao

In this paper, we propose a unified action recognition framework to investigate the dynamic nature of video content by introducing the following designs.

Action Recognition

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer

1 code implementation CVPR 2023 Sheng Xu, Yanjing Li, Mingbao Lin, Peng Gao, Guodong Guo, Jinhu Lu, Baochang Zhang

At the upper level, we introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.

object-detection Object Detection +1

Recurrent Bilinear Optimization for Binary Neural Networks

2 code implementations 4 Sep 2022 Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo

To address this issue, Recurrent Bilinear Optimization is proposed to improve the learning process of BNNs (RBONNs) by associating the intrinsic bilinear variables in the back propagation process.

object-detection Object Detection

Self-Conditioned Probabilistic Learning of Video Rescaling

1 code implementation ICCV 2021 Yuan Tian, Guo Lu, Xiongkuo Min, Zhaohui Che, Guangtao Zhai, Guodong Guo, Zhiyong Gao

After optimization, the video downscaled by our framework preserves more meaningful information, which benefits both the upscaling step and downstream tasks, e.g., video action recognition.

Video Compression Video Super-Resolution

iffDetector: Inference-aware Feature Filtering for Object Detection

1 code implementation 23 Jun 2020 Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann

In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages.

Object object-detection +1

Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop

1 code implementation 3 Oct 2022 Weixia Zhang, Dingquan Li, Xiongkuo Min, Guangtao Zhai, Guodong Guo, Xiaokang Yang, Kede Ma

No-reference image quality assessment (NR-IQA) aims to quantify how humans perceive visual distortions of digital images without access to their undistorted references.

No-Reference Image Quality Assessment NR-IQA

Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention

1 code implementation 28 Sep 2022 Xiangcheng Liu, Tianyi Wu, Guodong Guo

The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, performing the corresponding pruning configurations for different input instances.

Efficient ViTs Informativeness

Looking Here or There? Gaze Following in 360-Degree Images

1 code implementation ICCV 2021 Yunhao Li, Wei Shen, Zhongpai Gao, Yucheng Zhu, Guangtao Zhai, Guodong Guo

Specifically, the local region is obtained as a 2D cone-shaped field along the 2D projection of the sight line starting at the human subject's head position, and the distant region is obtained by searching along the sight line in 3D sphere space.

Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax

1 code implementation 11 Dec 2018 Peng Lu, Gao Huang, Hangyu Lin, Wenming Yang, Guodong Guo, Yanwei Fu

This paper proposes a novel approach for Sketch-Based Image Retrieval (SBIR), for which the key is to bridge the gap between sketches and photos in terms of the data representation.

Retrieval Sketch-Based Image Retrieval

Defending Black-box Skeleton-based Human Activity Classifiers

2 code implementations 9 Mar 2022 He Wang, Yunfeng Diao, Zichang Tan, Guodong Guo

Our method is featured by full Bayesian treatments of the clean data, the adversaries and the classifier, leading to (1) a new Bayesian Energy-based formulation of robust discriminative classifiers, (2) a new adversary sampling scheme based on natural motion manifolds, and (3) a new post-train Bayesian strategy for black-box defense.

Human Activity Recognition Time Series Analysis

Anti-Retroactive Interference for Lifelong Learning

1 code implementation 27 Aug 2022 Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo

Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to the knowledge of different difficulties.

Meta-Learning

Attributes in Multiple Facial Images

no code implementations 23 May 2018 Xudong Liu, Guodong Guo

To address this question, we deploy deep training for facial attributes prediction, and we explore the inconsistency issue among the attributes computed from each single image.

Attribute Face Recognition

A Study on Cross-Population Age Estimation

no code implementations CVPR 2014 Guodong Guo, Chao Zhang

Further, we study the amount of data needed in the target population to learn a cross-population age estimator.

Age Estimation Human Aging +1

Adversarial Attacks against Deep Saliency Models

no code implementations 2 Apr 2019 Zhaohui Che, Ali Borji, Guangtao Zhai, Suiyi Ling, Guodong Guo, Patrick Le Callet

The proposed attack only requires a part of the model information, and is able to generate a sparser and more insidious adversarial perturbation, compared to traditional image-space attacks.

Adversarial Attack object-detection +1

Supervised Online Hashing via Similarity Distribution Learning

no code implementations 31 May 2019 Mingbao Lin, Rongrong Ji, Shen Chen, Feng Zheng, Xiaoshuai Sun, Baochang Zhang, Liujuan Cao, Guodong Guo, Feiyue Huang

In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed as Similarity Distribution based Online Hashing (SDOH), is proposed, to keep the intrinsic semantic relationship in the produced Hamming space.

Retrieval

A database for face presentation attack using wax figure faces

no code implementations 6 Jun 2019 Shan Jia, Chuanbo Hu, Guodong Guo, Zhengquan Xu

Compared to 2D face presentation attacks (e.g., printed photos and video replays), 3D type attacks are more challenging to face recognition systems (FRS) by presenting 3D characteristics or materials similar to real faces.

Face Presentation Attack Detection Face Recognition +1

UGAN: Untraceable GAN for Multi-Domain Face Translation

no code implementations 26 Jul 2019 Defa Zhu, Si Liu, Wentao Jiang, Chen Gao, Tianyi Wu, Qaingchang Wang, Guodong Guo

To address this issue, we propose a method called Untraceable GAN, which has a novel source classifier to differentiate which domain an image is translated from, and determines whether the translated image still retains the characteristics of the source domain.

Image-to-Image Translation Translation

Consensus Feature Network for Scene Parsing

no code implementations 29 Jul 2019 Tianyi Wu, Sheng Tang, Rui Zhang, Guodong Guo, Yongdong Zhang

However, classification networks are dominated by the discriminative portion, so directly applying classification networks to scene parsing will result in inconsistent parsing predictions within one instance and among instances of the same category.

General Classification Scene Parsing

ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition

no code implementations 29 Jul 2019 Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, Gholamreza Anbarjafari, Isabelle Guyon, Guodong Guo, Stan Z. Li

The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world.

Gesture Recognition

Bayesian Optimized 1-Bit CNNs

no code implementations ICCV 2019 Jiaxin Gu, Junhe Zhao, Xiao-Long Jiang, Baochang Zhang, Jianzhuang Liu, Guodong Guo, Rongrong Ji

Deep convolutional neural networks (DCNNs) have dominated the recent developments in computer vision through making various record-breaking models.

RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

no code implementations 21 Aug 2019 Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo

Binarized convolutional neural networks (BCNNs) are widely used to improve memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI chips based applications.

Binarization Object Tracking

WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild

no code implementations 25 Sep 2019 Shifeng Zhang, Yiliang Xie, Jun Wan, Hansheng Xia, Stan Z. Li, Guodong Guo

To narrow this gap and facilitate future pedestrian detection research, we introduce a large and diverse dataset named WiderPerson for dense pedestrian detection in the wild.

Ranked #3 on Object Detection on WiderPerson (mMR metric)

Object Detection Pedestrian Detection

Aggregation Signature for Small Object Tracking

no code implementations 24 Oct 2019 Chunlei Liu, Wenrui Ding, Jinyu Yang, Vittorio Murino, Baochang Zhang, Jungong Han, Guodong Guo

In this paper, we propose a novel aggregation signature suitable for small object tracking, especially aiming for the challenge of sudden and large drift.

Object Object Tracking

Face Detection on Surveillance Images

no code implementations 22 Oct 2019 Mohammad Iqbal Nouyed, Guodong Guo

In this paper, we perform a comparative performance analysis of some well-known face detection methods, including the few used in that competition, and compare them to our proposed body-pose-based face detection method.

Benchmarking Face Detection +1

GBCNs: Genetic Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs

no code implementations 25 Nov 2019 Chunlei Liu, Wenrui Ding, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Guodong Guo

The BGA method is proposed to modify the binary process of GBCNs to alleviate the local minima problem, which can significantly improve the performance of 1-bit DCNNs.

Face Recognition Object Recognition +1

Robust Invisible Hyperlinks in Physical Photographs Based on 3D Rendering Attacks

no code implementations 3 Dec 2019 Jun Jia, Zhongpai Gao, Kang Chen, Menghan Hu, Guangtao Zhai, Guodong Guo, Xiaokang Yang

To train a robust decoder against the physical distortion from the real world, a distortion network based on 3D rendering is inserted between the encoder and the decoder to simulate the camera imaging process.

Static and Dynamic Fusion for Multi-modal Cross-ethnicity Face Anti-spoofing

no code implementations 5 Dec 2019 Ajian Liu, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li

Regardless of the usage of deep learning and handcrafted methods, the dynamic information from videos and the effect of cross-ethnicity are rarely considered in face anti-spoofing.

Face Anti-Spoofing

CASIA-SURF CeFA: A Benchmark for Multi-modal Cross-ethnicity Face Anti-spoofing

no code implementations 11 Mar 2020 Ajian Li, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li

Ethnic bias has proven to negatively affect the performance of face recognition systems, and it remains an open research problem in face anti-spoofing.

Face Anti-Spoofing Face Recognition

Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review

no code implementations 23 Apr 2020 Ajian Liu, Xuan Li, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Meysam Madadi, Yi Jin, Zhuoyuan Wu, Xiaogang Yu, Zichang Tan, Qi Yuan, Ruikun Yang, Benjia Zhou, Guodong Guo, Stan Z. Li

Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing.

Face Anti-Spoofing Face Recognition

3D Face Anti-spoofing with Factorized Bilinear Coding

no code implementations 12 May 2020 Shan Jia, Xin Li, Chuanbo Hu, Guodong Guo, Zhengquan Xu

We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years.

Face Anti-Spoofing Face Presentation Attack Detection +1

Cogradient Descent for Bilinear Optimization

no code implementations CVPR 2020 Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji

Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure.

Image Reconstruction Network Pruning

Self-supervised Video Object Segmentation

no code implementations 22 Jun 2020 Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a.

Object One-shot visual object segmentation +4

Binarized Neural Architecture Search for Efficient Object Recognition

no code implementations 8 Sep 2020 Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo

In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing.

Edge-computing Face Recognition +3

Joint Face Image Restoration and Frontalization for Recognition

no code implementations 12 May 2021 Xiaoguang Tu, Jian Zhao, Qiankun Liu, Wenjie Ai, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng

First, MDFR is a well-designed encoder-decoder architecture which extracts feature representation from an input face image with arbitrary low-quality factors and restores it to a high-quality counterpart.

Face Recognition Image Restoration

Image-to-Video Generation via 3D Facial Dynamics

no code implementations 31 May 2021 Xiaoguang Tu, Yingtian Zou, Jian Zhao, Wenjie Ai, Jian Dong, Yuan Yao, Zhikang Wang, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng

Video generation from a single face image is an interesting problem and usually tackled by utilizing Generative Adversarial Networks (GANs) to integrate information from the input face image and a sequence of sparse facial landmarks.

Image to Video Generation Video Prediction

SAR-Net: Shape Alignment and Recovery Network for Category-level 6D Object Pose and Size Estimation

no code implementations CVPR 2022 Haitao Lin, Zichang Liu, Chilam Cheang, Yanwei Fu, Guodong Guo, Xiangyang Xue

The concatenation of the observed point cloud and symmetric one reconstructs a coarse object shape, thus facilitating object center (3D translation) and 3D size estimation.

Object Optical Character Recognition (OCR)

The 2nd Anti-UAV Workshop & Challenge: Methods and Results

no code implementations 23 Aug 2021 Jian Zhao, Gang Wang, Jianan Li, Lei Jin, Nana Fan, Min Wang, Xiaojuan Wang, Ting Yong, Yafeng Deng, Yandong Guo, Shiming Ge, Guodong Guo

The 2nd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking.

Object Tracking

Sparse to Dense Motion Transfer for Face Image Animation

no code implementations 1 Sep 2021 Ruiqi Zhao, Tianyi Wu, Guodong Guo

Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks.

Image Animation Motion Estimation +1

IDARTS: Interactive Differentiable Architecture Search

no code implementations ICCV 2021 Song Xue, Runqi Wang, Baochang Zhang, Tian Wang, Guodong Guo, David Doermann

Differentiable Architecture Search (DARTS) improves the efficiency of architecture search by learning the architecture and network parameters end-to-end.

LAE: Long-tailed Age Estimation

no code implementations 25 Oct 2021 Zenghao Bao, Zichang Tan, Yu Zhu, Jun Wan, Xibo Ma, Zhen Lei, Guodong Guo

To improve the performance of facial age estimation, we first formulate a simple standard baseline and build a much stronger one by collecting tricks in pre-training, data augmentation, model architecture, and so on.

Age Estimation Data Augmentation +1

Learning to Recognize the Unseen Visual Predicates

no code implementations 25 Sep 2019 Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo

Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.

Question Answering Visual Question Answering +1

POEM: 1-bit Point-wise Operations based on Expectation-Maximization for Efficient Point Cloud Processing

no code implementations 26 Nov 2021 Sheng Xu, Yanjing Li, Junhe Zhao, Baochang Zhang, Guodong Guo

Real-time point cloud processing is fundamental for lots of computer vision tasks, while still challenged by the computational problem on resource-limited edge devices.

Associative Adversarial Learning Based on Selective Attack

no code implementations 28 Dec 2021 Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo

We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8.32% compared with the baseline.

Adversarial Robustness Few-Shot Learning +2

Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention

no code implementations 8 Mar 2022 Kai Liu, Tianyi Wu, Cong Liu, Guodong Guo

To reduce the quadratic computation complexity caused by each query attending to all keys/values, various methods have constrained the range of attention within local regions, where each query only attends to keys/values within a hand-crafted window.

Image Classification Instance Segmentation +3

Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows

no code implementations 20 Mar 2022 Danyang Tu, Xiongkuo Min, Huiyu Duan, Guodong Guo, Guangtao Zhai, Wei Shen

Iwin Transformer is a hierarchical Transformer which progressively performs token representation learning and token agglomeration within irregular windows.

Human-Object Interaction Detection Object +4

Feature Selective Transformer for Semantic Image Segmentation

no code implementations 26 Mar 2022 Fangjian Lin, Tianyi Wu, Sitong Wu, Shengwei Tian, Guodong Guo

In this work, we focus on fusing multi-scale features from Transformer-based backbones for semantic segmentation, and propose a Feature Selective Transformer (FeSeFormer), which aggregates features from all scales (or levels) for each query feature.

feature selection Image Segmentation +2

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

no code implementations CVPR 2022 Ge Kan, Jinhu Lü, Tian Wang, Baochang Zhang, Aichun Zhu, Lei Huang, Guodong Guo, Hichem Snoussi

In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs.

Image Generation Image Reconstruction +1

CATrans: Context and Affinity Transformer for Few-Shot Segmentation

no code implementations 27 Apr 2022 Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo

In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer (CATrans) in a hierarchical architecture.

Relation Transfer Learning

Region-level Contrastive and Consistency Learning for Semi-Supervised Semantic Segmentation

no code implementations 28 Apr 2022 Jianrong Zhang, Tianyi Wu, Chuanghao Ding, Hongwei Zhao, Guodong Guo

Specifically, we first propose a Region Mask Contrastive (RMC) loss and a Region Feature Contrastive (RFC) loss to accomplish region-level contrastive property.

Segmentation Semi-Supervised Semantic Segmentation

DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-bit CNNs

no code implementations 27 Jun 2023 Yanjing Li, Sheng Xu, Xianbin Cao, Li'an Zhuo, Baochang Zhang, Tian Wang, Guodong Guo

One natural approach is to use 1-bit CNNs to reduce the computation and memory cost of NAS by taking advantage of the strengths of each in a unified framework, while searching the 1-bit CNNs is more challenging due to the more complicated processes involved.

Neural Architecture Search object-detection +2

NCL++: Nested Collaborative Learning for Long-Tailed Visual Recognition

no code implementations 29 Jun 2023 Zichang Tan, Jun Li, Jinhao Du, Jun Wan, Zhen Lei, Guodong Guo

To achieve the collaborative learning in long-tailed learning, the balanced online distillation is proposed to force the consistent predictions among different experts and augmented copies, which reduces the learning uncertainties.

On visual BMI analysis from facial images

no code implementations Image and Vision Computing 2019 Min Jiang, Yuanyuan Shang, Guodong Guo

Various facial representations, including geometry-based and deep-learning-based representations, are comprehensively evaluated and analyzed from three perspectives: the overall performance on visual BMI prediction, the redundancy in facial representations, and the sensitivity to head pose changes.

MORPH

Fusion-Mamba for Cross-modality Object Detection

no code implementations 14 Apr 2024 Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

In this paper, we investigate cross-modality fusion by associating cross-modal features in a hidden state space based on an improved Mamba with a gating mechanism.

Object object-detection +1
