Search Results for author: Zhangxuan Gu

Found 13 papers, 10 papers with code

Conditional Prototype Rectification Prompt Learning

1 code implementation • 15 Apr 2024 • Haoxing Chen, Yaohui Li, Zizheng Huang, Yan Hong, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning vision-language models (VLMs) with limited data, introducing only a few parameters to harness task-specific insights from VLMs.

Transfer Learning

Segment Anything Model Meets Image Harmonization

no code implementations • 20 Dec 2023 • Haoxing Chen, Yaohui Li, Zhangxuan Gu, Zhuoer Xu, Jun Lan, Huaxiong Li

Image harmonization is a crucial technique in image composition that aims to seamlessly match the background by adjusting the foreground of composite images.

Image Harmonization, Semantic Segmentation

Boosting Audio-visual Zero-shot Learning with Large Language Models

1 code implementation • 21 Nov 2023 • Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

Audio-visual zero-shot learning aims to recognize unseen categories based on paired audio-visual sequences.

Ranked #1 on GZSL Video Classification on ActivityNet-GZSL (cls)

GZSL Video Classification

Backpropagation Path Search On Adversarial Transferability

no code implementations • ICCV 2023 • Zhuoer Xu, Zhangxuan Gu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang

Transfer-based attackers craft adversarial examples against surrogate models and transfer them to victim models deployed in the black-box situation.

Bayesian Optimization

DiffUTE: Universal Text Editing Diffusion Model

1 code implementation • NeurIPS 2023 • Haoxing Chen, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Xing Zheng, Yaohui Li, Changhua Meng, Huijia Zhu, Weiqiang Wang

Specifically, we build our model on a diffusion model and carefully modify the network structure to enable the model to draw multilingual characters with the help of glyph and position information.

Self-Supervised Learning

Mobile User Interface Element Detection Via Adaptively Prompt Tuning

1 code implementation • CVPR 2023 • Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang

Recent object detection approaches rely on pretrained vision-language models for image-text alignment.

Object Detection +1

DiffusionInst: Diffusion Model for Instance Segmentation

2 code implementations • 6 Dec 2022 • Zhangxuan Gu, Haoxing Chen, Zhuoer Xu, Jun Lan, Changhua Meng, Weiqiang Wang

Extensive experimental results on COCO and LVIS show that DiffusionInst achieves competitive performance compared to existing instance segmentation models with various backbones, such as ResNet and Swin Transformers.

Ranked #8 on Instance Segmentation on LVIS v1.0 val

Instance Segmentation, Segmentation

Hierarchical Dynamic Image Harmonization

1 code implementation • 16 Nov 2022 • Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li

The MGD effectively applies distinct convolution to the foreground and background, learning the representations of foreground and background regions as well as their correlations to the global harmonization, facilitating local visual consistency for the images much more efficiently.

Ranked #2 on Image Harmonization on HAdobe5k (1024×1024)

Image Harmonization

XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

1 code implementation • CVPR 2022 • Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang

Recently, various multimodal networks for Visually-Rich Document Understanding (VRDU) have been proposed, showing that transformers improve when visual and layout information is integrated with the text embeddings.

Document Understanding, Optical Character Recognition (OCR) +1

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation

no code implementations • 8 Feb 2022 • Zhengkai Jiang, Zhangxuan Gu, Jinlong Peng, Hang Zhou, Liang Liu, Yabiao Wang, Ying Tai, Chengjie Wang, Liqing Zhang

In contrast, we present a simple and efficient single-stage VIS framework based on the instance segmentation method CondInst by adding an extra tracking head.

Ranked #36 on Video Instance Segmentation on YouTube-VIS validation

Contrastive Learning, Instance Segmentation +3

From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation

1 code implementation • 25 Sep 2020 • Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang

Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories.

Image Classification, Segmentation +3

Context-aware Feature Generation for Zero-shot Semantic Segmentation

2 code implementations • 16 Aug 2020 • Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang

In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet.

Segmentation, Semantic Segmentation +3

Hard Pixel Mining for Depth Privileged Semantic Segmentation

1 code implementation • 27 Jun 2019 • Zhangxuan Gu, Li Niu, Haohua Zhao, Liqing Zhang

Specifically, we propose a novel Loss Weight Module, which outputs a loss weight map by employing two depth-related measurements of hard pixels: Depth Prediction Error and Depth-aware Segmentation Error.

Depth Estimation, Depth Prediction +2
