Search Results for author: Zhenye Gan

Found 21 papers, 15 papers with code

UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer

no code implementations · 12 Mar 2025 · Haoxuan Wang, Jinlong Peng, Qingdong He, Hao Yang, Ying Jin, Jiafu Wu, Xiaobin Hu, Yanjie Pan, Zhenye Gan, Mingmin Chi, Bo Peng, Yabiao Wang

With the rapid development of diffusion models in image generation, the demand for more powerful and flexible controllable frameworks is increasing.

Image Generation

SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation

1 code implementation · 30 Dec 2024 · Chengjie Wang, Xi Jiang, Bin-Bin Gao, Zhenye Gan, Yong Liu, Feng Zheng, Lizhuang Ma

Furthermore, the performance of SoftPatch and SoftPatch+ is comparable to that of noise-free methods in the conventional unsupervised anomaly detection (AD) setting.

Anomaly Classification, Classification (+1 more)

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

1 code implementation · CVPR 2025 · Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie

CNNs, with their local receptive fields, struggle to capture long-range dependencies, while Transformers, despite their global modeling capabilities, are limited by quadratic computational complexity in high-resolution scenarios.

Mamba, State Space Models

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

1 code implementation · 21 Oct 2024 · Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai

The success of Large Language Models (LLMs) has led researchers to explore Multimodal Large Language Models (MLLMs) for unified visual and linguistic understanding.

A Survey on Benchmarks of Multimodal Large Language Models

1 code implementation · 16 Aug 2024 · Jian Li, Weiheng Lu, Hao Fei, Meng Luo, Ming Dai, Min Xia, Yizhang Jin, Zhenye Gan, Ding Qi, Chaoyou Fu, Ying Tai, Wankou Yang, Yabiao Wang, Chengjie Wang

Multimodal Large Language Models (MLLMs) are gaining increasing popularity in both academia and industry due to their remarkable performance in various applications such as visual question answering, visual perception, understanding, and reasoning.

Question Answering, Survey (+1 more)

LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description

1 code implementation · 9 Aug 2024 · Yizhang Jin, Jian Li, Jiangning Zhang, Jianlong Hu, Zhenye Gan, Xin Tan, Yong Liu, Yabiao Wang, Chengjie Wang, Lizhuang Ma

In this paper, we propose a Large Language-and-Vision Assistant for Visual Spatial Description, named LLaVA-VSD, which is designed for the classification, description, and open-ended description of visual spatial relationships.

Diversity, Instruction Following (+4 more)

PSPU: Enhanced Positive and Unlabeled Learning by Leveraging Pseudo Supervision

no code implementations · 9 Jul 2024 · Chengjie Wang, Chengming Xu, Zhenye Gan, Jianlong Hu, Wenbing Zhu, Lizhuang Ma

Positive and Unlabeled (PU) learning, a binary classification setting in which a model is trained with only positive and unlabeled data, generally suffers from overfitted risk estimation due to inconsistent data distributions.

Anomaly Detection, Binary Classification

A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection

1 code implementation · 5 Jun 2024 · Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

This paper proposes ADer, a comprehensive visual anomaly detection benchmark built as a modular framework that is highly extensible to new methods.

Benchmarking, Lesion Detection (+1 more)

Efficient Multimodal Large Language Models: A Survey

1 code implementation · 17 May 2024 · Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.

Edge-computing, Question Answering (+2 more)

DMAD: Dual Memory Bank for Real-World Anomaly Detection

1 code implementation · 19 Mar 2024 · Jianlong Hu, Xu Chen, Zhenye Gan, Jinlong Peng, Shengchuan Zhang, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Liujuan Cao, Rongrong Ji

To address the challenge of real-world anomaly detection, we propose a new framework named Dual Memory bank enhanced representation learning for Anomaly Detection (DMAD).

Anomaly Detection, Representation Learning

TransAVS: End-to-End Audio-Visual Segmentation with Transformer

no code implementations · 12 May 2023 · Yuhang Ling, Yuxi Li, Zhenye Gan, Jiangning Zhang, Mingmin Chi, Yabiao Wang

Generally, audio-visual segmentation (AVS) faces two key challenges: (1) audio signals are inherently information-dense, because sounds produced by multiple objects are entangled within the same audio stream; (2) objects of the same category tend to produce similar audio signals, which makes them difficult to distinguish and leads to unclear segmentation results.

Scene Understanding, Segmentation (+1 more)

Calibrated Teacher for Sparsely Annotated Object Detection

1 code implementation · 14 Mar 2023 · Haohan Wang, Liang Liu, Boshen Zhang, Jiangning Zhang, Wuhao Zhang, Zhenye Gan, Yabiao Wang, Chengjie Wang, Haoqian Wang

Recent work on sparsely annotated object detection mitigates the problem of missing annotations by generating pseudo labels for them.

Object, object-detection (+2 more)

Learning Distinctive Margin toward Active Domain Adaptation

1 code implementation · CVPR 2022 · Ming Xie, Yuxi Li, Yabiao Wang, Zekun Luo, Zhenye Gan, Zhongyi Sun, Mingmin Chi, Chengjie Wang, Pei Wang

Despite extensive efforts to improve domain adaptation (DA) in unsupervised or few-shot semi-supervised settings, active learning has recently attracted more attention. It is better suited to transferring models in practice when annotation resources for the target data are limited.

Active Learning, Domain Adaptation

CFNet: Learning Correlation Functions for One-Stage Panoptic Segmentation

no code implementations · 13 Jan 2022 · Yifeng Chen, Wenqing Chu, Fangfang Wang, Ying Tai, Ran Yi, Zhenye Gan, Liang Yao, Chengjie Wang, Xi Li

Recently, there has been growing attention to one-stage panoptic segmentation methods, which aim to segment instances and stuff jointly and efficiently within a fully convolutional pipeline.

Instance Segmentation, Panoptic Segmentation (+1 more)
