Search Results for author: Zhen Zhao

Found 48 papers, 27 papers with code

Imbalanced Medical Image Segmentation with Pixel-dependent Noisy Labels

1 code implementation · 12 Jan 2025 · Erjian Guo, Zicheng Wang, Zhen Zhao, Luping Zhou

CLCS advances existing work by i) treating noisy labels as pixel-dependent and addressing them through a collaborative learning framework, ii) employing a curriculum dynamic thresholding approach that adapts to the model's learning progress to select clean data samples and mitigate class imbalance, and iii) applying a noise balance loss to noisy data samples to improve data utilization instead of discarding them outright.

Image Segmentation Medical Image Segmentation +1
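The class-aware clean-sample selection can be illustrated with a small, stdlib-only sketch. It assumes (hypothetically) that "clean" pixels are those with the smallest per-pixel loss within each class, and that the kept fraction grows linearly from 0.5 to 0.9 over training; the helper names `keep_ratio` and `select_clean_pixels` and the schedule are illustrative choices, not the CLCS thresholding rule.

```python
def keep_ratio(epoch, total_epochs, start=0.5, end=0.9):
    """Hypothetical curriculum: linearly grow the fraction of pixels kept as clean."""
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * progress


def select_clean_pixels(losses, labels, ratio):
    """Return indices of pixels treated as clean.

    Selection is done per class (small-loss criterion), so a minority class
    keeps the same fraction of its pixels as the majority class instead of
    being crowded out by a single global threshold.
    """
    by_class = {}
    for idx, (loss, label) in enumerate(zip(losses, labels)):
        by_class.setdefault(label, []).append((loss, idx))
    clean = set()
    for items in by_class.values():
        items.sort()  # smallest loss first
        k = max(1, round(ratio * len(items)))
        clean.update(idx for _, idx in items[:k])
    return clean


# Toy example: class 0 has 4 pixels, class 1 (minority) has 2.
losses = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]
labels = [0, 0, 0, 0, 1, 1]
print(select_clean_pixels(losses, labels, keep_ratio(0, 10)))  # {0, 2, 4}
```

Selecting per class rather than with one global loss threshold is the design choice that ties sample selection to class imbalance in this sketch.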

UniBrain: A Unified Model for Cross-Subject Brain Decoding

1 code implementation · 27 Dec 2024 · Zicheng Wang, Zhen Zhao, Luping Zhou, Parashkev Nachev

We validate UniBrain on the brain decoding benchmark, achieving performance comparable to current state-of-the-art subject-specific models with far fewer parameters.

Brain Decoding

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

1 code implementation · 3 Nov 2024 · Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Yiming Wu, Wei Ji, Haoran Liang, Ronghua Liang

A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models, especially Stable Diffusion.

Image Generation Object +1

UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation

1 code implementation · 14 Oct 2024 · Lihe Yang, Zhen Zhao, Hengshuang Zhao

Despite this progress, and even in the current era of numerous powerful vision models, almost all semi-supervised semantic segmentation (SSS) works still stick to 1) outdated ResNet encoders with small-scale ImageNet-1K pre-training, and 2) evaluation on the simple Pascal and Cityscapes datasets.

Semi-supervised Change Detection Semi-Supervised Semantic Segmentation

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

1 code implementation · 19 Aug 2024 · Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang

These images are designed to maintain visual consistency across different scenes using a visual-language prompting method that combines scene descriptions with images of the characters and settings that appear in them.

Image Generation Video Generation

Harmonizing Visual Text Comprehension and Generation

1 code implementation · 23 Jul 2024 · Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie

Our work demonstrates the viability of an integrated approach to multimodal generation within the visual text domain, laying a foundation for subsequent inquiries.

multimodal generation Reading Comprehension +1

A Large Language Model-based multi-agent manufacturing system for intelligent shopfloor

no code implementations · 27 May 2024 · Zhen Zhao, Dunbing Tang, Haihua Zhu, Zequn Zhang, Kai Chen, Changchun Liu, Yuchen Ji

To this end, a Large Language Model-based (LLM-based) multi-agent manufacturing system for intelligent shopfloor is proposed in the present study.

Language Modeling Language Modelling +2

PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis

1 code implementation · 24 May 2024 · Zicheng Wang, Zhenghao Chen, Yiming Wu, Zhen Zhao, Luping Zhou, Dong Xu

In this study, we introduce PoinTramba, a pioneering hybrid framework that synergizes the analytical power of Transformer with the remarkable computational efficiency of Mamba for enhanced point cloud analysis.

Art Analysis Computational Efficiency +1

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

1 code implementation · 20 May 2024 · Jingqun Tang, Qi Liu, YongJie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang

Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding.

Benchmarking Question Answering +4

SOEDiff: Efficient Distillation for Small Object Editing

no code implementations · 15 May 2024 · Yiming Wu, Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Ronghua Liang

In this paper, we delve into a new task known as small object editing (SOE), which focuses on text-based image inpainting within a constrained, small-sized area.

Image Inpainting Object

Training-Free Unsupervised Prompt for Vision-Language Models

1 code implementation · 25 Apr 2024 · Sifan Long, Linbin Wang, Zhen Zhao, Zichang Tan, Yiming Wu, Shengsheng Wang, Jingdong Wang

In light of this, we propose Training-Free Unsupervised Prompts (TFUP), which maximally preserves the inherent representation capabilities and enhances them with a residual connection to similarity-based prediction probabilities in a training-free and labeling-free manner.
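A residual connection to similarity-based prediction probabilities can be sketched as follows. The sketch assumes one feature prototype per class (for example, averaged from confidently pseudo-labelled images) and fuses zero-shot probabilities with cosine-similarity probabilities; the names `residual_prediction`, `alpha`, and `temperature` are hypothetical, and nothing here is trained or labelled, which is the property the summary emphasises.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def residual_prediction(zero_shot_logits, feature, prototypes, alpha=0.5, temperature=0.1):
    """Zero-shot probabilities plus a residual, similarity-based term.

    `prototypes` holds one (hypothetical) feature prototype per class;
    the result is renormalised so it remains a probability distribution.
    """
    base = softmax(zero_shot_logits)
    similarity = softmax([cosine(feature, p) / temperature for p in prototypes])
    fused = [b + alpha * s for b, s in zip(base, similarity)]
    total = sum(fused)
    return [f / total for f in fused]


probs = residual_prediction([0.0, 0.0], feature=[0.0, 1.0],
                            prototypes=[[1.0, 0.0], [0.0, 1.0]], alpha=1.0)
print([round(p, 3) for p in probs])  # [0.25, 0.75]
```

Here the zero-shot head is undecided, and the residual term tips the prediction toward the class whose prototype the feature resembles.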

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

no code implementations · 19 Apr 2024 · Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.

Hallucination Hallucination Evaluation +2

SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects

no code implementations · 17 Jan 2024 · Haowen Wang, Zhen Zhao, Zhao Jin, Zhengping Che, Liang Qiao, Yakun Huang, Zhipeng Fan, Xiuquan Qiao, Jian Tang

Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics.

Diversity

Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

1 code implementation · 19 Dec 2023 · Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

While semi-supervised learning (SSL) has yielded promising results, the more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC).

Fine-Grained Image Classification Semi-Supervised Image Classification

Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation

1 code implementation · 29 Nov 2023 · Zhen Zhao, Zicheng Wang, Longyue Wang, Dian Yu, Yixuan Yuan, Luping Zhou

To mitigate the confirmation bias from the diverse supervision, the core of AD-MT lies in two proposed modules: the Random Periodic Alternate (RPA) Updating Module and the Conflict-Combating Module (CCM).

Data Augmentation Image Segmentation +2

Clean Label Disentangling for Medical Image Segmentation with Noisy Labels

1 code implementation · 28 Nov 2023 · Zicheng Wang, Zhen Zhao, Erjian Guo, Luping Zhou

Current methods focusing on medical image segmentation suffer from incorrect annotations, which is known as the noisy label issue.

Disentanglement Image Segmentation +2

Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds

1 code implementation · 27 Nov 2023 · Zicheng Wang, Zhen Zhao, Yiming Wu, Luping Zhou, Dong Xu

In this work, we propose a novel framework that deeply couples classifier and feature extractor adaptation for 3D UDA, dubbed Progressive Classifier and Feature Extractor Adaptation (PCFEA).

Self-Supervised Learning Unsupervised Domain Adaptation

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation

no code implementations · 25 Nov 2023 · Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu

While the recent advances in Multimodal Large Language Models (MLLMs) constitute a significant leap forward in the field, these models are predominantly confined to the realm of input-side multimodal comprehension, lacking the capacity for multimodal content generation.

Instruction Following Language Modeling +9

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

1 code implementation · CVPR 2024 · Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie

A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios.

Diversity In-Context Learning +1

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

no code implementations · 2 Oct 2023 · Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao

Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos.

Autonomous Driving Language Modeling +3

Enhancing Sample Utilization through Sample Adaptive Augmentation in Semi-Supervised Learning

1 code implementation · ICCV 2023 · Guan Gui, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

Sample adaptive augmentation (SAA) is proposed for this stated purpose and consists of two modules: 1) sample selection module; 2) sample augmentation module.
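As a rough illustration of a sample selection module, the sketch below tracks an exponential moving average (EMA) of each unlabeled sample's loss and flags the lowest-loss fraction, on the assumption that samples whose loss stays low contribute little and are candidates for stronger augmentation. The EMA criterion, the `momentum` and `fraction` values, and the class name `SampleLossTracker` are assumptions for illustration, not the SAA implementation.

```python
class SampleLossTracker:
    """Keeps an exponential moving average (EMA) of each sample's loss."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.history = {}

    def update(self, sample_id, loss):
        # First visit: start the average at the observed loss.
        previous = self.history.get(sample_id, loss)
        self.history[sample_id] = self.momentum * previous + (1 - self.momentum) * loss

    def low_loss_samples(self, fraction=0.5):
        """Return the ids whose EMA loss lies in the lowest `fraction` (hypothetical criterion)."""
        ranked = sorted(self.history, key=self.history.get)
        return set(ranked[: int(len(ranked) * fraction)])


tracker = SampleLossTracker()
for sample_id, loss in [("a", 0.1), ("b", 1.0), ("c", 0.5), ("d", 2.0)]:
    tracker.update(sample_id, loss)
# A (hypothetical) augmentation module would then apply stronger augmentation to these ids.
print(sorted(tracker.low_loss_samples(0.5)))  # ['a', 'c']
```

Using a moving average rather than the latest loss keeps the selection from flipping on a single noisy iteration.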

Towards Semi-supervised Learning with Non-random Missing Labels

1 code implementation · ICCV 2023 · Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

Semi-supervised learning (SSL) tackles the label missing problem by enabling the effective usage of unlabeled data.

Semi-Supervised Image Classification

DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field

no code implementations · 4 Aug 2023 · Haowen Wang, Zhipeng Fan, Zhen Zhao, Zhengping Che, Zhiyuan Xu, Dong Liu, Feifei Feng, Yakun Huang, Xiuquan Qiao, Jian Tang

We introduce a pose regression module that shares the deformation features and template codes from the fields to estimate the accurate 6D pose of each object in the scene.

Object Pose Estimation

Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

2 code implementations · CVPR 2023 · Zicheng Wang, Zhen Zhao, Xiaoxia Xing, Dong Xu, Xiangyu Kong, Luping Zhou

In this work, we propose a new conflict-based cross-view consistency (CCVC) method based on a two-branch co-training framework which aims at enforcing the two sub-nets to learn informative features from irrelevant views.

Semi-Supervised Semantic Segmentation
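One way to push two co-training branches toward different views is a feature discrepancy loss. The sketch below assumes a loss of 1 + cosine similarity between the two branches' feature vectors, so that minimizing it drives the features apart; the function names are hypothetical, and the exact loss and where it is applied in CCVC may differ.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def discrepancy_loss(features_a, features_b):
    """Assumed loss in [0, 2]: 2 for identical directions, 0 for opposite ones."""
    return 1.0 + cosine_similarity(features_a, features_b)


print(discrepancy_loss([1.0, 0.0], [1.0, 0.0]))  # 2.0
print(discrepancy_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

The constant offset only keeps the loss non-negative; its gradient is the same as that of the raw cosine similarity.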

Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling

no code implementations · CVPR 2023 · Zhen Zhao, Zhizhong Zhang, Xin Tan, Jun Liu, Yanyun Qu, Yuan Xie, Lizhuang Ma

In this paper, we propose a space decoupling (SD) algorithm to decouple the feature space into a pair of complementary subspaces, i.e., the stability space I and the plasticity space R. I is established by conducting space intersection between the historic and current feature space, and thus I contains more task-shared bases.

Continual Learning
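Given an orthonormal basis for the stability space I, a gradient can be split into its component in I and the complementary component in the plasticity space R. The sketch below assumes such a basis is already available (computing the intersection of the historic and current feature spaces is not shown), and the function names `project` and `decouple` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(vector, orthonormal_basis):
    """Orthogonal projection of `vector` onto the span of the basis vectors."""
    result = [0.0] * len(vector)
    for basis_vector in orthonormal_basis:
        coeff = dot(vector, basis_vector)
        result = [r + coeff * b for r, b in zip(result, basis_vector)]
    return result


def decouple(gradient, stability_basis):
    """Split a gradient into its stability-space (I) and plasticity-space (R) parts."""
    g_stability = project(gradient, stability_basis)
    g_plasticity = [g - s for g, s in zip(gradient, g_stability)]
    return g_stability, g_plasticity


g_I, g_R = decouple([3.0, 4.0, 5.0], stability_basis=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(g_I, g_R)  # [3.0, 4.0, 0.0] [0.0, 0.0, 5.0]
```

Because the two parts are orthogonal and sum to the original gradient, each can be scaled separately, which is what makes the stability/plasticity trade-off tunable in this picture.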

Augmentation Matters: A Simple-yet-Effective Approach to Semi-supervised Semantic Segmentation

1 code implementation · CVPR 2023 · Zhen Zhao, Lihe Yang, Sifan Long, Jimin Pi, Luping Zhou, Jingdong Wang

In contrast, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost SSS performance.

Semi-Supervised Semantic Segmentation
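In a standard teacher-student framework, the teacher is commonly an exponential moving average (EMA) of the student, and its confident predictions on unlabeled images serve as pseudo-labels. The sketch below shows these two generic steps with parameters stored in a plain dict; the `decay` and `threshold` values are common defaults assumed for illustration, not AugSeg's settings, and AugSeg's augmentation strategy itself is not reproduced.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter a small step toward the student's value."""
    return {name: decay * teacher[name] + (1 - decay) * student[name] for name in teacher}


def pseudo_label(probabilities, threshold=0.95):
    """Return the teacher's predicted class if it is confident enough, else None."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None


teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)
print(teacher)                           # {'w': 0.9}
print(pseudo_label([0.02, 0.97, 0.01]))  # 1
print(pseudo_label([0.5, 0.5]))          # None
```

The slow EMA makes the teacher a smoothed copy of the student, which tends to give more stable pseudo-labels than the student's own predictions.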

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

1 code implementation · CVPR 2023 · Sifan Long, Zhen Zhao, Jimin Pi, Shengsheng Wang, Jingdong Wang

In this paper, we emphasize the importance of diverse global semantics and propose an efficient token decoupling and merging method that jointly considers token importance and diversity for token pruning.

Computational Efficiency Diversity +1
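A simplified keep-and-merge step can be sketched as follows. It assumes each token already has an importance score (for instance, attention from the class token), keeps the top-k tokens, and fuses the remainder into a single token by importance-weighted averaging. The function name is hypothetical, and the paper's diversity-aware merging is more involved; this only conveys why merged tokens retain some information that plain pruning would drop.

```python
def prune_and_merge(tokens, importance, keep):
    """Keep the `keep` most important tokens and fuse the rest into one token.

    Pruned tokens are not discarded outright: they are merged into a single
    importance-weighted average so their information is partly retained.
    """
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    kept = [tokens[i] for i in sorted(order[:keep])]  # preserve original token order
    rest = order[keep:]
    if rest:
        total = sum(importance[i] for i in rest)
        dim = len(tokens[0])
        if total > 0:
            merged = [sum(importance[i] * tokens[i][d] for i in rest) / total for d in range(dim)]
        else:  # all remaining scores are zero: fall back to a plain mean
            merged = [sum(tokens[i][d] for i in rest) / len(rest) for d in range(dim)]
        kept.append(merged)
    return kept


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
importance = [0.4, 0.3, 0.25, 0.25]
print(prune_and_merge(tokens, importance, keep=2))  # [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
```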

Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation

1 code implementation · CVPR 2023 · Zhen Zhao, Sifan Long, Jimin Pi, Jingdong Wang, Luping Zhou

Relying on the model's performance, iMAS employs a class-weighted symmetric intersection-over-union to quantitatively evaluate the hardness of each unlabeled instance and supervises the training on unlabeled data in a model-adaptive manner.

Segmentation Semi-Supervised Semantic Segmentation
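A class-weighted symmetric IoU hardness score can be sketched on flattened label maps. The sketch assumes that hardness is one minus the average of two weighted IoUs between the student's and teacher's predictions, where the per-class weights are each class's pixel share in the reference map and the two directions are averaged to make the score symmetric. This formulation and the function names are assumptions for illustration; consult the paper for the exact definition.

```python
def weighted_iou(pred, ref):
    """Per-class IoU between two label maps, weighted by each class's pixel share in `ref`."""
    n = len(ref)
    score = 0.0
    for c in sorted(set(ref)):
        inter = sum(1 for p, r in zip(pred, ref) if p == c and r == c)
        union = sum(1 for p, r in zip(pred, ref) if p == c or r == c)
        score += (ref.count(c) / n) * inter / union
    return score


def hardness(student_pred, teacher_pred):
    """Assumed definition: 1 - average of the weighted IoU computed in both directions."""
    symmetric = 0.5 * (weighted_iou(student_pred, teacher_pred)
                       + weighted_iou(teacher_pred, student_pred))
    return 1.0 - symmetric


student = [0, 0, 1, 1]  # flattened toy label maps
teacher = [0, 0, 0, 1]
print(round(hardness(student, teacher), 4))  # 0.3958
print(hardness(teacher, teacher))            # 0.0
```

Instances on which student and teacher disagree more receive a higher score and could then be supervised differently, which is the role the summary assigns to hardness.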

MutexMatch: Semi-Supervised Learning with Mutex-Based Consistency Regularization

1 code implementation · 27 Mar 2022 · Yue Duan, Zhen Zhao, Lei Qi, Lei Wang, Luping Zhou, Yinghuan Shi, Yang Gao

The core issue in semi-supervised learning (SSL) lies in how to effectively leverage unlabeled data, whereas most existing methods tend to put a great emphasis on the utilization of high-confidence samples yet seldom fully explore the usage of low-confidence samples.

Semi-Supervised Image Classification

The Winning Solution to the iFLYTEK Challenge 2021 Cultivated Land Extraction from High-Resolution Remote Sensing Image

1 code implementation · 22 Feb 2022 · Zhen Zhao, Yuqiu Liu, Gang Zhang, Liang Tang, Xiaolin Hu

This report introduces our solution to the iFLYTEK challenge 2021 cultivated land extraction from high-resolution remote sensing image.

Instance Segmentation Segmentation +1

Bi-Dimensional Feature Alignment for Cross-Domain Object Detection

no code implementations · 14 Nov 2020 · Zhen Zhao, Yuhong Guo, Jieping Ye

Recently, the problem of cross-domain object detection has started drawing attention in the computer vision community.

Object Object Detection +1

Ensemble Model with Batch Spectral Regularization and Data Blending for Cross-Domain Few-Shot Learning with Unlabeled Data

1 code implementation · 8 Jun 2020 · Zhen Zhao, Bingyu Liu, Yuhong Guo, Jieping Ye

In this paper, we present our proposed ensemble model with batch spectral regularization and data blending mechanisms for the Track 2 problem of the cross-domain few-shot learning (CD-FSL) challenge.

cross-domain few-shot learning

Feature Transformation Ensemble Model with Batch Spectral Regularization for Cross-Domain Few-Shot Classification

no code implementations · 18 May 2020 · Bingyu Liu, Zhen Zhao, Zhenpeng Li, Jianan Jiang, Yuhong Guo, Jieping Ye

In this paper, we propose a feature transformation ensemble model with batch spectral regularization for the Cross-domain few-shot learning (CD-FSL) challenge.

cross-domain few-shot learning Data Augmentation +2

Adaptive Object Detection with Dual Multi-Label Prediction

no code implementations · ECCV 2020 · Zhen Zhao, Yuhong Guo, Haifeng Shen, Jieping Ye

In this paper, we propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection by exploiting multi-label object recognition as a dual auxiliary task.

Image-to-Image Translation Object +5

Mutual Learning Network for Multi-Source Domain Adaptation

no code implementations · 29 Mar 2020 · Zhenpeng Li, Zhen Zhao, Yuhong Guo, Haifeng Shen, Jieping Ye

However, in practice the labeled data can come from multiple source domains with different distributions.

Unsupervised Domain Adaptation

Fast Inference in Capsule Networks Using Accumulated Routing Coefficients

no code implementations · 15 Apr 2019 · Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan

Afterward, the routing coefficients associated with the training examples are accumulated offline and used to create a set of "master" routing coefficients.

Object Rotated MNIST
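The idea of replacing iterative dynamic routing with precomputed coefficients can be sketched as below. The aggregation rule (a plain average of per-example coefficient matrices c[i][j] over the training set) is an assumption for illustration and may differ from how the paper builds its "master" coefficients; `accumulate_master` and `fast_forward` are hypothetical names. The squash non-linearity is the standard CapsNet one.

```python
import math


def accumulate_master(coefficient_sets):
    """Average per-example routing coefficients c[i][j] over the training set (offline).

    If every example's rows sum to 1 (softmax over upper capsules j),
    the averaged rows also sum to 1.
    """
    n = len(coefficient_sets)
    rows, cols = len(coefficient_sets[0]), len(coefficient_sets[0][0])
    master = [[0.0] * cols for _ in range(rows)]
    for coeffs in coefficient_sets:
        for i in range(rows):
            for j in range(cols):
                master[i][j] += coeffs[i][j] / n
    return master


def squash(vector):
    """CapsNet non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0] * len(vector)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vector]


def fast_forward(predictions, master):
    """One pass with fixed coefficients instead of iterative dynamic routing.

    predictions[i][j] is the prediction vector u_hat_{j|i} from lower capsule i
    for upper capsule j; output capsule j is squash(sum_i c_ij * u_hat_{j|i}).
    """
    num_lower, num_upper = len(master), len(master[0])
    dim = len(predictions[0][0])
    outputs = []
    for j in range(num_upper):
        s_j = [sum(master[i][j] * predictions[i][j][d] for i in range(num_lower)) for d in range(dim)]
        outputs.append(squash(s_j))
    return outputs


master = accumulate_master([[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]])
print(master)  # [[0.5, 0.5], [0.5, 0.5]]
```

At inference time only `fast_forward` runs, so the cost of the routing iterations is paid once, offline.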

Capsule Networks with Max-Min Normalization

no code implementations · 22 Mar 2019 · Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan

Capsule Networks (CapsNet) use the Softmax function to convert the logits of the routing coefficients into a set of normalized values that signify the assignment probabilities between capsules in adjacent layers.
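The difference between the two normalizations can be seen on a toy set of routing logits. The sketch assumes max-min normalization is applied to the logits of one lower-level capsule across the upper-level capsules, with a small `eps` (an assumption) to avoid division by zero; the exact formulation in the paper may differ. Note that the max-min values no longer sum to 1.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_min_normalize(logits, eps=1e-9):
    """Linearly rescale logits to [0, 1]: the largest maps to ~1, the smallest to 0.

    If all logits are equal, every coefficient becomes 0, a case a real
    implementation would need to handle explicitly.
    """
    lo, hi = min(logits), max(logits)
    return [(b - lo) / (hi - lo + eps) for b in logits]


logits = [1.0, 2.0, 3.0]  # routing logits b_ij for one lower capsule i
print([round(c, 3) for c in softmax(logits)])            # [0.09, 0.245, 0.665]
print([round(c, 3) for c in max_min_normalize(logits)])  # [0.0, 0.5, 1.0]
```

Softmax spreads probability mass across all capsules, while max-min scaling keeps the relative gaps between logits linear and assigns the top capsule a coefficient close to 1.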

CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE)

no code implementations · 10 Aug 2018 · Chenyu You, Guang Li, Yi Zhang, Xiaoliu Zhang, Hongming Shan, Shenghong Ju, Zhen Zhao, Zhuiyang Zhang, Wenxiang Cong, Michael W. Vannier, Punam K. Saha, Ge Wang

Specifically, with the generative adversarial network (GAN) as the building block, we enforce the cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised and deblurred HR outputs.

Computed Tomography (CT) Generative Adversarial Network +2
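The cycle-consistency constraint can be illustrated with an L1 round-trip loss between two mappings, G (LR to HR) and F (HR to LR). The toy G and F below (pixel repetition and pair averaging on 1-D signals) are hypothetical stand-ins for the generator networks, and the Wasserstein adversarial terms and the other GAN-CIRCLE loss components are not shown.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(low_res, high_res, G, F):
    """G maps LR -> HR and F maps HR -> LR; both round trips should return the input."""
    forward_cycle = l1_distance(F(G(low_res)), low_res)
    backward_cycle = l1_distance(G(F(high_res)), high_res)
    return forward_cycle + backward_cycle


# Toy 1-D "images": G upsamples by repeating pixels, F downsamples by averaging pairs.
def G(x):
    return [v for v in x for _ in range(2)]


def F(y):
    return [(y[k] + y[k + 1]) / 2 for k in range(0, len(y), 2)]


print(cycle_consistency_loss([1.0, 3.0], [1.0, 2.0, 3.0, 5.0], G, F))  # 0.75
```

Here the LR round trip is exact, while the HR round trip loses detail, so only the backward cycle contributes to the loss.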

Structure-sensitive Multi-scale Deep Neural Network for Low-Dose CT Denoising

no code implementations · 2 May 2018 · Chenyu You, Qingsong Yang, Hongming Shan, Lars Gjesteby, Guang Li, Shenghong Ju, Zhuiyang Zhang, Zhen Zhao, Yi Zhang, Wenxiang Cong, Ge Wang

However, reducing the radiation dose compromises the signal-to-noise ratio (SNR), leading to strong noise and artifacts that degrade CT image quality.

Computed Tomography (CT) Denoising
