Search Results for author: Zhisheng Zhong

Found 20 papers, 18 papers with code

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

1 code implementation • 22 May 2025 • Fanbin Lu, Zhisheng Zhong, Shu Liu, Chi-Wing Fu, Jiaya Jia

In this paper, we investigate end-to-end policy optimization for vision-language-based GUI agents with the aim of improving performance on complex, long-horizon computer tasks.

Reinforcement Learning +1

STEVE: A Step Verification Pipeline for Computer-use Agent Training

1 code implementation • 16 Mar 2025 • Fanbin Lu, Zhisheng Zhong, Ziqin Wei, Shu Liu, Chi-Wing Fu, Jiaya Jia

Recent advances in data scaling law inspire us to train computer-use agents with a scaled instruction set, yet using behavior cloning to train agents still requires immense high-quality trajectories.

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

3 code implementations • 9 Mar 2025 • Yuqi Liu, Bohao Peng, Zhisheng Zhong, Zihao Yue, Fanbin Lu, Bei Yu, Jiaya Jia

Traditional methods for reasoning segmentation rely on supervised fine-tuning with categorical labels and simple descriptions, which limits their out-of-domain generalization and leaves them without explicit reasoning processes.

Domain Generalization Open Vocabulary Object Detection +6

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

1 code implementation • 12 Dec 2024 • Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia

As Multi-modal Large Language Models (MLLMs) evolve, expanding beyond single-domain capabilities is essential to meet the demands for more versatile and efficient AI.

EgoSchema +6

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Image Classification Image Comprehension +4

Decoupled Kullback-Leibler Divergence Loss

4 code implementations • 23 May 2023 • Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels.

Adversarial Defense Adversarial Robustness +1

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

2 code implementations • CVPR 2023 • Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia

Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers.

3D Semantic Segmentation Segmentation

Generalized Parametric Contrastive Learning

4 code implementations • 26 Sep 2022 • Jiequan Cui, Zhisheng Zhong, Zhuotao Tian, Shu Liu, Bei Yu, Jiaya Jia

Based on theoretical analysis, we observe that supervised contrastive loss tends to be biased toward high-frequency classes and thus increases the difficulty of imbalanced learning.

Contrastive Learning Domain Generalization +3

Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition

2 code implementations • 22 Mar 2022 • Zhisheng Zhong, Jiequan Cui, Zeming Li, Eric Lo, Jian Sun, Jiaya Jia

Given the promising performance of contrastive learning, we propose Rebalanced Siamese Contrastive Mining (ResCom) to tackle imbalanced recognition.

Contrastive Learning Long-tail Learning +1

Improving Calibration for Long-Tailed Recognition

5 code implementations • CVPR 2021 • Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia

Motivated by the fact that predicted probability distributions of classes are highly related to the numbers of class instances, we propose label-aware smoothing to deal with different degrees of over-confidence for classes and improve classifier learning.

Long-tail Learning Representation Learning

ResLT: Residual Learning for Long-tailed Recognition

5 code implementations • 26 Jan 2021 • Jiequan Cui, Shu Liu, Zhuotao Tian, Zhisheng Zhong, Jiaya Jia

From this perspective, the trivial solution that uses different branches for the head, medium, and tail classes respectively, and then sums their outputs as the final result, is not feasible.

Long-tail Learning

Channel-Level Variable Quantization Network for Deep Image Compression

1 code implementation • 15 Jul 2020 • Zhisheng Zhong, Hiroaki Akutsu, Kiyoharu Aizawa

In this paper, we propose a channel-level variable quantization network to dynamically allocate more bitrates for significant channels and withdraw bitrates for negligible channels.

Decoder Image Compression +1

ADA-Tucker: Compressing Deep Neural Networks via Adaptive Dimension Adjustment Tucker Decomposition

no code implementations • 18 Jun 2019 • Zhisheng Zhong, Fangyin Wei, Zhouchen Lin, Chao Zhang

Furthermore, we propose that weight tensors in networks with proper order and balanced dimension are easier to compress.

Differentiable Linearized ADMM

1 code implementation • 15 May 2019 • Xingyu Xie, Jianlong Wu, Zhisheng Zhong, Guangcan Liu, Zhouchen Lin

Recently, a number of learning-based optimization methods that combine data-driven architectures with the classical optimization algorithms have been proposed and explored, showing superior empirical performance in solving various ill-posed inverse problems, but there is still a scarcity of rigorous analysis about the convergence behaviors of learning-based optimization.

Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution

no code implementations • NeurIPS 2018 • Zhisheng Zhong, Tiancheng Shen, Yibo Yang, Zhouchen Lin, Chao Zhang

To solve these problems, we propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high resolution (HR) image with better textural details in the wavelet domain.

Image Super-Resolution

Convolutional Neural Networks with Alternately Updated Clique

3 code implementations • CVPR 2018 • Yibo Yang, Zhisheng Zhong, Tiancheng Shen, Zhouchen Lin

In contrast to prior networks, there are both forward and backward connections between any two layers in the same block.
