Search Results for author: Cheng Jin

Found 55 papers, 25 papers with code

Towards Robust Influence Functions with Flat Validation Minima

No code implementations · 25 May 2025 · Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen

In this work, we establish a theoretical connection between influence estimation error, validation set risk, and its sharpness, underscoring the importance of flat validation minima for accurate influence estimation.

Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning

No code implementations · 21 May 2025 · Zhe Xu, Cheng Jin, Yihui Wang, Ziyi Liu, Hao Chen

Multimodal pathological image understanding has garnered widespread interest due to its potential to improve diagnostic accuracy and enable personalized treatment through integrated visual and textual data.

Tasks: Computational Efficiency, Diagnostic (+3 more)

Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation

1 code implementation · 21 May 2025 · Cheng Jin, Zhenyu Xiao, Chutao Liu, Yuantao Gu

However, under high guidance weights, where text-image alignment is significantly enhanced, classifier-free guidance (CFG) also leads to pronounced color distortions in the generated images.
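Standard classifier-free guidance combines the unconditional and conditional noise predictions by linear extrapolation: eps_u + w * (eps_c - eps_u). The sketch below shows only that standard rule, not the paper's Angle Domain Guidance. Plain Python lists stand in for latent tensors, and `cfg_combine` and `l2_norm` are hypothetical helper names. It illustrates how the norm of the guided prediction grows with the guidance weight `w`.

```python
import math
from typing import List

Vector = List[float]


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, w: float) -> Vector:
    """Standard classifier-free guidance: eps_u + w * (eps_c - eps_u).

    For w > 1 the result extrapolates past the conditional prediction.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    eps_u = [1.0, 0.0]  # toy unconditional prediction
    eps_c = [0.0, 1.0]  # toy conditional prediction (same norm as eps_u)
    for w in (1.0, 2.0, 7.5):
        guided = cfg_combine(eps_u, eps_c, w)
        print(w, guided, round(l2_norm(guided), 3))
```

Both toy inputs have unit norm, yet the guided vector's norm rises to about 9.92 at w = 7.5. This only shows that linear extrapolation inflates magnitude. It does not reproduce the paper's analysis of color distortion.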

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

1 code implementation · 6 May 2025 · Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang

To this end, this paper proposes UnifiedReward-Think, the first unified multimodal CoT-based reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.

Tasks: Image Generation

Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth

No code implementations · 2 May 2025 · Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin

QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLMs).

Tasks: GSM8K, Quantization
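QLoRA-style fine-tuning keeps a frozen, low-bit base weight and trains a low-rank update. The forward pass computes y = dequant(W_q) x + (alpha / r) B A x. The following is a minimal sketch under simplifying assumptions:

- per-tensor absmax uniform integer quantization stands in for QLoRA's 4-bit NormalFloat (NF4) data type;
- plain Python lists stand in for tensors;
- all function names are hypothetical.

It does not implement the paper's adaptive rank and bitwidth selection.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def quantize_absmax(W: Matrix, bits: int) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor quantization of W to signed `bits`-bit integer codes.

    Assumption: uniform absmax scaling, used here instead of QLoRA's NF4 codebook.
    """
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(v) for row in W for v in row) or 1.0
    scale = absmax / qmax
    codes = [[round(v / scale) for v in row] for row in W]
    return codes, scale


def dequantize(codes: List[List[int]], scale: float) -> Matrix:
    return [[c * scale for c in row] for row in codes]


def lora_forward(codes: List[List[int]], scale: float, A: Matrix, B: Matrix,
                 x: Vector, alpha: float) -> Vector:
    """y = dequant(W_q) x + (alpha / r) * B (A x); only A and B would be trained."""
    r = len(A)  # LoRA rank = number of rows of A
    base = matvec(dequantize(codes, scale), x)  # frozen quantized base path
    delta = matvec(B, matvec(A, x))  # low-rank trainable path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

A common LoRA initialization sets B to zeros, so the adapted model initially reproduces the quantized base model exactly. In this sketch, `bits` and the rank `r` correspond to the bitwidth and rank named in the paper's title.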

Unified Reward Model for Multimodal Understanding and Generation

1 code implementation · 7 Mar 2025 · Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, Jiaqi Wang

Recent advances in human preference alignment have significantly enhanced multimodal generation and understanding.

Tasks: Image Generation, model (+3 more)

PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice

No code implementations · 28 Feb 2025 · Ruoxi Wang, Shuyu Liu, Ling Zhang, Xuequan Zhu, Rui Yang, Xinzhu Zhou, Fei Wu, Zhi Yang, Cheng Jin, Gang Wang

In response to this gap, by incorporating clinical demands in psychiatry and clinical data, we proposed a benchmarking system, PsychBench, to evaluate the practical performance of LLMs in psychiatric clinical settings.

Tasks: Benchmarking, Diagnostic

Population Normalization for Federated Learning

No code implementations · CVPR 2025 · Zhuoyao Wang, Fan Yi, Peizhu Gong, Caitou He, Cheng Jin, Weizhong Zhang

Second, estimating statistics from a mini-batch is often imprecise since the batch size has to be small in resource-limited clients.

Tasks: Federated Learning

HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios

1 code implementation · 21 Dec 2024 · Jiamu Zhou, Muning Wen, Xiaoyun Mo, Haoyu Zhang, Qiqiang Lin, Cheng Jin, Xihuai Wang, Weinan Zhang, Qiuying Peng, Jun Wang

Evaluating the performance of LLMs in multi-turn human-agent interactions presents significant challenges, particularly due to the complexity and variability of user behavior.

Tasks: Benchmarking

Optimized Gradient Clipping for Noisy Label Learning

1 code implementation · 12 Dec 2024 · Xichen Ye, Yifan Wu, Weizhong Zhang, Xiaoqiang Li, Yifan Chen, Cheng Jin

Previous research has shown that constraining the gradient of the loss function with respect to model-predicted probabilities can enhance model robustness against noisy labels.
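One well-known instance of this idea is the partially Huberised cross-entropy of Menon et al. (ICLR 2020). Its derivative with respect to the true-class probability p is clipped at -tau, instead of growing like -1/p as p approaches 0. The sketch below implements that loss for a single example to illustrate probability-space gradient clipping. It is not the Optimized Gradient Clipping method proposed in this paper. The threshold `tau` (assumed > 1) is a hypothetical hyperparameter, and the function names are illustrative.

```python
import math


def ce_loss(p: float) -> float:
    """Standard cross-entropy on the true-class probability p."""
    return -math.log(p)


def ce_grad(p: float) -> float:
    """d(-log p)/dp = -1/p, unbounded as p -> 0 (e.g. on a mislabeled example)."""
    return -1.0 / p


def phuber_ce_loss(p: float, tau: float) -> float:
    """Partially Huberised CE: linear in p below 1/tau, -log p above it (tau > 1).

    The two pieces agree at p = 1/tau, so the loss is continuous.
    """
    if p <= 1.0 / tau:
        return -tau * p + math.log(tau) + 1.0
    return -math.log(p)


def phuber_ce_grad(p: float, tau: float) -> float:
    """Derivative w.r.t. p; its magnitude never exceeds tau."""
    return -tau if p <= 1.0 / tau else -1.0 / p
```

The clipped gradient is bounded. So a confidently mislabeled example (small p for its given label) contributes at most tau in probability space, rather than an arbitrarily large 1/p.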

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

No code implementations · 6 Dec 2024 · Yibin Wang, Zhiyu Tan, Junyan Wang, Xiaomeng Yang, Cheng Jin, Hao Li

Based on this, we train a reward model, LiFT-Critic, to learn the reward function effectively; it serves as a proxy for human judgment, measuring the alignment between given videos and human expectations.

HMIL: Hierarchical Multi-Instance Learning for Fine-Grained Whole Slide Image Classification

1 code implementation · 12 Nov 2024 · Cheng Jin, Luyang Luo, Huangjing Lin, Jun Hou, Hao Chen

Fine-grained classification of whole slide images (WSIs) is essential in precision oncology, enabling precise cancer diagnosis and personalized treatment strategies.

Tasks: Contrastive Learning, image-classification (+3 more)

GameGen-X: Interactive Open-world Game Video Generation

1 code implementation · 1 Nov 2024 · Haoxuan Che, Xuanhua He, Quande Liu, Cheng Jin, Hao Chen

To realize this vision, we first collected and built an Open-World Video Game Dataset from scratch.

Tasks: Text-to-Video Generation, Video Generation

MagicFace: Training-free Universal-Style Human Image Customized Synthesis

No code implementations · 14 Aug 2024 · Yibin Wang, Weizhong Zhang, Cheng Jin

In the first stage, RSA enables the latent image to query features from all reference concepts simultaneously, extracting the overall semantic understanding to facilitate the initial semantic layout establishment.

Tasks: Attribute, Image Generation (+1 more)

MultiColor: Image Colorization by Learning from Multiple Color Spaces

No code implementations · 8 Aug 2024 · Xiangcheng Du, Zhao Zhou, Yanlong Wang, Zhuoyao Wang, Yingbin Zheng, Cheng Jin

Deep networks have shown impressive performance in image restoration tasks such as image colorization.

Tasks: Colorization, Decoder (+2 more)

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

2 code implementations · 22 Jul 2024 · Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Cheng Jin, Shu Yang, Jinbang Li, Zhengyu Zhang, Chenglong Zhao, Huajun Zhou, Zhenhui Li, Huangjing Lin, Xin Wang, Jiguang Wang, Anjia Han, Ronald Cheong Kin Chan, Li Liang, Xiuming Zhang, Hao Chen

In this study, for the first time, we develop a pathology foundation model incorporating three levels of modalities: pathology slides, pathology reports, and gene expression data, which resulted in 26,169 slide-level modality pairs from 10,275 patients across 32 cancer types, amounting to over 116 million pathological patch images.

Tasks: Diagnostic, whole slide images

Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

1 code implementation · 8 Jul 2024 · Lintao Zhang, Xiangcheng Du, LeoWu TomyEnrique, Yiqun Wang, Yingbin Zheng, Cheng Jin

Second, we introduce a skip-step sampling scheme of Denoising Diffusion Implicit Models (DDIM) for the denoising process.

Tasks: Denoising, Image Inpainting
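Skip-step sampling in DDIM visits only a subsequence of the training timesteps. It applies the deterministic (eta = 0) DDIM update between consecutive visited steps. The sketch below shows generic strided DDIM sampling only, not the paper's coarse-to-fine inpainting pipeline. It assumes:

- a linear beta schedule, a common DDPM choice;
- a caller-supplied noise predictor `eps_model(x, t)`, a hypothetical stand-in for a trained network;
- hypothetical helper names throughout.

```python
import math
from typing import Callable, List

Vector = List[float]


def linear_alpha_bars(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    abar, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        abar.append(prod)
    return abar


def skip_timesteps(T: int, stride: int) -> List[int]:
    """Descending subsequence T-1, T-1-stride, ... visited by the sampler."""
    return list(range(T - 1, -1, -stride))


def ddim_sample(x_T: Vector, eps_model: Callable[[Vector, int], Vector],
                abar: List[float], timesteps: List[int]) -> Vector:
    """Deterministic DDIM (eta = 0) over the given descending timesteps."""
    x = list(x_T)
    for i, t in enumerate(timesteps):
        eps = eps_model(x, t)
        a_t = abar[t]
        # Predict x_0 from the current sample and the predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        # Jump directly to the next visited timestep (abar = 1 after the last one).
        a_next = abar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = [math.sqrt(a_next) * x0i + math.sqrt(1.0 - a_next) * e for x0i, e in zip(x0, eps)]
    return x
```

With T = 1000 and a stride of 50, the sampler makes 20 noise-predictor calls instead of 1000. Feeding it an oracle predictor that returns the true noise recovers x_0 exactly, which is a convenient sanity check for the update rule.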

Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

1 code implementation · 3 Jul 2024 · Yiqun Wang, Zhao Zhou, Xiangcheng Du, Xingjiao Wu, Yingbin Zheng, Cheng Jin

In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter), which lets the model adaptively learn the importance of each modality in different cases, without fixing a prior setting in the model architecture.

Tasks: image-classification, Image Classification

Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

1 code implementation · 11 Jun 2024 · Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework.

Tasks: Denoising, Image Restoration

DreamText: High Fidelity Scene Text Synthesis

1 code implementation · CVPR 2025 · Yibin Wang, Weizhong Zhang, Cheng Jin

Our key idea is to reconstruct the diffusion training process, introducing more refined guidance tailored to this task, to expose and rectify the model's attention at the character level and strengthen its learning of text regions.

Robust Fine-tuning for Pre-trained 3D Point Cloud Models

No code implementations · 25 Apr 2024 · Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

We apply this robust fine-tuning method to mainstream 3D point cloud pre-trained models and evaluate the quality of model parameters and the degradation of downstream task performance.

PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering

1 code implementation · 8 Mar 2024 · Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

This prior information is encoded into the attention weights, which are then integrated into the self-attention layers of the generator to guide the synthesis process.

Out-of-Distribution Detection using Neural Activation Prior

No code implementations · 28 Feb 2024 · Weilin Wan, Weizhong Zhang, Quan Zhou, Fan Yi, Cheng Jin

Our neural activation prior is based on a key observation that, for a channel before the global pooling layer of a fully trained neural network, the probability of a few neurons being activated with a large response by an in-distribution (ID) sample is significantly higher than that by an OOD sample.

Tasks: Out-of-Distribution Detection

EPA: Neural Collapse Inspired Robust Out-of-Distribution Detector

No code implementations · 3 Jan 2024 · Jiawei Zhang, Yufan Chen, Cheng Jin, Lei Zhu, Yuantao Gu

Out-of-distribution (OOD) detection plays a crucial role in ensuring the security of neural networks.

Tasks: Out of Distribution (OOD) Detection

Point Cloud Part Editing: Segmentation, Generation, Assembly, and Selection

1 code implementation · 19 Dec 2023 · Kaiyi Zhang, Yang Chen, Ximing Yang, Weizhong Zhang, Cheng Jin

Based on this process, we introduce SGAS, a model for part editing that employs two strategies: feature disentanglement and constraint.

Tasks: Disentanglement, Diversity (+1 more)

Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification

2 code implementations · 9 Dec 2023 · Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen

While most conventional multiple instance learning (MIL) methods use attention scores to estimate instance importance scores (IIS) that contribute to the prediction of slide labels, these often lead to skewed attention distributions and inaccuracies in identifying crucial instances.

Tasks: image-classification, Image Classification (+1 more)
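For reference, the exact Shapley value of a player i in a cooperative game v over n players averages i's marginal contribution v(S ∪ {i}) - v(S) over all subsets S that exclude i. Each term is weighted by |S|!(n - |S| - 1)!/n!. The sketch below computes this by brute force for small n. It treats the instances in a bag as players and uses a caller-supplied set function `value` as a hypothetical stand-in for a slide-level score. It does not reproduce the paper's progressive pseudo-bag augmentation, or any approximation the paper may use for large bags.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List


def shapley_values(n: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for players 0..n-1; needs O(n * 2^n) calls to `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of coalition S drawn from the other n-1 players
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in combinations(others, k):
                s = frozenset(S)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

Exact enumeration is only feasible for a handful of instances. A whole slide image usually yields far more patches, so practical pipelines generally rely on approximations instead.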

UMAAF: Unveiling Aesthetics via Multifarious Attributes of Images

No code implementations · 19 Nov 2023 · Weijie Li, Yitian Wan, Xingjiao Wu, Junjie Xu, Cheng Jin, Liang He

Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images.

Tasks: Attribute

High-fidelity Person-centric Subject-to-Image Synthesis

1 code implementation · CVPR 2024 · Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

Specifically, we first develop two specialized pre-trained diffusion models, i.e., Text-driven Diffusion Model (TDM) and Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively.

Tasks: Image Generation, Scene Generation

DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding

1 code implementation · 29 Oct 2023 · Anran Wu, Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Zisong Zhuang, Nian Xie, Cheng Jin, Liang He

Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning for charts in the visually-rich document.

Tasks: Answer Generation, Chart Question Answering (+5 more)

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

No code implementations · 15 Oct 2023 · Shuwen Yang, Anran Wu, Xingjiao Wu, Luwei Xiao, Tianlong Ma, Cheng Jin, Liang He

Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence.

Tasks: Contrastive Learning, Logical Sequence (+2 more)

ProtoEM: A Prototype-Enhanced Matching Framework for Event Relation Extraction

No code implementations · 22 Sep 2023 · Zhilei Hu, Zixuan Li, Daozhu Xu, Long Bai, Cheng Jin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations.

Tasks: Event Relation Extraction, Graph Neural Network (+2 more)

Linear Speedup of Incremental Aggregated Gradient Methods on Streaming Data

No code implementations · 10 Sep 2023 · Xiaolu Wang, Cheng Jin, Hoi-To Wai, Yuantao Gu

This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization.

Tasks: Distributed Optimization
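The classic IAG scheme targets a finite sum f(x) = (1/n) Σ f_i(x). It stores the most recently computed gradient of each component, refreshes one of them per iteration, and steps along the average of the stored, possibly stale, gradients. The sketch below shows that deterministic, cyclic version on scalar toy quadratics. It does not model the streaming-data setting or the linear-speedup analysis studied in the paper, and `iag` is a hypothetical helper name.

```python
from typing import Callable, List


def iag(grads: List[Callable[[float], float]], x0: float, step: float, iters: int) -> float:
    """Incremental aggregated gradient with cyclic component selection (scalar toy version)."""
    n = len(grads)
    x = x0
    table = [g(x) for g in grads]  # last gradient seen for each component
    total = sum(table)  # running sum of the stored gradients
    for k in range(iters):
        i = k % n  # refresh one component per iteration
        fresh = grads[i](x)
        total += fresh - table[i]
        table[i] = fresh
        x -= step * total / n  # step along the aggregated (partly stale) gradient
    return x
```

Each iteration evaluates only one component gradient, yet the update still uses information from all n components. The price is that up to n - 1 of the stored gradients are stale. For f_i(x) = 0.5 (x - c_i)^2, the iterates approach the mean of the c_i.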

WYTIWYR: A User Intent-Aware Framework with Multi-modal Inputs for Visualization Retrieval

1 code implementation · 14 Apr 2023 · Shishi Xiao, Yihan Hou, Cheng Jin, Wei Zeng

Retrieving charts from a large corpus is a fundamental task that can benefit numerous applications such as visualization recommendations. The retrieved results are expected to conform to both explicit visual attributes (e.g., chart type, colormap) and implicit user intents (e.g., design style, context information) that vary across application scenarios.

Tasks: Retrieval, Zero-Shot Learning

DDT: Dual-branch Deformable Transformer for Image Denoising

1 code implementation · 13 Apr 2023 · Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin

Transformers are beneficial for image denoising because they can model long-range dependencies, overcoming the limitations imposed by the inductive biases of convolutions.

Tasks: Image Denoising

Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions

No code implementations · 22 Mar 2023 · Cheng Jin, Zhengrui Guo, Yi Lin, Luyang Luo, Hao Chen

Deep learning has significantly advanced medical imaging analysis (MIA), achieving state-of-the-art performance across diverse clinical tasks.

Tasks: Medical Image Analysis, Survey (+1 more)

Aggregated Text Transformer for Scene Text Detection

No code implementations · 25 Nov 2022 · Zhao Zhou, Xiangcheng Du, Yingbin Zheng, Cheng Jin

We present the Aggregated Text TRansformer (ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism.

Tasks: Decoder, Scene Text Detection (+1 more)

Progressive Scene Text Erasing with Self-Supervision

No code implementations · 23 Jul 2022 · Xiangcheng Du, Zhao Zhou, Yingbin Zheng, Xingjiao Wu, Tianlong Ma, Cheng Jin

Scene text erasing seeks to erase text content from scene images, and current state-of-the-art text erasing models are trained on large-scale synthetic data.

SIT: A Bionic and Non-Linear Neuron for Spiking Neural Network

No code implementations · 30 Mar 2022 · Cheng Jin, Rui-Jie Zhu, Xiao Wu, Liang-Jian Deng

Spiking Neural Networks (SNNs) have piqued researchers' interest because of their capacity to process temporal information and their low power consumption.

Tasks: image-classification, Image Classification

SRPCN: Structure Retrieval based Point Completion Network

No code implementations · 6 Feb 2022 · Kaiyi Zhang, Ximing Yang, Yuan Wu, Cheng Jin

Besides, the missing patterns are diverse in reality, but existing methods can only handle fixed ones, which results in poor generalization.

Tasks: Decoder, Point Cloud Completion (+1 more)

Generate Point Clouds with Multiscale Details from Graph-Represented Structures

No code implementations · 13 Dec 2021 · Ximing Yang, Zhibo Zhang, Zhengfu He, Cheng Jin

As details are missing in most representations of structures, the lack of control over additional information is one of the major weaknesses of structure-based controllable point cloud generation.

Tasks: Miscellaneous, Point Cloud Generation

Attention-based Transformation from Latent Features to Point Clouds

1 code implementation · 10 Dec 2021 · Kaiyi Zhang, Ximing Yang, Yuan Wu, Cheng Jin

The points generated by AXform do not have the strong 2-manifold constraint, which improves the generation of non-smooth surfaces.

Tasks: Point Cloud Completion, Unsupervised Semantic Segmentation

Safe Distillation Box

1 code implementation · 5 Dec 2021 · Jingwen Ye, Yining Mao, Jie Song, Xinchao Wang, Cheng Jin, Mingli Song

In other words, all users may employ a model in SDB for inference, but only authorized users can perform knowledge distillation (KD) from the model.

Tasks: Knowledge Distillation

Document Layout Analysis with Aesthetic-Guided Image Augmentation

No code implementations · 27 Nov 2021 · Tianlong Ma, Xingjiao Wu, Xin Li, Xiangcheng Du, Zhao Zhou, Liang Xue, Cheng Jin

To measure the proposed image layer modeling method, we propose a manually-labeled non-Manhattan layout fine-grained segmentation dataset named FPD.

Tasks: Document Layout Analysis, document understanding (+2 more)

Adaptive Charging Networks: A Framework for Smart Electric Vehicle Charging

1 code implementation · 4 Dec 2020 · Zachary J. Lee, George Lee, Ted Lee, Cheng Jin, Rand Lee, Zhi Low, Daniel Chang, Christine Ortega, Steven H. Low

We describe the architecture and algorithms of the Adaptive Charging Network (ACN), which was first deployed on the Caltech campus in early 2016 and is currently operating at over 100 other sites in the United States.

Tasks: Model Predictive Control, Scheduling

One-sample Guided Object Representation Disassembling

No code implementations · NeurIPS 2020 · Zunlei Feng, Yongming He, Xinchao Wang, Xin Gao, Jie Lei, Cheng Jin, Mingli Song

In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representation from unannotated images.

Tasks: Data Augmentation, image-classification (+2 more)

GASNet: Weakly-supervised Framework for COVID-19 Lesion Segmentation

No code implementations · 19 Oct 2020 · Zhanwei Xu, Yukun Cao, Cheng Jin, Guozhu Shao, Xiaoqing Liu, Jie Zhou, Heshui Shi, Jianjiang Feng

Segmentation of infected areas in chest CT volumes is of great significance for further diagnosis and treatment of COVID-19 patients.

Tasks: Image Segmentation, Lesion Segmentation (+2 more)

TripletGAN: Training Generative Model with Triplet Loss

No code implementations · 14 Nov 2017 · Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song

As an effective form of metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person re-identification (ReID), leading to many state-of-the-art results.

Tasks: Face Recognition, General Classification (+3 more)
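The standard triplet loss on embeddings takes an anchor a, a positive p and a negative n. In the squared-Euclidean form popularized by FaceNet, it penalizes the anchor whenever its positive is not closer than its negative by at least a margin m: L = max(0, ||a - p||^2 - ||a - n||^2 + m). The sketch below shows only that loss for a single triplet. It does not reproduce how TripletGAN integrates the loss into generator and discriminator training. The default `margin` is an assumed hyperparameter value, and the helper names are illustrative.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge on the gap between anchor-positive and anchor-negative squared distances."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

Once the negative is farther than the positive by more than the margin, a triplet contributes zero loss and zero gradient. Training therefore concentrates on triplets that still violate the margin.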
