no code implementations • 26 May 2025 • Jiabo Ma, Yingxue Xu, Fengtao Zhou, Yihui Wang, Cheng Jin, Zhengrui Guo, Jianfeng Wu, On Ki Tang, Huajun Zhou, Xi Wang, Luyang Luo, Zhengyu Zhang, Du Cai, Zizhao Gao, Wei Wang, Yueping Liu, Jiankun He, Jing Cui, Zhenhui Li, Jing Zhang, Feng Gao, Xiuming Zhang, Li Liang, Ronald Cheong Kin Chan, Zhe Wang, Hao Chen
Currently, our evaluation of 19 PFMs shows that Virchow2 and H-Optimus-1 are the most effective models overall.
no code implementations • 25 May 2025 • Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen
In this work, we establish a theoretical connection between influence estimation error, validation set risk, and its sharpness, underscoring the importance of flat validation minima for accurate influence estimation.
no code implementations • 21 May 2025 • Zhe Xu, Cheng Jin, Yihui Wang, Ziyi Liu, Hao Chen
Multimodal pathological image understanding has garnered widespread interest due to its potential to improve diagnostic accuracy and enable personalized treatment through integrated visual and textual data.
1 code implementation • 21 May 2025 • Cheng Jin, Zhenyu Xiao, Chutao Liu, Yuantao Gu
However, under high guidance weights, where text-image alignment is significantly enhanced, CFG also leads to pronounced color distortions in the generated images.
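The trade-off above comes from the standard classifier-free guidance (CFG) update, which extrapolates from the unconditional prediction toward the conditional one; a minimal sketch (generic CFG, not this paper's proposed fix) is:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the conditional one by guidance weight w.
    Large w strengthens text-image alignment but over-amplifies the
    conditional direction, which is where artifacts such as color
    distortion can arise."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.1, 0.2])
eps_c = np.array([0.3, 0.0])
print(cfg_combine(eps_u, eps_c, 1.0))   # w = 1 recovers the conditional prediction
print(cfg_combine(eps_u, eps_c, 7.5))   # high w pushes far beyond it
```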
1 code implementation • 6 May 2025 • Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang
To this end, this paper proposes UnifiedReward-Think, the first unified multimodal CoT-based reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.
no code implementations • 2 May 2025 • Changhai Zhou, Yuhua Zhou, Qian Qiao, Weizhong Zhang, Cheng Jin
QLoRA effectively combines low-bit quantization and LoRA to achieve memory-friendly fine-tuning for large language models (LLMs).
1 code implementation • 7 Mar 2025 • Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, Jiaqi Wang
Recent advances in human preference alignment have significantly enhanced multimodal generation and understanding.
no code implementations • 28 Feb 2025 • Ruoxi Wang, Shuyu Liu, Ling Zhang, Xuequan Zhu, Rui Yang, Xinzhu Zhou, Fei Wu, Zhi Yang, Cheng Jin, Gang Wang
In response to this gap, by incorporating clinical demands in psychiatry and clinical data, we proposed a benchmarking system, PsychBench, to evaluate the practical performance of LLMs in psychiatric clinical settings.
no code implementations • 12 Feb 2025 • Hao Jiang, Cheng Jin, Huangjing Lin, Yanning Zhou, Xi Wang, Jiabo Ma, Li Ding, Jun Hou, Runsheng Liu, Zhizhong Chai, Luyang Luo, Huijuan Shi, Yinling Qian, Qiong Wang, Changzhong Li, Anjia Han, Ronald Cheong Kin Chan, Hao Chen
In retrospective cohorts, Smart-CCS achieved an overall area under the curve (AUC) value of 0.965 and sensitivity of 0.913 for cancer screening on 11 internal test datasets.
no code implementations • CVPR 2025 • Zhuoyao Wang, Fan Yi, Peizhu Gong, Caitou He, Cheng Jin, Weizhong Zhang
Second, estimating statistics from a mini-batch is often imprecise since the batch size has to be small in resource-limited clients.
1 code implementation • 21 Dec 2024 • Jiamu Zhou, Muning Wen, Xiaoyun Mo, Haoyu Zhang, Qiqiang Lin, Cheng Jin, Xihuai Wang, Weinan Zhang, Qiuying Peng, Jun Wang
Evaluating the performance of LLMs in multi-turn human-agent interactions presents significant challenges, particularly due to the complexity and variability of user behavior.
1 code implementation • 12 Dec 2024 • Xichen Ye, Yifan Wu, Weizhong Zhang, Xiaoqiang Li, Yifan Chen, Cheng Jin
Previous research has shown that constraining the gradient of loss function with respect to model-predicted probabilities can enhance the model robustness against noisy labels.
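The bounded-gradient idea referenced above can be illustrated by contrasting cross-entropy, whose gradient with respect to the predicted probability blows up on confidently mislabeled samples, with a bounded-gradient loss such as MAE; this is a standard illustration of the principle, not this paper's specific method:

```python
import numpy as np

def ce_grad(p):
    # d/dp of -log(p): unbounded as p -> 0, so a confidently-wrong
    # (likely noisy-label) sample dominates the update
    return -1.0 / p

def mae_grad(p):
    # d/dp of (1 - p) for p in (0, 1): constant, hence bounded,
    # limiting the influence of noisy labels
    return -np.ones_like(p)

p = np.array([0.9, 0.1, 0.001])   # predicted probability of the labeled class
print(np.abs(ce_grad(p)))         # grows without bound as p shrinks
print(np.abs(mae_grad(p)))        # stays at 1 regardless of p
```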
no code implementations • 6 Dec 2024 • Yibin Wang, Zhiyu Tan, Junyan Wang, Xiaomeng Yang, Cheng Jin, Hao Li
Based on this, we train a reward model LiFT-Critic to learn reward function effectively, which serves as a proxy for human judgment, measuring the alignment between given videos and human expectations.
1 code implementation • 12 Nov 2024 • Cheng Jin, Luyang Luo, Huangjing Lin, Jun Hou, Hao Chen
Fine-grained classification of whole slide images (WSIs) is essential in precision oncology, enabling precise cancer diagnosis and personalized treatment strategies.
1 code implementation • 1 Nov 2024 • Haoxuan Che, Xuanhua He, Quande Liu, Cheng Jin, Hao Chen
To realize this vision, we first collected and built an Open-World Video Game Dataset from scratch.
no code implementations • 14 Aug 2024 • Yibin Wang, Weizhong Zhang, Cheng Jin
In the first stage, RSA enables the latent image to query features from all reference concepts simultaneously, extracting the overall semantic understanding to facilitate the initial semantic layout establishment.
no code implementations • 8 Aug 2024 • Xiangcheng Du, Zhao Zhou, Yanlong Wang, Zhuoyao Wang, Yingbin Zheng, Cheng Jin
Deep networks have shown impressive performance in image restoration tasks, such as image colorization.
1 code implementation • 26 Jul 2024 • Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Jinbang Li, Fang Yan, Yu Cai, Zhengjie ZHU, Cheng Jin, Yi Lin, Xinrui Jiang, Chenglong Zhao, Danyi Li, Anjia Han, Zhenhui Li, Ronald Cheong Kin Chan, Jiguang Wang, Peng Fei, Kwang-Ting Cheng, Shaoting Zhang, Li Liang, Hao Chen
Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath).
2 code implementations • 22 Jul 2024 • Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Cheng Jin, Shu Yang, Jinbang Li, Zhengyu Zhang, Chenglong Zhao, Huajun Zhou, Zhenhui Li, Huangjing Lin, Xin Wang, Jiguang Wang, Anjia Han, Ronald Cheong Kin Chan, Li Liang, Xiuming Zhang, Hao Chen
In this study, for the first time, we develop a pathology foundation model incorporating three levels of modalities: pathology slides, pathology reports, and gene expression data, which resulted in 26,169 slide-level modality pairs from 10,275 patients across 32 cancer types, amounting to over 116 million pathological patch images.
1 code implementation • 8 Jul 2024 • Lintao Zhang, Xiangcheng Du, LeoWu TomyEnrique, Yiqun Wang, Yingbin Zheng, Cheng Jin
Second, we introduce a skip-step sampling scheme of Denoising Diffusion Implicit Models (DDIM) for the denoising process.
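Skip-step DDIM sampling accelerates inference by denoising along an evenly spaced subsequence of the training timesteps rather than all of them; a minimal schedule-construction sketch (the function name and exact spacing here are illustrative assumptions, not the paper's implementation):

```python
def skip_step_schedule(num_train_steps, num_sample_steps):
    """Evenly spaced, descending subsequence of training timesteps,
    as used by DDIM-style accelerated sampling: the model denoises
    only at these steps instead of all num_train_steps of them."""
    stride = num_train_steps // num_sample_steps
    return list(range(num_train_steps - 1, -1, -stride))[:num_sample_steps]

# 1000 training steps sampled in only 10 denoising steps
print(skip_step_schedule(1000, 10))
```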
1 code implementation • 3 Jul 2024 • Yiqun Wang, Zhao Zhou, Xiangcheng Du, Xingjiao Wu, Yingbin Zheng, Cheng Jin
In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter), which adaptively learns the importance of different modalities in different cases, without imposing a prior setting in the model architecture.
1 code implementation • 11 Jun 2024 • Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu
The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework.
1 code implementation • CVPR 2025 • Yibin Wang, Weizhong Zhang, Cheng Jin
Our key idea is to reconstruct the diffusion training process, introducing more refined guidance tailored to this task, to expose and rectify the model's attention at the character level and strengthen its learning of text regions.
no code implementations • 7 May 2024 • Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin
Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation.
no code implementations • 25 Apr 2024 • Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin
We apply this robust fine-tuning method to mainstream 3D point cloud pre-trained models and evaluate the quality of model parameters and the degradation of downstream task performance.
1 code implementation • 20 Mar 2024 • LeoWu TomyEnrique, Xiangcheng Du, Kangliang Liu, Han Yuan, Zhao Zhou, Cheng Jin
Scene text image super-resolution has significantly improved the accuracy of scene text recognition.
1 code implementation • 8 Mar 2024 • Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin
This prior information is encoded into the attention weights, which are then integrated into the self-attention layers of the generator to guide the synthesis process.
no code implementations • 28 Feb 2024 • Weilin Wan, Weizhong Zhang, Quan Zhou, Fan Yi, Cheng Jin
Our neural activation prior is based on a key observation that, for a channel before the global pooling layer of a fully trained neural network, the probability of a few neurons being activated with a large response by an in-distribution (ID) sample is significantly higher than that by an OOD sample.
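The observation above suggests a simple peak-versus-mean statistic on the pre-pooling feature map; the scoring function below is an illustrative sketch of that intuition under assumed names, not the paper's exact score:

```python
import numpy as np

def activation_prior_score(feature_map):
    """For a pre-pooling feature map of shape (C, H, W), compare each
    channel's peak activation to its mean. ID samples tend to drive a
    few neurons to large responses, so a high max/mean ratio suggests
    the sample is in-distribution."""
    per_channel_max = feature_map.max(axis=(1, 2))
    per_channel_mean = feature_map.mean(axis=(1, 2)) + 1e-8
    return float((per_channel_max / per_channel_mean).mean())

rng = np.random.default_rng(0)
id_map = rng.random((8, 4, 4))
id_map[:, 0, 0] += 5.0                 # a few strongly activated neurons (ID-like)
ood_map = rng.random((8, 4, 4))        # diffuse, low-peak response (OOD-like)
print(activation_prior_score(id_map) > activation_prior_score(ood_map))
```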
no code implementations • 3 Jan 2024 • Jiawei Zhang, Yufan Chen, Cheng Jin, Lei Zhu, Yuantao Gu
Out-of-distribution (OOD) detection plays a crucial role in ensuring the security of neural networks.
no code implementations • CVPR 2024 • Yanlu Cai, Weizhong Zhang, Yuan Wu, Cheng Jin
A natural solution is to artificially synthesize some samples, i.e., 2D-3D pose pairs, under massive new camera settings.
1 code implementation • 19 Dec 2023 • Kaiyi Zhang, Yang Chen, Ximing Yang, Weizhong Zhang, Cheng Jin
Based on this process, we introduce SGAS, a model for part editing that employs two strategies: feature disentanglement and constraint.
2 code implementations • 9 Dec 2023 • Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen
While most conventional MIL methods use attention scores to estimate instance importance scores (IIS), which contribute to the prediction of the slide labels, these scores often lead to skewed attention distributions and inaccuracies in identifying crucial instances.
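The attention-based pooling being critiqued can be sketched in a few lines: a softmax over per-instance scores yields the importance weights, and a single dominant logit skews the whole distribution. This is generic attention-MIL pooling, not this paper's proposed alternative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_pool(instance_feats, attn_logits):
    """Attention-based MIL pooling: softmax over per-instance scores
    gives importance weights; the slide-level embedding is their
    weighted sum over instance features."""
    weights = softmax(attn_logits)        # instance importance scores
    return weights, weights @ instance_feats

feats = np.eye(3)                          # three toy instance embeddings
w, bag = attention_mil_pool(feats, np.array([2.0, 0.0, 0.0]))
print(w)   # one moderately larger logit already skews the weights
```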
no code implementations • 19 Nov 2023 • Weijie Li, Yitian Wan, Xingjiao Wu, Junjie Xu, Cheng Jin, Liang He
Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images.
1 code implementation • CVPR 2024 • Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin
Specifically, we first develop two specialized pre-trained diffusion models, i.e., Text-driven Diffusion Model (TDM) and Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively.
1 code implementation • 29 Oct 2023 • Anran Wu, Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Zisong Zhuang, Nian Xie, Cheng Jin, Liang He
Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning for charts in the visually-rich document.
no code implementations • 15 Oct 2023 • Shuwen Yang, Anran Wu, Xingjiao Wu, Luwei Xiao, Tianlong Ma, Cheng Jin, Liang He
Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence.
no code implementations • 22 Sep 2023 • Zhilei Hu, Zixuan Li, Daozhu Xu, Long Bai, Cheng Jin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng
To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations.
no code implementations • 10 Sep 2023 • Xiaolu Wang, Cheng Jin, Hoi-To Wai, Yuantao Gu
This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization.
1 code implementation • 27 Jul 2023 • Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu
In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation.
1 code implementation • 14 Apr 2023 • Shishi Xiao, Yihan Hou, Cheng Jin, Wei Zeng
Retrieving charts from a large corpus is a fundamental task that can benefit numerous applications such as visualization recommendations. The retrieved results are expected to conform to both explicit visual attributes (e.g., chart type, colormap) and implicit user intents (e.g., design style, context information) that vary upon application scenarios.
1 code implementation • 13 Apr 2023 • Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin
Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations of convolutional inductive biases.
no code implementations • 22 Mar 2023 • Cheng Jin, Zhengrui Guo, Yi Lin, Luyang Luo, Hao Chen
Deep learning has significantly advanced medical imaging analysis (MIA), achieving state-of-the-art performance across diverse clinical tasks.
no code implementations • 25 Nov 2022 • Zhao Zhou, Xiangcheng Du, Yingbin Zheng, Cheng Jin
We present the Aggregated Text TRansformer (ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism.
no code implementations • 23 Jul 2022 • Xiangcheng Du, Zhao Zhou, Yingbin Zheng, Xingjiao Wu, Tianlong Ma, Cheng Jin
Scene text erasing seeks to erase text contents from scene images and current state-of-the-art text erasing models are trained on large-scale synthetic data.
no code implementations • 30 Mar 2022 • Cheng Jin, Rui-Jie Zhu, Xiao Wu, Liang-Jian Deng
Spiking Neural Networks (SNNs) have piqued researchers' interest because of their capacity to process temporal information and their low power consumption.
no code implementations • 6 Feb 2022 • Kaiyi Zhang, Ximing Yang, Yuan Wu, Cheng Jin
Besides, the missing patterns are diverse in reality, but existing methods can only handle fixed ones, which limits their generalization ability.
no code implementations • 24 Jan 2022 • Xingjiao Wu, Luwei Xiao, Xiangcheng Du, Yingbin Zheng, Xin Li, Tianlong Ma, Cheng Jin, Liang He
Our framework is an unsupervised document layout analysis framework.
no code implementations • 13 Dec 2021 • Ximing Yang, Zhibo Zhang, Zhengfu He, Cheng Jin
As most structure representations omit fine details, limited controllability over such information is one of the major weaknesses of structure-based controllable point cloud generation.
1 code implementation • 10 Dec 2021 • Kaiyi Zhang, Ximing Yang, Yuan Wu, Cheng Jin
The points generated by AXform do not have the strong 2-manifold constraint, which improves the generation of non-smooth surfaces.
1 code implementation • 5 Dec 2021 • Jingwen Ye, Yining Mao, Jie Song, Xinchao Wang, Cheng Jin, Mingli Song
In other words, all users may employ a model in SDB for inference, but only authorized users get access to KD from the model.
no code implementations • 27 Nov 2021 • Tianlong Ma, Xingjiao Wu, Xin Li, Xiangcheng Du, Zhao Zhou, Liang Xue, Cheng Jin
To measure the proposed image layer modeling method, we propose a manually-labeled non-Manhattan layout fine-grained segmentation dataset named FPD.
1 code implementation • 4 Dec 2020 • Zachary J. Lee, George Lee, Ted Lee, Cheng Jin, Rand Lee, Zhi Low, Daniel Chang, Christine Ortega, Steven H. Low
We describe the architecture and algorithms of the Adaptive Charging Network (ACN), which was first deployed on the Caltech campus in early 2016 and is currently operating at over 100 other sites in the United States.
no code implementations • NeurIPS 2020 • Zunlei Feng, Yongming He, Xinchao Wang, Xin Gao, Jie Lei, Cheng Jin, Mingli Song
In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representation from unannotated images.
no code implementations • 19 Oct 2020 • Zhanwei Xu, Yukun Cao, Cheng Jin, Guozhu Shao, Xiaoqing Liu, Jie zhou, Heshui Shi, Jianjiang Feng
Segmentation of infected areas in chest CT volumes is of great significance for further diagnosis and treatment of COVID-19 patients.
no code implementations • 14 Nov 2017 • Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song
As an effective way of metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person-ReID, underpinning many state-of-the-art results.
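The standard triplet loss mentioned above penalizes an anchor whenever its positive is not at least a margin closer than its negative; a minimal sketch:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: zero when the positive is already at least
    `margin` closer to the anchor than the negative, otherwise the
    amount by which that margin is violated."""
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))   # 0.0: the margin is already satisfied
```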