1 code implementation • 1 Dec 2024 • Minghao Han, Dingkang Yang, Jiabei Cheng, Xukun Zhang, Linhao Qu, Zizhi Chen, Lihua Zhang
Recent advancements in multimodal pre-training models have significantly advanced computational pathology.
no code implementations • 5 Nov 2024 • Mingcheng Li, Dingkang Yang, Yang Liu, Shunli Wang, Jiawei Chen, Shuaibing Wang, Jinjie Wei, Yue Jiang, Qingyao Xu, Xiaolu Hou, Mingyang Sun, Ziyun Qian, Dongliang Kou, Lihua Zhang
Specifically, we propose a fine-grained representation factorization module that sufficiently extracts valuable sentiment information by factorizing modality into sentiment-relevant and modality-specific representations through crossmodal translation and sentiment semantic reconstruction.
no code implementations • 16 Oct 2024 • Jinjie Wei, Dingkang Yang, Yanshu Li, Qingyao Xu, Zhaoyu Chen, Mingcheng Li, Yue Jiang, Xiaolu Hou, Lihua Zhang
Large Language Model (LLM)-driven interactive systems currently show potential promise in healthcare domains.
no code implementations • 22 Aug 2024 • Dingkang Yang, Dongling Xiao, Jinjie Wei, Mingcheng Li, Zhaoyu Chen, Ke Li, Lihua Zhang
In this paper, we propose a Comparator-driven Decoding-Time (CDT) framework to alleviate the response hallucination.
1 code implementation • 21 Aug 2024 • Minghao Han, Linhao Qu, Dingkang Yang, Xukun Zhang, Xiaoying Wang, Lihua Zhang
However, this paradigm relies on the use of a large number of labelled WSIs for training.
no code implementations • 17 Aug 2024 • Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang
This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined in a coarse-to-fine SSC prediction framework.
no code implementations • 17 Aug 2024 • Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang
However, current multimodal perception research follows independent paradigms designed for specific perception tasks, leading to a lack of complementary learning among tasks and decreased performance in multi-task learning (MTL) due to joint training.
no code implementations • 4 Aug 2024 • Shuaibing Wang, Shunli Wang, Mingcheng Li, Dingkang Yang, Haopeng Kuang, Ziyun Qian, Lihua Zhang
However, the heavy sampling steps required by diffusion models pose a substantial computational burden, limiting their practicality in real-time applications.
no code implementations • 16 Jul 2024 • Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang
Extensive experiments and analyses demonstrate that LVLMs achieve competitive performance in the CAER task across different paradigms.
no code implementations • 15 Jul 2024 • Linhao Qu, Dingkang Yang, Dan Huang, Qinhao Guo, Rongkui Luo, Shaoting Zhang, Xiaosong Wang
Prompt learning based on the pre-trained models (\eg, CLIP) appears to be a promising scheme for this setting; however, current research in this area is limited, and existing algorithms often focus solely on patch-level prompts or confine themselves to language prompts.
no code implementations • 6 Jul 2024 • Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang
To tackle these issues, we propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities.
no code implementations • 6 Jul 2024 • Dingkang Yang, Kun Yang, Haopeng Kuang, Zhaoyu Chen, Yuzheng Wang, Lihua Zhang
To address the issue, we embrace causal inference to disentangle the models from the impact of such bias, and formulate the causalities among variables in the CAER task via a customized causal graph.
no code implementations • 2 Jul 2024 • Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yunquan Sun, Lizhe Qi
Existing works focus on instance-level or class-level knowledge representation and build a shared representation space to achieve performance improvements.
no code implementations • 21 Jun 2024 • Jiawei Chen, Dingkang Yang, Yuxuan Lei, Lihua Zhang
Most medical image lesion segmentation methods rely on hand-crafted accurate annotations of the original image for supervised learning.
no code implementations • 17 Jun 2024 • Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, Lihua Zhang
Automatic medical report generation (MRG), which possesses significant research value as it can aid radiologists in clinical diagnosis and report composition, has garnered increasing attention.
no code implementations • 14 Jun 2024 • Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang
To bridge this gap, we introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation within the medical multimodal domain.
1 code implementation • 29 May 2024 • Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang
In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery.
no code implementations • 5 May 2024 • Ziyun Qian, Zeyu Xiao, Zhenyi Wu, Dingkang Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Dongliang Kou, Lihua Zhang
To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion.
1 code implementation • 30 Apr 2024 • Minghao Han, Xukun Zhang, Dingkang Yang, Tao Liu, Haopeng Kuang, Jinghui Feng, Lihua Zhang
Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images.
no code implementations • 25 Apr 2024 • Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang
In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored.
Medical Visual Question Answering parameter-efficient fine-tuning +2
no code implementations • CVPR 2024 • Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang
Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics.
no code implementations • CVPR 2024 • Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi
Data-Free Knowledge Distillation (DFKD) is a promising task to train high-performance small models to enhance actual deployment without relying on the original training data.
2 code implementations • 11 Mar 2024 • Jiawei Chen, Yue Jiang, Dingkang Yang, Mingcheng Li, Jinjie Wei, Ziyun Qian, Lihua Zhang
In this paper, we delve into the fine-tuning methods of LLMs and conduct extensive experiments to investigate the impact of fine-tuning methods for large models on the existing multimodal model in the medical domain from the training data level and the model structure level.
no code implementations • CVPR 2024 • Dingkang Yang, Kun Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Lihua Zhang
Following the causal graph, CLEF introduces a non-invasive context branch to capture the adverse direct effect caused by the context bias.
no code implementations • 8 Mar 2024 • Zhi Xu, Dingkang Yang, Mingcheng Li, Yuzheng Wang, Zhaoyu Chen, Jiawei Chen, Jinjie Wei, Lihua Zhang
Multimodal intention understanding (MIU) is an indispensable component of human expression analysis (e. g., sentiment or humor) from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviors.
no code implementations • 8 Mar 2024 • Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang
In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases.
1 code implementation • 27 Feb 2024 • Shuaibing Wang, Shunli Wang, Dingkang Yang, Mingcheng Li, Ziyun Qian, Liuzhen Su, Lihua Zhang
KGC extracts hand prior information from 2D hand pose by graph convolution.
no code implementations • 2 Feb 2024 • Zhaoyu Chen, Zhengyang Shan, Jingwen Chang, Kaixun Jiang, Dingkang Yang, Yiting Cheng, Wenqiang Zhang
We conduct adversarial robustness evaluation on 5 models from Cityscapes and ADE20K under 8 attacks.
1 code implementation • 10 Jan 2024 • Jiawei Chen, Dingkang Yang, Yue Jiang, Yuxuan Lei, Lihua Zhang
Medical visual question answering (VQA) is a challenging multimodal task, where Vision-Language Pre-training (VLP) models can effectively improve the generalization performance.
no code implementations • CVPR 2024 • Shunli Wang, Qing Yu, Shuaibing Wang, Dingkang Yang, Liuzhen Su, Xiao Zhao, Haopeng Kuang, Peixuan Zhang, Peng Zhai, Lihua Zhang
For the first time, this paper constructs a vision-based system to complete error action recognition and skill assessment in CPR.
no code implementations • ICCV 2023 • Siao Liu, Zhaoyu Chen, Yang Liu, Yuzheng Wang, Dingkang Yang, Zhile Zhao, Ziqing Zhou, Xie Yi, Wei Li, Wenqiang Zhang, Zhongxue Gan
In particular, CG2A develops a Gradient Agreement Solver to adaptively balance the varying gradient magnitudes, and introduces a Soft Gradient Surgery strategy to alleviate the gradient conflicts.
no code implementations • 31 Jul 2023 • Yuzheng Wang, Zhaoyu Chen, Jie Zhang, Dingkang Yang, Zuhao Ge, Yang Liu, Siao Liu, Yunquan Sun, Wenqiang Zhang, Lizhe Qi
Data-Free Knowledge Distillation (DFKD) is a novel task that aims to train high-performance student models using only the pre-trained teacher network without original training data.
1 code implementation • ICCV 2023 • Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song
In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner.
1 code implementation • ICCV 2023 • Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang, Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang
Driver distraction has become a significant cause of severe traffic accidents over the past decade.
no code implementations • 6 Jun 2023 • Mingyang Sun, Dingkang Yang, Dongliang Kou, Yang Jiang, Weihua Shan, Zhe Yan, Lihua Zhang
This paper comprehensively reviews the application of implicit neural representation in human body modeling.
no code implementations • 21 Mar 2023 • Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Pinxue Guo, Kaixun Jiang, Wenqiang Zhang, Lizhe Qi
Adversarial Robustness Distillation (ARD) is a promising task to solve the issue of limited adversarial robustness of small capacity models while optimizing the expensive computational costs of Adversarial Training (AT).
no code implementations • ICCV 2023 • Kaixun Jiang, Zhaoyu Chen, Hao Huang, Jiafeng Wang, Dingkang Yang, Bo Li, Yan Wang, Wenqiang Zhang
First, STDE introduces target videos as patch textures and only adds patches on keyframes that are adaptively selected by temporal difference.
1 code implementation • CVPR 2023 • Dingkang Yang, Zhaoyu Chen, Yuzheng Wang, Shunli Wang, Mingcheng Li, Siao Liu, Xiao Zhao, Shuai Huang, Zhiyan Dong, Peng Zhai, Lihua Zhang
However, a long-overlooked issue is that a context bias in existing datasets leads to a significantly unbalanced distribution of emotional states among different context scenarios.
no code implementations • 23 Feb 2023 • Kun Yang, Jing Liu, Dingkang Yang, Hanqi Wang, Peng Sun, Yanni Zhang, Yan Liu, Liang Song
With the rapid development of intelligent transportation system applications, a tremendous amount of multi-view video data has emerged to enhance vehicle perception.
no code implementations • 20 Feb 2023 • Haopeng Kuang, Dingkang Yang, Shunli Wang, Xiaoying Wang, Lihua Zhang
Accurate visualization of liver tumors and their surrounding blood vessels is essential for noninvasive diagnosis and prognosis prediction of tumors.
no code implementations • 17 Feb 2023 • Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yang Liu, Siao Liu, Wenqiang Zhang, Lizhe Qi
To this end, we propose a novel structured ARD method called Contrastive Relationship DeNoise Distillation (CRDND).
1 code implementation • 10 Feb 2023 • Yang Liu, Dingkang Yang, Yan Wang, Jing Liu, Jun Liu, Azzedine Boukerche, Peng Sun, Liang Song
Video Anomaly Detection (VAD) serves as a pivotal technology in the intelligent surveillance systems, enabling the temporal or spatial identification of anomalous events within videos.
2 code implementations • 21 Nov 2022 • Jiafeng Wang, Zhaoyu Chen, Kaixun Jiang, Dingkang Yang, Lingyi Hong, Pinxue Guo, Haijing Guo, Wenqiang Zhang
Particularly, when attacking advanced defense methods in the image domain, it achieves an average attack success rate of 95. 4%.
no code implementations • 27 Jul 2022 • Yang Liu, Jing Liu, Mengyang Zhao, Dingkang Yang, Xiaoguang Zhu, Liang Song
Video anomaly detection is a challenging task in the computer vision community.
1 code implementation • 16 Jul 2022 • Shunli Wang, Shuaibing Wang, Bo Jiao, Dingkang Yang, Liuzhen Su, Peng Zhai, Chixiao Chen, Lihua Zhang
Considering that the pose estimator is sensitive to background interference, this paper proposes a counterfactual analysis framework named CASpaceNet to complete robust 6D pose estimation of the spaceborne targets under complicated background.
no code implementations • 20 Apr 2022 • Shunli Wang, Dingkang Yang, Peng Zhai, Qing Yu, Tao Suo, Zhan Sun, Ka Li, Lihua Zhang
Most of the existing work focuses on sports and medical care.
2 code implementations • 11 Jan 2022 • Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, Lihua Zhang
Specifically, we introduce a single object tracker into AQA and propose the Tube Self-Attention Module (TSA), which can efficiently generate rich spatio-temporal contextual information by adopting sparse feature interactions.