Search Results for author: Lihua Zhang

Found 41 papers, 14 papers with code

2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting

no code implementations 4 Mar 2025 Qipeng Yan, Mingyang Sun, Lihua Zhang

To address these problems, we propose 2DGS-Avatar, a novel approach based on 2D Gaussian Splatting (2DGS) for modeling animatable clothed avatars with high-fidelity and fast training performance.

3DGS NeRF

BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation

1 code implementation 15 Jan 2025 Xiaolu Hou, Mingcheng Li, Dingkang Yang, Jiawei Chen, Ziyun Qian, Xiao Zhao, Yue Jiang, Jinjie Wei, Qingyao Xu, Lihua Zhang

To this end, we propose BloomScene, a lightweight structured 3D Gaussian splatting for crossmodal scene generation, which creates diverse and high-quality 3D scenes from text or image inputs.

Point cloud reconstruction Scene Generation

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

no code implementations 5 Nov 2024 Mingcheng Li, Dingkang Yang, Yang Liu, Shunli Wang, Jiawei Chen, Shuaibing Wang, Jinjie Wei, Yue Jiang, Qingyao Xu, Xiaolu Hou, Mingyang Sun, Ziyun Qian, Dongliang Kou, Lihua Zhang

Specifically, we propose a fine-grained representation factorization module that sufficiently extracts valuable sentiment information by factorizing modality into sentiment-relevant and modality-specific representations through crossmodal translation and sentiment semantic reconstruction.

Multimodal Sentiment Analysis Representation Learning

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions

no code implementations 2 Nov 2024 Weifan Long, Wen Wen, Peng Zhai, Lihua Zhang

It trains a common policy with role embedding observations and employs a role predictor to estimate the joint role embeddings of other agents, helping the learning agent adapt to its assigned role.

Diversity Multi-agent Reinforcement Learning +1

MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation

no code implementations 17 Aug 2024 Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang

However, current multimodal perception research follows independent paradigms designed for specific perception tasks, leading to a lack of complementary learning among tasks and decreased performance in multi-task learning (MTL) due to joint training.

3D Object Detection Autonomous Driving +6

HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

no code implementations 17 Aug 2024 Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang

This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined in a coarse-to-fine SSC prediction framework.

3D geometry 3D Semantic Scene Completion +2

Faster Diffusion Action Segmentation

no code implementations 4 Aug 2024 Shuaibing Wang, Shunli Wang, Mingcheng Li, Dingkang Yang, Haopeng Kuang, Ziyun Qian, Lihua Zhang

However, the heavy sampling steps required by diffusion models pose a substantial computational burden, limiting their practicality in real-time applications.

Action Segmentation Computational Efficiency +2

Large Vision-Language Models as Emotion Recognizers in Context Awareness

no code implementations 16 Jul 2024 Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang

Extensive experiments and analyses demonstrate that LVLMs achieve competitive performance in the CAER task across different paradigms.

Emotion Recognition In-Context Learning

Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training

no code implementations 6 Jul 2024 Dingkang Yang, Kun Yang, Haopeng Kuang, Zhaoyu Chen, Yuzheng Wang, Lihua Zhang

To address the issue, we embrace causal inference to disentangle the models from the impact of such bias, and formulate the causalities among variables in the CAER task via a customized causal graph.

Causal Inference Emotion Recognition +2

Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

no code implementations 6 Jul 2024 Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang

To tackle these issues, we propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations (MEA) to refine multimodal features and leverage the complementarity across distinct modalities.

Skip and Skip: Segmenting Medical Images with Prompts

no code implementations 21 Jun 2024 Jiawei Chen, Dingkang Yang, Yuxuan Lei, Lihua Zhang

Most medical image lesion segmentation methods rely on hand-crafted accurate annotations of the original image for supervised learning.

Diagnostic Lesion Segmentation

CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation

no code implementations 17 Jun 2024 Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, Lihua Zhang

Automatic medical report generation (MRG), which possesses significant research value as it can aid radiologists in clinical diagnosis and report composition, has garnered increasing attention.

Diagnostic Hallucination +1

Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

no code implementations 14 Jun 2024 Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang

To bridge this gap, we introduce Med-HallMark, the first benchmark specifically designed for hallucination detection and evaluation within the medical multimodal domain.

Hallucination Medical Visual Question Answering +2

PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

1 code implementation 29 May 2024 Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang

In the parameter-efficient secondary SFT phase, a mixture of universal-specific experts strategy is presented to resolve the competency conflict between medical generalist and pediatric expertise mastery.

Diagnostic Domain Adaptation

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

no code implementations 5 May 2024 Ziyun Qian, Zeyu Xiao, Zhenyi Wu, Dingkang Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Dongliang Kou, Lihua Zhang

To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion.

Mamba Motion Style Transfer +1

Multi-Scale Heterogeneity-Aware Hypergraph Representation for Histopathology Whole Slide Images

1 code implementation 30 Apr 2024 Minghao Han, Xukun Zhang, Dingkang Yang, Tao Liu, Haopeng Kuang, Jinghui Feng, Lihua Zhang

Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images.

Multiple Instance Learning Survival Prediction +1

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

no code implementations CVPR 2024 Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang

Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics.

Disentanglement Knowledge Distillation +1

Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

no code implementations 25 Apr 2024 Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang

In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored.

Medical Visual Question Answering parameter-efficient fine-tuning +2

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

no code implementations CVPR 2024 Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi

Data-Free Knowledge Distillation (DFKD) is a promising task to train high-performance small models to enhance actual deployment without relying on the original training data.

Causal Inference Data-free Knowledge Distillation

Can LLMs' Tuning Methods Work in Medical Multimodal Domain?

2 code implementations 11 Mar 2024 Jiawei Chen, Yue Jiang, Dingkang Yang, Mingcheng Li, Jinjie Wei, Ziyun Qian, Lihua Zhang

In this paper, we delve into the fine-tuning methods of LLMs and conduct extensive experiments to investigate the impact of fine-tuning methods for large models on the existing multimodal model in the medical domain from the training data level and the model structure level.

Transfer Learning World Knowledge

Robust Emotion Recognition in Context Debiasing

no code implementations CVPR 2024 Dingkang Yang, Kun Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Lihua Zhang

Following the causal graph, CLEF introduces a non-invasive context branch to capture the adverse direct effect caused by the context bias.

counterfactual Emotion Recognition in Context

Debiased Multimodal Understanding for Human Language Sequences

no code implementations 8 Mar 2024 Zhi Xu, Dingkang Yang, Mingcheng Li, Yuzheng Wang, Zhaoyu Chen, Jiawei Chen, Jinjie Wei, Lihua Zhang

Human multimodal language understanding (MLU) is an indispensable component of expression analysis (e.g., sentiment or humor) from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviours.

Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

no code implementations 8 Mar 2024 Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang

In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases.

counterfactual Counterfactual Inference +1

MISS: A Generative Pretraining and Finetuning Approach for Med-VQA

1 code implementation 10 Jan 2024 Jiawei Chen, Dingkang Yang, Yue Jiang, Yuxuan Lei, Lihua Zhang

Medical visual question answering (VQA) is a challenging multimodal task, where Vision-Language Pre-training (VLP) models can effectively improve the generalization performance.

Medical Visual Question Answering Multi-Task Learning +3

Human 3D Avatar Modeling with Implicit Neural Representation: A Brief Survey

no code implementations 6 Jun 2023 Mingyang Sun, Dingkang Yang, Dongliang Kou, Yang Jiang, Weihua Shan, Zhe Yan, Lihua Zhang

This paper comprehensively reviews the application of implicit neural representation in human body modeling.

NeRF

Context De-confounded Emotion Recognition

1 code implementation CVPR 2023 Dingkang Yang, Zhaoyu Chen, Yuzheng Wang, Shunli Wang, Mingcheng Li, Siao Liu, Xiao Zhao, Shuai Huang, Zhiyan Dong, Peng Zhai, Lihua Zhang

However, a long-overlooked issue is that a context bias in existing datasets leads to a significantly unbalanced distribution of emotional states among different context scenarios.

Emotion Recognition

Towards Simultaneous Segmentation of Liver Tumors and Intrahepatic Vessels via Cross-attention Mechanism

no code implementations 20 Feb 2023 Haopeng Kuang, Dingkang Yang, Shunli Wang, Xiaoying Wang, Lihua Zhang

Accurate visualization of liver tumors and their surrounding blood vessels is essential for noninvasive diagnosis and prognosis prediction of tumors.

Decoder Image Segmentation +4

Achieving a Given Financial Goal with Optimal Deferred Term Insurance Purchasing Policy

no code implementations 9 Dec 2022 Yuqi Li, Lihua Zhang

It is worth noting that when $m=0$, $n \rightarrow \infty$, our problem is equivalent to achieving the just mentioned bequest goal by purchasing whole life insurance, at which point the maximum probability and the life insurance purchasing strategies we provide are consistent with those in \cite{Bayraktar2014, Bayraktar2016}.

CA-SpaceNet: Counterfactual Analysis for 6D Pose Estimation in Space

1 code implementation 16 Jul 2022 Shunli Wang, Shuaibing Wang, Bo Jiao, Dingkang Yang, Liuzhen Su, Peng Zhai, Chixiao Chen, Lihua Zhang

Considering that the pose estimator is sensitive to background interference, this paper proposes a counterfactual analysis framework named CASpaceNet to complete robust 6D pose estimation of the spaceborne targets under complicated background.

6D Pose Estimation Causal Inference +2

TSA-Net: Tube Self-Attention Network for Action Quality Assessment

2 code implementations 11 Jan 2022 Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, Lihua Zhang

Specifically, we introduce a single object tracker into AQA and propose the Tube Self-Attention Module (TSA), which can efficiently generate rich spatio-temporal contextual information by adopting sparse feature interactions.

Action Assessment Action Quality Assessment +2

Sample Efficient Imitation Learning via Reward Function Trained in Advance

no code implementations 23 Nov 2021 Lihua Zhang

Our method, which we call Model Reward Function Based Imitation Learning (MRFIL), uses an ensemble dynamic model as a reward function, which is trained with expert demonstrations.

Imitation Learning

A Unified Joint Matrix Factorization Framework for Data Integration

1 code implementation 25 Jul 2017 Lihua Zhang, Shihua Zhang

In this paper, we introduce a sparse multiple relationship data regularized joint matrix factorization (JMF) framework and two adapted prediction models for pattern recognition and data integration.

Data Integration
