Search Results for author: Peng Xia

Found 18 papers, 13 papers with code

ChemMLLM: Chemical Multimodal Large Language Model

1 code implementation · 22 May 2025 · Qian Tan, Dongzhan Zhou, Peng Xia, Wanhao Liu, Wanli Ouyang, Lei Bai, Yuqiang Li, Tianfan Fu

To fill this gap, in this paper, we propose ChemMLLM, a unified chemical multimodal large language model for molecule understanding and generation.

Language Modeling · Language Modelling · +3

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

1 code implementation · 18 Mar 2025 · Siwei Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, Huaxiu Yao

These agents engage in multi-modal context retrieval, combining their individual insights to achieve a more comprehensive understanding of the document's content.

Document Understanding · Question Answering · +3

MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

1 code implementation · 9 Dec 2024 · Kangyu Zhu, Peng Xia, Yun Li, Hongtu Zhu, Sheng Wang, Huaxiu Yao

Previous attempts to enhance modality alignment in Med-LVLMs through preference optimization have inadequately mitigated clinical relevance in preference data, making these samples easily distinguishable and reducing alignment effectiveness.

Visual Question Answering (VQA)

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

1 code implementation · 16 Oct 2024 · Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang, Linjun Zhang, James Zou, Huaxiu Yao

Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning.

Diagnostic · Hallucination · +4

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

1 code implementation · 14 Oct 2024 · Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao

Extensive experiments demonstrate the effectiveness of our benchmark and metrics in providing a comprehensive evaluation of interleaved LVLMs.

Multiple-choice

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

1 code implementation · 6 Jul 2024 · Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao

Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers.

Medical Diagnosis · RAG · +3

TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM

1 code implementation · 22 Jun 2024 · Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, ZongYuan Ge

Furthermore, we design a prior-aligned injector to inject explicit priors into the segmentation process, which can facilitate knowledge sharing across multi-modality features and allow our framework to be trained in a parameter-efficient fashion.

Lesion Segmentation · Medical Image Analysis · +1

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

1 code implementation · 10 Jun 2024 · Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, ZongYuan Ge, Gang Li, James Zou, Huaxiu Yao

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare.

Fairness

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

1 code implementation · 10 Jun 2024 · Peng Xia, Ming Hu, Feilong Tang, Wenxue Li, Wenhao Zheng, Lie Ju, Peibo Duan, Huaxiu Yao, ZongYuan Ge

Subsequently, to improve the robustness of the decoupled representations, class and domain prototypes are employed to interpolate the disentangled representations while data-aware weights are designed to focus on rare classes and domains.

Diagnostic

Diffusion Model Driven Test-Time Image Adaptation for Robust Skin Lesion Classification

no code implementations · 18 May 2024 · Ming Hu, Siyuan Yan, Peng Xia, Feilong Tang, Wenxue Li, Peibo Duan, Lin Zhang, ZongYuan Ge

In this paper, we propose a test-time image adaptation method to enhance the accuracy of the model on test data by simultaneously updating and predicting test images.

Diagnostic · Lesion Classification · +1

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

1 code implementation · 8 May 2023 · Peng Xia, Di Xu, Ming Hu, Lie Ju, ZongYuan Ge

The long-tailed multi-label visual recognition (LTML) task is highly challenging due to label co-occurrence and imbalanced data distribution.

Ranked #1 on Long-tail Learning on COCO-MLT (using extra training data)

Long-tail Learning

Chinese grammatical error correction based on knowledge distillation

2 code implementations · 31 Jul 2022 · Peng Xia, Yuechi Zhou, Ziyan Zhang, Zecheng Tang, Juntao Li

In view of the poor robustness of existing Chinese grammatical error correction models on attack test sets and large model parameters, this paper uses the method of knowledge distillation to compress model parameters and improve the anti-attack ability of the model.

Grammatical Error Correction · Knowledge Distillation

Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization

no code implementations · ICML Workshop AutoML 2021 · David Eriksson, Pierce I-Jen Chuang, Samuel Daulton, Peng Xia, Akshat Shrivastava, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, Maximilian Balandat

When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy.

Bayesian Optimization · Natural Language Understanding · +1
