1 code implementation • 22 May 2025 • Qian Tan, Dongzhan Zhou, Peng Xia, Wanhao Liu, Wanli Ouyang, Lei Bai, Yuqiang Li, Tianfan Fu
To fill this gap, in this paper, we propose ChemMLLM, a unified chemical multimodal large language model for molecule understanding and generation.
no code implementations • 27 Apr 2025 • Yiyang Zhou, Zhaoyang Wang, Tianle Wang, Shangyu Xing, Peng Xia, Bo Li, Kaiyuan Zheng, Zijian Zhang, Zhaorun Chen, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Weitong Zhang, Ying WEI, Mohit Bansal, Huaxiu Yao
High-quality preference data is essential for aligning foundation models with human values through preference learning.
1 code implementation • 18 Mar 2025 • Siwei Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, Huaxiu Yao
These agents engage in multi-modal context retrieval, combining their individual insights to achieve a more comprehensive understanding of the document's content.
no code implementations • 3 Feb 2025 • Haibo Tong, Zhaoyang Wang, Zhaorun Chen, Haonian Ji, Shi Qiu, Siwei Han, Kexin Geng, Zhongkai Xue, Yiyang Zhou, Peng Xia, Mingyu Ding, Rafael Rafailov, Chelsea Finn, Huaxiu Yao
Recent advancements in video generation have significantly improved the ability to synthesize videos from text instructions.
no code implementations • 27 Dec 2024 • Feilong Tang, Zhongxing Xu, Ming Hu, Wenxue Li, Peng Xia, Yiheng Zhong, Hanjun Wu, Jionglong Su, ZongYuan Ge
Inspired by the density-based clustering hypothesis, we propose using feature density to locate sparse regions within feature clusters.
1 code implementation • 9 Dec 2024 • Kangyu Zhu, Peng Xia, Yun Li, Hongtu Zhu, Sheng Wang, Huaxiu Yao
Previous attempts to enhance modality alignment in Med-LVLMs through preference optimization have inadequately mitigated clinical relevance in preference data, making these samples easily distinguishable and reducing alignment effectiveness.
1 code implementation • 16 Oct 2024 • Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang, Linjun Zhang, James Zou, Huaxiu Yao
Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning.
1 code implementation • 14 Oct 2024 • Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao
Extensive experiments demonstrate the effectiveness of our benchmark and metrics in providing a comprehensive evaluation of interleaved LVLMs.
1 code implementation • 6 Jul 2024 • Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao
Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers.
1 code implementation • 22 Jun 2024 • Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, ZongYuan Ge
Furthermore, we design a prior-aligned injector to inject explicit priors into the segmentation process, which can facilitate knowledge sharing across multi-modality features and allow our framework to be trained in a parameter-efficient fashion.
1 code implementation • 11 Jun 2024 • Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, ZongYuan Ge
With approximately 285 hours of surgical videos, OphNet is about 20 times larger than the largest existing surgical workflow analysis benchmark.
1 code implementation • 10 Jun 2024 • Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, ZongYuan Ge, Gang Li, James Zou, Huaxiu Yao
Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare.
1 code implementation • 10 Jun 2024 • Peng Xia, Ming Hu, Feilong Tang, Wenxue Li, Wenhao Zheng, Lie Ju, Peibo Duan, Huaxiu Yao, ZongYuan Ge
Subsequently, to improve the robustness of the decoupled representations, class and domain prototypes are employed to interpolate the disentangled representations while data-aware weights are designed to focus on rare classes and domains.
no code implementations • 18 May 2024 • Ming Hu, Siyuan Yan, Peng Xia, Feilong Tang, Wenxue Li, Peibo Duan, Lin Zhang, ZongYuan Ge
In this paper, we propose a test-time image adaptation method to enhance the accuracy of the model on test data by simultaneously updating and predicting test images.
1 code implementation • 23 Nov 2023 • Peng Xia, Xingtong Yu, Ming Hu, Lie Ju, Zhiyong Wang, Peibo Duan, ZongYuan Ge
We explore constructing the class hierarchy into a graph, with its nodes representing the textual or image features of each category.
Fine-Grained Visual Recognition
Graph Representation Learning
1 code implementation • 8 May 2023 • Peng Xia, Di Xu, Ming Hu, Lie Ju, ZongYuan Ge
Long-tailed multi-label visual recognition (LTML) task is a highly challenging task due to the label co-occurrence and imbalanced data distribution.
Ranked #1 on
Long-tail Learning
on COCO-MLT
(using extra training data)
2 code implementations • 31 Jul 2022 • Peng Xia, Yuechi Zhou, Ziyan Zhang, Zecheng Tang, Juntao Li
In view of the poor robustness of existing Chinese grammatical error correction models on attack test sets and large model parameters, this paper uses the method of knowledge distillation to compress model parameters and improve the anti-attack ability of the model.
no code implementations • ICML Workshop AutoML 2021 • David Eriksson, Pierce I-Jen Chuang, Samuel Daulton, Peng Xia, Akshat Shrivastava, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, Maximilian Balandat
When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy.