Search Results for author: Xinyu Fang

Found 8 papers, 4 papers with code

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

2 code implementations • 22 Aug 2024 • Haojun Shi, Suyu Ye, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, Tianmin Shu

To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions.

JieHua Paintings Style Feature Extracting Model using Stable Diffusion with ControlNet

no code implementations • 21 Aug 2024 • Yujia Gu, Haofeng Li, Xinyu Fang, Zihan Peng, Yinan Peng

This study proposes a novel approach to extracting the stylistic features of Jiehua: a Fine-tuned Stable Diffusion Model with ControlNet (FSDMC) that distills depiction techniques from artists' Jiehua works.

Style Transfer
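No code is released for this paper, but the building blocks are standard. Below is a minimal sketch, assuming the Hugging Face diffusers library, of how a fine-tuned Stable Diffusion backbone can be paired with a ControlNet for style-constrained generation; the checkpoint names and the edge-map file are illustrative placeholders, not the authors' artifacts.

```python
# Illustrative sketch only: style-guided generation with a ControlNet-conditioned
# Stable Diffusion pipeline via Hugging Face diffusers.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder; a fine-tuned Jiehua checkpoint would go here
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# An edge map of a reference composition constrains the layout while the
# fine-tuned backbone supplies the painting-style brushwork.
edge_map = load_image("jiehua_reference_edges.png")  # hypothetical file
image = pipe(
    "a pavilion by a river, Jiehua style",
    image=edge_map,
    num_inference_steps=30,
).images[0]
image.save("jiehua_style_output.png")
```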

A New Chinese Landscape Paintings Generation Model based on Stable Diffusion using DreamBooth

no code implementations • 16 Aug 2024 • Yujia Gu, Xinyu Fang, Xueyuan Deng, Zihan Peng, Yinan Peng

This study introduces a method for generating Chinese landscape paintings that combines the Stable Diffusion Model (SDM) with a parameter-efficient fine-tuning method (DreamBooth).

Parameter-Efficient Fine-Tuning
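Since the title names DreamBooth as the fine-tuning method, here is a minimal inference sketch, assuming a DreamBooth-tuned Stable Diffusion checkpoint and the diffusers library; the checkpoint path and the "sks" identifier token are hypothetical placeholders, not the paper's artifacts.

```python
# Illustrative sketch: sampling from a DreamBooth-fine-tuned Stable Diffusion model.
# DreamBooth binds the learned concept to a rare identifier token ("sks" here).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./landscape-dreambooth-checkpoint",  # hypothetical fine-tuned weights
    torch_dtype=torch.float16,
).to("cuda")

# The identifier token steers generation toward the painting style
# seen in the small fine-tuning set.
image = pipe(
    "a misty mountain valley in sks Chinese landscape painting style",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("landscape_sample.png")
```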

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

1 code implementation • 16 Jul 2024 • Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Amit Agarwal, Zhe Chen, Mo Li, Yubo Ma, Hailong Sun, Xiangyu Zhao, Junbo Cui, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

Based on the evaluation results obtained with the toolkit, we host the OpenVLM Leaderboard, a comprehensive leaderboard that tracks the progress of multi-modality learning research.
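For readers unfamiliar with benchmark toolkits, the sketch below shows the kind of evaluation loop such a toolkit automates at scale. It is a schematic illustration, not VLMEvalKit's actual API; `model`, `benchmark`, and `score` are hypothetical stand-ins.

```python
# Schematic sketch (not VLMEvalKit's API): run a multi-modality model over a
# benchmark's image-question pairs and average the per-sample scores.
from typing import Callable, Iterable

def evaluate(model: Callable[[str, str], str],
             benchmark: Iterable[dict],
             score: Callable[[str, str], float]) -> float:
    """Average score of `model` over `benchmark` samples."""
    scores = []
    for sample in benchmark:
        prediction = model(sample["image_path"], sample["question"])
        scores.append(score(prediction, sample["answer"]))
    return sum(scores) / len(scores)
```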

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

1 code implementation • 20 Jun 2024 • Yuxuan Qiao, Haodong Duan, Xinyu Fang, Junming Yang, Lin Chen, Songyang Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, a task that requires strong perception and reasoning faculties.

Language Modelling, Large Language Model
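Prism's central idea is to decouple a VLM's perception from its reasoning so each capability can be assessed on its own. A minimal sketch of that two-stage design follows, with `vlm_describe` and `llm_reason` as hypothetical interfaces rather than Prism's actual code.

```python
# Schematic sketch of a decoupled perception/reasoning pipeline:
# a perception stage turns the image into text, and a separate text-only
# reasoning stage answers from that description alone.
def prism_style_answer(image_path: str, question: str,
                       vlm_describe, llm_reason) -> str:
    # Perception: a VLM extracts question-relevant visual information as text.
    description = vlm_describe(
        image_path, f"Describe what is relevant to: {question}"
    )
    # Reasoning: a text-only LLM answers from the description, never the pixels,
    # so perception and reasoning errors can be attributed separately.
    return llm_reason(
        f"Based on this description:\n{description}\n\nAnswer: {question}"
    )
```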

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

1 code implementation • 20 Jun 2024 • Xinyu Fang, Kangrui Mao, Haodong Duan, Xiangyu Zhao, Yining Li, Dahua Lin, Kai Chen

The advent of large vision-language models (LVLMs) has spurred research into their applications in multi-modal contexts, particularly in video understanding.

Video Understanding
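Evaluating long-form video with an LVLM typically means feeding the model a small, uniformly spaced subset of frames rather than the full video. The helper below sketches that sampling step with OpenCV; it is illustrative, not the benchmark's official code.

```python
# Illustrative frame-sampling helper for LVLM video evaluation.
import cv2

def sample_frames(video_path: str, num_frames: int = 8):
    """Return `num_frames` uniformly spaced BGR frames from the video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(num_frames - 1, 1)
    indices = [int(i * (total - 1) / step) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the chosen frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```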

LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

no code implementations • 28 May 2024 • Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures.
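To make the "dynamic adapters" idea concrete, here is a schematic PyTorch sketch of a Mixture-of-Experts of LoRA adapters with top-1 routing. It illustrates the general structure only, not LoRA-Switch's co-designed kernels; all class and parameter names are hypothetical.

```python
# Schematic sketch of MoE-style dynamic LoRA adapters on a frozen linear layer.
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.router = nn.Linear(base.in_features, num_experts)
        self.lora_A = nn.ModuleList(
            nn.Linear(base.in_features, rank, bias=False) for _ in range(num_experts)
        )
        self.lora_B = nn.ModuleList(
            nn.Linear(rank, base.out_features, bias=False) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        expert = self.router(x).argmax(dim=-1)  # top-1 routing per token
        for i, (A, B) in enumerate(zip(self.lora_A, self.lora_B)):
            mask = (expert == i).unsqueeze(-1)
            out = out + mask * B(A(x))  # only the routed adapter contributes
        return out
```

Note that this sketch computes every adapter and masks the unselected ones; the system-algorithm co-design the paper targets is precisely about dispatching only the selected adapter's kernels to avoid that wasted work.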

Multimodal Fusion of EHR in Structures and Semantics: Integrating Clinical Records and Notes with Hypergraph and LLM

no code implementations • 19 Feb 2024 • Hejie Cui, Xinyu Fang, Ran Xu, Xuan Kan, Joyce C. Ho, Carl Yang

While there has been substantial research on representation learning for structured EHR data, the fusion of different types of EHR data (multimodal fusion) is not well studied.

Decision Making, Representation Learning
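As a toy illustration of what multimodal fusion of EHR data means, the sketch below embeds structured medical codes and a clinical-note embedding separately, then fuses them by concatenation. It is a deliberately simplified stand-in, not the paper's hypergraph-plus-LLM architecture; all shapes and names are hypothetical.

```python
# Toy sketch of EHR multimodal fusion: structured codes + note embeddings.
import torch
import torch.nn as nn

class EHRFusion(nn.Module):
    def __init__(self, num_codes: int, note_dim: int, hidden: int = 128):
        super().__init__()
        self.code_embed = nn.EmbeddingBag(num_codes, hidden)  # structured records
        self.note_proj = nn.Linear(note_dim, hidden)          # clinical-note embedding
        self.head = nn.Linear(2 * hidden, 1)                  # e.g. a risk prediction

    def forward(self, code_ids: torch.Tensor, note_emb: torch.Tensor) -> torch.Tensor:
        structured = self.code_embed(code_ids)         # [batch, hidden]
        text = self.note_proj(note_emb)                # [batch, hidden]
        fused = torch.cat([structured, text], dim=-1)  # late fusion by concatenation
        return torch.sigmoid(self.head(fused))
```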
