Search Results for author: Xinyu Wei

Found 9 papers, 4 papers with code

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

1 code implementation • 23 Sep 2024 • Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li

Furthermore, we adopt Diffusion Transformers (DiT) as our foundation model and extend its capabilities with a flexible any resolution mechanism, enabling the model to dynamically process images based on the aspect ratio of the input, closely aligning with human perceptual processes.

Image Restoration • Text-to-Image Generation

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

2 code implementations • 11 Jul 2024 • Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li

The mathematical capabilities of Multi-modal Large Language Models (MLLMs) remain under-explored with three areas to be improved: visual encoding of math diagrams, diagram-language alignment, and chain-of-thought (CoT) reasoning.

Contrastive Learning • Language Modelling +4

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

no code implementations • 22 Jun 2024 • Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation.

Common Sense Reasoning • Language Modelling +6

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

1 code implementation • 29 Mar 2024 • Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.

Instruction Following • Language Modelling +6

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

no code implementations • 20 Mar 2024 • Siying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng

Leveraging Stable Diffusion for the generation of personalized portraits has emerged as a powerful and noteworthy tool, enabling users to create high-fidelity, custom character avatars based on their specific prompts.

Diversity • Image Generation +1

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations • CVPR 2024 • Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration • Knowledge Distillation +1

Accretionary Learning with Deep Neural Networks

no code implementations • 21 Nov 2021 • Xinyu Wei, Biing-Hwang Fred Juang, Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

In this paper, we propose a new learning method named Accretionary Learning (AL) to emulate human learning, in that the set of objects to be recognized may not be pre-specified.

Spatiotemporal Attention Networks for Wind Power Forecasting

1 code implementation • 14 Sep 2019 • Xingbo Fu, Feng Gao, Jiang Wu, Xinyu Wei, Fangwei Duan

This model captures spatial correlations among wind farms and temporal dependencies of wind power time series.

Time Series • Time Series Analysis

A Books Recommendation Approach Based on Online Bookstore Data

no code implementations • 15 Jun 2019 • Xinyu Wei, Jiahui Chen, Jing Chen, Bernie Liu

In the era of information explosion, it is difficult for users to pick out the information that interests them from the flood of available content, and businesses likewise need detailed guidance on how to make their advertisements stand out.
