1 code implementation • 23 Sep 2024 • Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li
Furthermore, we adopt Diffusion Transformers (DiT) as our foundation model and extend its capabilities with a flexible any resolution mechanism, enabling the model to dynamically process images based on the aspect ratio of the input, closely aligning with human perceptual processes.
2 code implementations • 11 Jul 2024 • Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li
The mathematical capabilities of Multi-modal Large Language Models (MLLMs) remain under-explored with three areas to be improved: visual encoding of math diagrams, diagram-language alignment, and chain-of-thought (CoT) reasoning.
no code implementations • 22 Jun 2024 • Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang
In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation.
1 code implementation • 29 Mar 2024 • Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li
In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.
no code implementations • 20 Mar 2024 • Siying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng
Leveraging Stable Diffusion for the generation of personalized portraits has emerged as a powerful and noteworthy tool, enabling users to create high-fidelity, custom character avatars based on their specific prompts.
no code implementations • CVPR 2024 • Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang
However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.
no code implementations • 21 Nov 2021 • Xinyu Wei, Biing-Hwang Fred Juang, Ouya Wang, Shenglong Zhou, Geoffrey Ye Li
In this paper, we propose a new learning method named Accretionary Learning (AL) to emulate human learning, in that the set of objects to be recognized may not be pre-specified.
1 code implementation • 14 Sep 2019 • Xingbo Fu, Feng Gao, Jiang Wu, Xinyu Wei, Fangwei Duan
This model captures spatial correlations among wind farms and temporal dependencies of wind power time series.
no code implementations • 15 Jun 2019 • Xinyu Wei, Jiahui Chen, Jing Chen, Bernie Liu
In the era of information explosion, facing complex information, it is difficult for users to choose the information of interest, and businesses also need detailed information on ways to let the ad stand out.