Search Results for author: Haoye Zhang

Found 5 papers, 5 papers with code

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

2 code implementations • 3 Aug 2024 • Yuan YAO, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone.

Hallucination, Multiple-choice, +3 more

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

2 code implementations • 1 Oct 2023 • Tianyu Yu, Jinyi Hu, Yuan YAO, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun

The capabilities of MLLMs depend on two crucial factors: the model architecture to facilitate the feature alignment of visual modules and large language models; the multimodal instruction tuning datasets for human instruction following.

Instruction Following

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

2 code implementations • 23 Aug 2023 • Jinyi Hu, Yuan YAO, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun

Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., lack of large-scale, high-quality image-text data).

Image to text, Language Modeling, +3 more
