1 code implementation • 21 Feb 2024 • Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun
Notably, the best-performing model, GPT-4V, attains an average score of 17.23% on OlympiadBench, with a mere 11.28% in physics, highlighting the benchmark's rigor and the intricacy of physical reasoning.
1 code implementation • 12 Feb 2024 • Jiarui Zhang, Jinyi Hu, Mahyar Khayatkhoei, Filip Ilievski, Maosong Sun
Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their perception.
2 code implementations • 1 Dec 2023 • Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction.
2 code implementations • 1 Oct 2023 • Tianyu Yu, Jinyi Hu, Yuan Yao, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxu Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
The capabilities of MLLMs depend on two crucial factors: the model architecture that facilitates feature alignment between visual modules and large language models, and the multimodal instruction-tuning datasets that teach the model to follow human instructions.
2 code implementations • 23 Aug 2023 • Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun
Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., a lack of large-scale, high-quality image-text data).
no code implementations • 19 May 2023 • Jinyi Hu, Xu Han, Xiaoyuan Yi, Yutong Chen, Wenhao Li, Zhiyuan Liu, Maosong Sun
IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantic space with the English one in CLIP.
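The IAP recipe above can be illustrated with a toy sketch: freeze a stand-in for the English CLIP text encoder, and train only a small Chinese encoder (here a single linear map over hypothetical sentence features) to minimize the embedding distance on parallel sentence pairs. All names and data below are illustrative placeholders, not the paper's actual architecture or training setup.

```python
import random

random.seed(0)
DIM_IN, DIM_OUT = 4, 3

# Frozen "English encoder" outputs for three parallel sentence pairs
# (hypothetical fixed vectors standing in for CLIP text embeddings).
english_emb = [[0.2, -0.1, 0.5], [0.7, 0.3, -0.2], [-0.4, 0.6, 0.1]]
# Toy feature vectors for the corresponding Chinese sentences.
chinese_feat = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]

# Trainable Chinese encoder: a single DIM_IN x DIM_OUT linear map.
W = [[random.uniform(-0.1, 0.1) for _ in range(DIM_OUT)] for _ in range(DIM_IN)]

def encode(x):
    # Chinese encoder forward pass: z = x @ W.
    return [sum(x[i] * W[i][j] for i in range(DIM_IN)) for j in range(DIM_OUT)]

def mse():
    # Mean squared distance between Chinese and frozen English embeddings.
    total = 0.0
    for x, y in zip(chinese_feat, english_emb):
        z = encode(x)
        total += sum((zj - yj) ** 2 for zj, yj in zip(z, y))
    return total / len(english_emb)

lr = 0.1
loss_start = mse()
for _ in range(200):
    # Hand-computed gradient of the MSE w.r.t. W; only W is updated,
    # mirroring IAP's "all other parameters fixed" constraint.
    grad = [[0.0] * DIM_OUT for _ in range(DIM_IN)]
    for x, y in zip(chinese_feat, english_emb):
        z = encode(x)
        for i in range(DIM_IN):
            for j in range(DIM_OUT):
                grad[i][j] += 2 * (z[j] - y[j]) * x[i] / len(english_emb)
    for i in range(DIM_IN):
        for j in range(DIM_OUT):
            W[i][j] -= lr * grad[i][j]
loss_end = mse()
print(loss_end < loss_start)
```

The design point is that only the new-language encoder receives gradients, so the aligned English embedding space (and anything downstream of it, such as a diffusion image decoder) is left untouched.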
1 code implementation • 14 Nov 2022 • Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie
In this work, we dig into the intrinsic mechanism of this problem and find that sparser attention values in the Transformer can improve diversity.
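To make the "sparser attention" idea concrete, here is a generic illustration (not the paper's exact mechanism) using sparsemax, a standard sparse alternative to softmax: it projects scores onto the probability simplex and can assign exactly zero weight to some positions, whereas softmax always spreads mass everywhere.

```python
import math

def softmax(z):
    # Standard softmax: every entry receives strictly positive mass.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex
    # (Martins & Astudillo, 2016); can output exact zeros.
    zs = sorted(z, reverse=True)
    k, ksum, cumsum = 1, zs[0], 0.0
    for j, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + j * v > cumsum:
            k, ksum = j, cumsum
    tau = (ksum - 1) / k
    return [max(v - tau, 0.0) for v in z]

scores = [2.0, 1.0, -1.0]
soft = softmax(scores)      # all entries > 0
sparse = sparsemax(scores)  # low-scoring entries drop to exactly 0
```

With these scores, sparsemax concentrates all probability on the top entry and zeroes out the rest, while softmax keeps every position active; this kind of hard selectivity is one way sparser attention distributions can translate into less averaged, more diverse outputs.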
no code implementations • 22 Oct 2022 • Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
We demonstrate that TRACE can enhance the entanglement between each segment and its preceding latent variables, and we deduce a non-zero lower bound on the KL term, providing a theoretical guarantee of generation diversity.
1 code implementation • NAACL 2022 • Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
The past several years have witnessed the Variational Auto-Encoder's superiority in various text generation tasks.
1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang
In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
no code implementations • LREC 2020 • Jinyi Hu, Maosong Sun
In this paper, we propose a GPT-2 based uniformed framework for generating major types of Chinese classical poems.