1 code implementation • 11 Apr 2024 • Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie Zhou, Hanghao Wu, Jiajie Zhang, Xu Han, Zhiyuan Liu, Maosong Sun
The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment.
1 code implementation • 21 Feb 2024 • Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun
Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench, with a mere 10.74% in physics, highlighting the benchmark rigor and the intricacy of physical reasoning.