1 code implementation • 7 Oct 2024 • Qingchen Yu, Shichao Song, Ke Fang, Yunfeng Shi, Zifan Zheng, Hanyu Wang, Simin Niu, Zhiyu Li
This approach enables the relatively dynamic generation of evaluation datasets, mitigating the risk of model cheating while aligning assessments more closely with genuine user needs for reasoning capabilities, thereby enhancing the reliability of evaluations.
1 code implementation • 5 Sep 2024 • Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, MingChuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li
Our survey aims to shed light on the internal reasoning processes of LLMs by concentrating on the underlying mechanisms of attention heads.
1 code implementation • 19 Jul 2024 • Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Yi Wang, Zhonghao Wang, Feiyu Xiong, Zhiyu Li
In this paper, we adopt the unified perspective of internal consistency, which offers explanations for reasoning deficiencies and hallucinations.
1 code implementation • 20 May 2024 • Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen
The continuous advancement of large language models (LLMs) has drawn increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance.