2 code implementations • 17 Sep 2023 • Qianyu He, Jie Zeng, Wenhao Huang, Lina Chen, Jin Xiao, Qianxi He, Xunzhe Zhou, Lida Chen, Xintao Wang, Yuncheng Huang, Haoning Ye, Zihan Li, Shisong Chen, Yikai Zhang, Zhouhong Gu, Jiaqing Liang, Yanghua Xiao
To bridge this gap, we propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
no code implementations • 11 Jul 2023 • Zhouhong Gu, Lin Zhang, Jiangjie Chen, Haoning Ye, Xiaoxuan Zhu, Zihan Li, Zheyu Ye, Yan Gao, Yao Hu, Yanghua Xiao, Hongwei Feng
We introduces the DetectBench, a reading comprehension dataset designed to assess a model's ability to jointly ability in key information detection and multi-hop reasoning when facing complex and implicit information.
2 code implementations • 9 Jun 2023 • Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Yixin Zhu, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Weijie Wu, Qianyu He, Rui Xu, Wenhao Huang, Jingping Liu, Zili Wang, Shusen Wang, Weiguo Zheng, Hongwei Feng, Yanghua Xiao
New Natural Langauge Process~(NLP) benchmarks are urgently needed to align with the rapid development of large language models (LLMs).
no code implementations • 23 Apr 2023 • Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Zhuozhi Xiong, Zihan Li, Qianyu He, Sihang Jiang, Hongwei Feng, Yanghua Xiao
Domain knowledge refers to the in-depth understanding, expertise, and familiarity with a specific subject, industry, field, or area of special interest.