no code implementations • 18 Jul 2024 • Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, Tianhua Tao, Ofir Press, Jamie Callan, Eliu Huerta, Hao Peng
Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations.
1 code implementation • 15 Feb 2024 • Changshu Liu, Shizhuo Dylan Zhang, Ali Reza Ibrahimzada, Reyhaneh Jabbarvand
The first two evaluate models to predict the execution output of an arbitrary code or code the model could correctly synthesize.
no code implementations • 24 May 2023 • Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, Talia Ringer
Neural networks have in recent years shown promise for helping software engineers write programs and even formally verify them.