no code implementations • 29 Mar 2025 • Anjiang Wei, Tarun Suresh, Jiannan Cao, Naveen Kannan, Yuheng Wu, Kai Yan, Thiago S. F. X. Teixeira, Ke Wang, Alex Aiken
CodeARC provides a more realistic and challenging testbed for evaluating LLM-based program synthesis and inductive reasoning.
no code implementations • 18 Feb 2025 • Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yaofeng Sun, YuAn Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken
Equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs, underpins a broad range of applications, including software refactoring, testing, and optimization.
no code implementations • 2 Oct 2024 • Yanming Liu, Xinyue Peng, Jiannan Cao, Shi Bo, Yanxin Shen, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du
Large language models (LLMs) have shown remarkable capabilities in natural language processing; however, they still face difficulties when tasked with understanding lengthy contexts and executing effective question answering.
1 code implementation • 20 Jun 2024 • Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, Arman Cohan
This survey serves as a succinct overview of the most recent advancements in data contamination research, providing a straightforward guide for the benefit of future research endeavors.
no code implementations • 16 Jun 2024 • Yanming Liu, Xinyue Peng, Yuwei Zhang, Xiaolan Ke, Songhang Deng, Jiannan Cao, Chen Ma, Mengchen Fu, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin, Xuhong Zhang
Large language models have repeatedly shown outstanding performance across diverse applications.
1 code implementation • 6 Jun 2024 • Yanming Liu, Xinyue Peng, Jiannan Cao, Shi Bo, Yuwei Zhang, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du
Experiments show that our approach demonstrates a high pass and win rate across different datasets and optimizes the planning scheme for tool learning in models such as GPT-4 and Claude 3, showcasing the potential of our method.
1 code implementation • 11 Mar 2024 • Yanming Liu, Xinyue Peng, Xuhong Zhang, Weihao Liu, Jianwei Yin, Jiannan Cao, Tianyu Du
Large language models (LLMs) demonstrate exceptional performance in numerous tasks but still heavily rely on knowledge stored in their parameters.
1 code implementation • 11 Mar 2024 • Yanming Liu, Xinyue Peng, Shi Bo, Ningjing Sang, Yafeng Yan, Xiaolan Ke, Zhiting Zheng, Shaobo Liu, Songhang Deng, Jiannan Cao, Le Dai, Xingzu Liu, Ruilin Nong, Weihao Liu
Large language models (LLMs) have shown outstanding ability on various tasks and question answering.
1 code implementation • 2 Nov 2023 • Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun
Empirical experiments are conducted to detail the construction and execution procedure of its workflow, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents.