no code implementations • 20 Feb 2024 • Jing Han Sun, Ali Emami
Our results emphasize the challenge posed by EvoGrad: Even the best performing LLM, GPT-3. 5, achieves an accuracy of 65. 0% with an average error depth of 7. 2, a stark contrast to human performance of 92.