JEEBench is a challenging benchmark dataset for evaluating the problem-solving abilities of LLMs. It curates 515 pre-engineering mathematics, physics, and chemistry problems from the IIT JEE-Advanced exam. Solving these problems requires long-horizon reasoning on top of deep in-domain knowledge.
Source: Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models