SciEval is a comprehensive and multi-disciplinary evaluation benchmark designed to assess the performance of large language models (LLMs) in the scientific domain. It addresses several critical issues related to evaluating LLMs for scientific research.

Here are the key features of SciEval:

  1. Multi-Dimensional Evaluation: SciEval systematically evaluates scientific research ability across four dimensions based on Bloom's taxonomy, ranging from basic knowledge recall to higher-order reasoning.

  2. Objective and Subjective Questions: Unlike existing benchmarks that rely primarily on pre-collected objective questions, SciEval includes both objective and subjective questions, enabling a more comprehensive assessment of LLMs' abilities.

  3. Dynamic Subset: To mitigate potential data leakage from training corpora, SciEval introduces a "dynamic" subset whose questions are generated from scientific principles, so the test data can change over time without compromising the integrity of the evaluation.
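To make the objective-question track concrete, here is a minimal sketch of an accuracy-scoring loop for multiple-choice items. The record fields (`question`, `choices`, `answer`) and the sample item are hypothetical illustrations, not the official SciEval schema or data.

```python
# Hedged sketch: hypothetical record format, NOT the official SciEval schema.

def score_objective(records, predict):
    """Accuracy of `predict` over objective (multiple-choice) records.

    `predict(question, choices)` should return an answer key like "A".
    """
    if not records:
        return 0.0
    correct = sum(
        1 for rec in records
        if predict(rec["question"], rec["choices"]) == rec["answer"]
    )
    return correct / len(records)


# A made-up sample item for illustration only.
sample = [
    {
        "question": "Which subatomic particle carries a negative charge?",
        "choices": {"A": "proton", "B": "electron", "C": "neutron"},
        "answer": "B",
    },
]

# Trivial stand-in for an LLM call: always guesses "A".
baseline = lambda question, choices: "A"
print(score_objective(sample, baseline))  # prints 0.0
```

In practice the `predict` callable would wrap a model query and parse the answer letter from its response; subjective questions need a separate judging step (e.g. human or model grading) rather than exact-match scoring.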
