SciEval is a comprehensive and multi-disciplinary evaluation benchmark designed to assess the performance of large language models (LLMs) in the scientific domain. It addresses several critical issues related to evaluating LLMs for scientific research.

Here are the key features of SciEval:

  1. Multi-Dimensional Evaluation: SciEval systematically evaluates scientific research ability across four dimensions based on Bloom's taxonomy, ranging from basic knowledge recall to higher-order reasoning.

  2. Objective and Subjective Questions: Unlike existing benchmarks that rely primarily on pre-collected objective questions, SciEval includes both objective and subjective questions, enabling a more comprehensive assessment of LLMs' abilities.

  3. Dynamic Subset: To mitigate potential data leakage from training corpora, SciEval introduces a "dynamic" subset whose questions are generated from scientific principles, so the test data can change over time without compromising the integrity of the evaluation.
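To make the objective-question track concrete, here is a minimal sketch of an accuracy-scoring loop for multiple-choice items. The record fields (`question`, `choices`, `answer`) and the sample item are hypothetical illustrations, not the official SciEval schema or data.

```python
# Hedged sketch: hypothetical record format, NOT the official SciEval schema.

def score_objective(records, predict):
    """Accuracy of `predict` over objective (multiple-choice) records.

    `predict(question, choices)` should return an answer key like "A".
    """
    if not records:
        return 0.0
    correct = sum(
        1 for rec in records
        if predict(rec["question"], rec["choices"]) == rec["answer"]
    )
    return correct / len(records)


# A made-up sample item for illustration only.
sample = [
    {
        "question": "Which subatomic particle carries a negative charge?",
        "choices": {"A": "proton", "B": "electron", "C": "neutron"},
        "answer": "B",
    },
]

# Trivial stand-in for an LLM call: always guesses "A".
baseline = lambda question, choices: "A"
print(score_objective(sample, baseline))  # prints 0.0
```

In practice the `predict` callable would wrap a model query and parse the answer letter from its response; subjective questions need a separate judging step (e.g. human or model grading) rather than exact-match scoring.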
