The LogiEval dataset is a benchmark suite designed for evaluating the logical reasoning abilities of prompt-based language models, particularly instruct-prompt large language models. Here are some key details about LogiEval:

  1. Purpose and Origin:

    • LogiEval was created to assess how well language models perform on tasks that require logical reasoning.
    • It is based on the OpenAI Evals library and focuses on evaluating logical reasoning abilities.
    • The dataset was developed by researchers to address the need for robust logical reasoning evaluation.

  2. Contents:

    • LogiEval contains a set of logical reasoning tasks that challenge models to reason deductively.
    • The tasks cover various types of logical reasoning, providing a comprehensive evaluation.
    • The dataset includes 8,678 QA instances sourced from expert-written questions.

  3. Usage:

    • Researchers and practitioners can use LogiEval to assess the logical reasoning capabilities of different models.
    • To use LogiEval, follow the instructions in the repository: set up the required environment, then run the evaluations.
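The evaluation flow described above can be sketched as follows. This is a minimal illustration only: the field names (`"content"`, `"ideal"`) and the exact-match scoring rule are assumptions in the style of OpenAI-Evals samples, not the repository's documented schema.

```python
import json

def score_instance(instance: dict, model_answer: str) -> bool:
    """Return True if the model's answer matches the ideal answer (case-insensitive)."""
    return model_answer.strip().lower() == instance["ideal"].strip().lower()

def evaluate(jsonl_text: str, model) -> float:
    """Run a model callable over JSONL-encoded QA instances and return accuracy."""
    instances = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    correct = sum(score_instance(inst, model(inst["content"])) for inst in instances)
    return correct / len(instances)

# Toy usage with two hypothetical instances and a stand-in "model" that always answers "B":
sample = "\n".join([
    json.dumps({"content": "All cats are mammals. Tom is a cat. Is Tom a mammal? (A) no (B) yes",
                "ideal": "B"}),
    json.dumps({"content": "If it rains, the ground is wet. The ground is dry. Did it rain? (A) no (B) yes",
                "ideal": "A"}),
])
accuracy = evaluate(sample, lambda prompt: "B")
print(accuracy)  # 0.5 (one of two instances answered correctly)
```

In a real run, the `model` callable would wrap an instruct-prompted LLM API call, and the JSONL text would be read from the dataset files in the repository.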

  4. Citation:

    • If you use LogiEval or refer to it in your work, you can cite the following paper:
      • Title: "Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4"
      • Authors: Hanmeng Liu, Ruoxi Ning, Zhiyang Teng, Jian Liu, Qiji Zhou, Yue Zhang
      • Year: 2023
      • Link: https://arxiv.org/abs/2304.03439

In summary, LogiEval provides a valuable resource for assessing logical reasoning abilities in prompt-based language models. Researchers can use it to evaluate and compare different models' performance in logical reasoning tasks.

Source: Conversation with Bing, 3/18/2024
  (1) Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 - arXiv.org. https://arxiv.org/pdf/2304.03439.pdf
  (2) GitHub - csitfun/LogiEval: a benchmark suite for testing logical .... https://github.com/csitfun/LogiEval
  (3) [2007.08124] LogiQA: A Challenge Dataset for Machine Reading .... https://arxiv.org/abs/2007.08124
  (4) [2203.15099] LogicInference: A New Dataset for Teaching Logical .... https://arxiv.org/abs/2203.15099
  (5) Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4. https://arxiv.org/abs/2304.03439

License: Unknown
