The LogiEval dataset is a benchmark suite designed for evaluating the logical reasoning abilities of prompt-based language models, particularly instruct-prompt large language models. Here are some key details about LogiEval:

  1. Purpose and Origin:

    • LogiEval was created to assess how well language models perform on tasks that require logical reasoning.
    • It is based on the OpenAI Evals library and focuses on evaluating logical reasoning abilities.
    • The dataset was developed by researchers to address the need for robust logical reasoning evaluation.

  2. Contents:

    • LogiEval contains a set of logical reasoning tasks that challenge models to reason deductively.
    • The tasks cover various types of logical reasoning, providing a comprehensive evaluation.
    • The dataset includes 8,678 QA instances sourced from expert-written questions.

  3. Usage:

    • Researchers and practitioners can use LogiEval to assess the logical reasoning capabilities of different models.
    • To use LogiEval, follow the instructions in the repository: set up the required environment, then run the evaluations.
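The evaluation flow described above can be sketched as follows. This is a minimal illustration only: the field names (`"content"`, `"ideal"`) and the exact-match scoring rule are assumptions in the style of OpenAI-Evals samples, not the repository's documented schema.

```python
import json

def score_instance(instance: dict, model_answer: str) -> bool:
    """Return True if the model's answer matches the ideal answer (case-insensitive)."""
    return model_answer.strip().lower() == instance["ideal"].strip().lower()

def evaluate(jsonl_text: str, model) -> float:
    """Run a model callable over JSONL-encoded QA instances and return accuracy."""
    instances = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    correct = sum(score_instance(inst, model(inst["content"])) for inst in instances)
    return correct / len(instances)

# Toy usage with two hypothetical instances and a stand-in "model" that always answers "B":
sample = "\n".join([
    json.dumps({"content": "All cats are mammals. Tom is a cat. Is Tom a mammal? (A) no (B) yes",
                "ideal": "B"}),
    json.dumps({"content": "If it rains, the ground is wet. The ground is dry. Did it rain? (A) no (B) yes",
                "ideal": "A"}),
])
accuracy = evaluate(sample, lambda prompt: "B")
print(accuracy)  # 0.5 (one of two instances answered correctly)
```

In a real run, the `model` callable would wrap an instruct-prompted LLM API call, and the JSONL text would be read from the dataset files in the repository.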

  4. Citation:

    • If you use LogiEval or refer to it in your work, you can cite the following paper:
      • Title: "Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4"
      • Authors: Hanmeng Liu, Ruoxi Ning, Zhiyang Teng, Jian Liu, Qiji Zhou, Yue Zhang
      • Year: 2023
      • Link: https://arxiv.org/abs/2304.03439

In summary, LogiEval provides a valuable resource for assessing logical reasoning abilities in prompt-based language models. Researchers can use it to evaluate and compare different models' performance in logical reasoning tasks.

Source: Conversation with Bing, 3/18/2024
  (1) Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 - arXiv.org. https://arxiv.org/pdf/2304.03439.pdf
  (2) GitHub - csitfun/LogiEval: a benchmark suite for testing logical .... https://github.com/csitfun/LogiEval
  (3) [2007.08124] LogiQA: A Challenge Dataset for Machine Reading .... https://arxiv.org/abs/2007.08124
  (4) [2203.15099] LogicInference: A New Dataset for Teaching Logical .... https://arxiv.org/abs/2203.15099
  (5) Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4. https://arxiv.org/abs/2304.03439

License: Unknown
