The causal reasoning dataset is generated with the Causal Reasoning in Closed Daily Activities (COLD) framework, which evaluates large language models (LLMs) on their causal reasoning abilities in real-world, everyday activities. Its causal questions simulate common activities such as shopping, baking a cake, riding a bus, planting a tree, and going on a train ride. With approximately 9 million causal queries, COLD challenges LLMs to reason about causal relationships between events that are familiar and grounded in everyday human experience.
Each query consists of a premise (an event) and two candidate events (the choices). The model must identify which choice is the more plausible cause or effect of the premise, which tests its understanding of cause-and-effect relationships.
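The query structure and the two-way decision can be sketched as follows. This is a minimal illustration only: the `CausalQuery` fields, the `predict`/`accuracy` helpers, and the word-overlap scorer are hypothetical names chosen here, not the COLD authors' data schema or evaluation code. In practice the scorer would be an LLM's plausibility score (for example, a log-likelihood), and the example query is invented for illustration, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical scorer signature: (premise, choice, question) -> plausibility score.
Scorer = Callable[[str, str, str], float]


@dataclass(frozen=True)
class CausalQuery:
    """One COLD-style query; field names are illustrative assumptions."""
    premise: str
    choices: Tuple[str, str]  # exactly two candidate events
    question: str             # "cause" or "effect"
    label: int                # index (0 or 1) of the correct choice


def predict(query: CausalQuery, score: Scorer) -> int:
    """Return the index of the choice the scorer rates as more plausible."""
    scores = [score(query.premise, c, query.question) for c in query.choices]
    return 0 if scores[0] >= scores[1] else 1


def accuracy(queries: Sequence[CausalQuery], score: Scorer) -> float:
    """Fraction of queries whose predicted choice matches the label."""
    if not queries:
        return 0.0
    correct = sum(predict(q, score) == q.label for q in queries)
    return correct / len(queries)


def overlap_score(premise: str, choice: str, question: str) -> float:
    """Toy stand-in for a model: count shared lowercase words."""
    return float(len(set(premise.lower().split()) & set(choice.lower().split())))


# Invented example query (not from the dataset).
q = CausalQuery(
    premise="I put the cake batter in the oven.",
    choices=("The cake rose and turned golden.", "I bought a bus ticket."),
    question="effect",
    label=0,
)
print(predict(q, overlap_score))          # 0
print(accuracy([q], overlap_score))       # 1.0
```

Because every query has exactly two choices, chance-level accuracy under this setup is 50%, which is a useful baseline when reading model scores.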
Key features:
- Activity types: the dataset covers five everyday activities: shopping, cake baking, train ride, tree planting, and bus ride.
- Causal queries: each query includes a premise and two candidate causal events (choices); the model must decide which of the two is the more likely cause or effect.
- Multiple-choice format: queries can be posed as two-option multiple-choice questions (MCQA), where the model must choose between the options.
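One way to pose a query in the MCQA format is sketched below. The prompt template, the "A"/"B" option letters, and the `format_mcqa`/`parse_answer` helpers are assumptions made for illustration; the COLD authors' actual prompt wording may differ. The example premise and choices are invented.

```python
import re
from typing import Optional, Tuple


def format_mcqa(premise: str, choices: Tuple[str, str], question: str) -> str:
    """Render a two-option causal query as an MCQA prompt (hypothetical template)."""
    if question not in ("cause", "effect"):
        raise ValueError("question must be 'cause' or 'effect'")
    lines = [
        f"Premise: {premise}",
        f"Which option is the more plausible {question}?",
        f"A. {choices[0]}",
        f"B. {choices[1]}",
        "Answer:",
    ]
    return "\n".join(lines)


def parse_answer(text: str) -> Optional[int]:
    """Map a reply such as 'A', 'B.' or '(A)' to choice index 0 or 1, else None."""
    match = re.match(r"\(?([AB])\b", text.strip().upper())
    if match is None:
        return None
    return 0 if match.group(1) == "A" else 1


prompt = format_mcqa(
    "I waited at the bus stop.",
    ("The bus arrived.", "The tree grew taller."),
    "effect",
)
print(prompt)
print(parse_answer(" B. The tree"))  # 1
```

Parsing only a leading option letter keeps the mapping from free-form model output back to a choice index strict: replies that do not start with "A" or "B" are treated as unparseable rather than guessed.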
The dataset provides a valuable test for causal reasoning in NLP models, focusing on realistic, daily-life scenarios.