TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G2)	Goal Preferences	.98	# 2
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G2)	Action Efficiency	.95	# 4
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G2)	Unobserved Constraints	.87	# 4
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G2)	Cost-Reward	.92	# 3
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G2)	Goal Preferences	.71	# 7
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G2)	Action Efficiency	.65	# 7
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G2)	Unobserved Constraints	.73	# 6
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G2)	Cost-Reward	.75	# 6
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G1)	Goal Preferences	.98	# 2
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G1)	Action Efficiency	.97	# 2
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G1)	Unobserved Constraints	.86	# 5
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: G1)	Cost-Reward	.94	# 2
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G1)	Goal Preferences	.75	# 6
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G1)	Action Efficiency	.66	# 6
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G1)	Unobserved Constraints	.69	# 7
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: G1)	Cost-Reward	.48	# 7
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: All)	Goal Preferences	.99	# 1
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: All)	Action Efficiency	.97	# 2
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: All)	Unobserved Constraints	.90	# 2
Core Psychological Reasoning	AGENT	BIPaCK (Conditioned: All)	Cost-Reward	.95	# 1
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: All)	Goal Preferences	.84	# 5
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: All)	Action Efficiency	.98	# 1
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: All)	Unobserved Constraints	.89	# 3
Core Psychological Reasoning	AGENT	ToMnet-G (Conditioned: All)	Cost-Reward	.89	# 4
Core Psychological Reasoning	AGENT	Human	Goal Preferences	.95	# 4
Core Psychological Reasoning	AGENT	Human	Action Efficiency	.91	# 5
Core Psychological Reasoning	AGENT	Human	Unobserved Constraints	.92	# 1
Core Psychological Reasoning	AGENT	Human	Cost-Reward	.87	# 5
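
The per-scenario scores above can be condensed into one number per model for quick comparison. The sketch below does this with an unweighted mean over the four scenarios. The equal weighting, the `SCORES` dictionary, and the `mean_scores` helper are illustrative assumptions rather than an aggregate defined by the benchmark; only the score values are taken from the table.

```python
from statistics import mean

# Scores copied from the results table above, listed in this order:
# Goal Preferences, Action Efficiency, Unobserved Constraints, Cost-Reward.
SCORES = {
    "BIPaCK (Conditioned: G2)": [0.98, 0.95, 0.87, 0.92],
    "ToMnet-G (Conditioned: G2)": [0.71, 0.65, 0.73, 0.75],
    "BIPaCK (Conditioned: G1)": [0.98, 0.97, 0.86, 0.94],
    "ToMnet-G (Conditioned: G1)": [0.75, 0.66, 0.69, 0.48],
    "BIPaCK (Conditioned: All)": [0.99, 0.97, 0.90, 0.95],
    "ToMnet-G (Conditioned: All)": [0.84, 0.98, 0.89, 0.89],
    "Human": [0.95, 0.91, 0.92, 0.87],
}


def mean_scores(scores):
    """Return the unweighted mean score per model (an assumed aggregate)."""
    return {model: round(mean(values), 4) for model, values in scores.items()}


if __name__ == "__main__":
    # Print models from highest to lowest mean score.
    ranked = sorted(mean_scores(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for model, avg in ranked:
        print(f"{model}: {avg:.3f}")
```

Because the mean hides per-scenario differences (for example, the human baseline ranks first on Unobserved Constraints but fifth on Cost-Reward), it is best read alongside the full table rather than in place of it.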


AGENT: A Benchmark for Core Psychological Reasoning

24 Feb 2021 · Tianmin Shu, Abhishek Bhandwaldar, Chuang Gan, Kevin A. Smith, Shari Liu, Dan Gutfreund, Elizabeth Spelke, Joshua B. Tenenbaum, Tomer D. Ullman

For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraints. Despite recent interest in machine agents that reason about other agents, it is not clear whether such agents learn or hold the core psychological principles that drive human reasoning. Inspired by cognitive development studies on intuitive psychology, we present a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs) that probe key concepts of core intuitive psychology. We validate AGENT with human ratings, propose an evaluation protocol emphasizing generalization, and compare two strong baselines built on Bayesian inverse planning and a Theory of Mind neural network. Our results suggest that to pass the designed tests of core intuitive psychology at human levels, a model must acquire or have built-in representations of how agents plan, combining utility computations and core knowledge of objects and physics.


Tasks

Core Psychological Reasoning

Datasets

Introduced in the Paper:

AGENT

Results from the Paper

Ranked #1 on Core Psychological Reasoning on AGENT

