Phy-Q is a benchmark that requires an agent to reason about physical scenarios and act accordingly. Inspired by the physical knowledge acquired in infancy and the capabilities robots need to operate in real-world environments, the authors identify 15 essential physical scenarios. For each scenario, they create a wide variety of distinct task templates, and all the task templates within the same scenario can be solved using one specific physical rule.

This design allows two distinct levels of generalization to be evaluated: local generalization and broad generalization. The benchmark produces a Phy-Q (physical reasoning quotient) score that reflects an agent's physical reasoning ability.
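The two evaluation levels can be illustrated with a minimal sketch. It assumes that local generalization means testing on unseen tasks drawn from templates seen during training, and that broad generalization means testing on templates withheld from training within the same scenario; the description above does not spell out these definitions, so treat them as an assumption. The (scenario, template, task) tuples and the function names are hypothetical and are not part of any Phy-Q tooling.

```python
from collections import defaultdict

# Hypothetical task records: (scenario_id, template_id, task_id).
# These identifiers are illustrative only, not the benchmark's real naming.
tasks = [(s, t, i) for s in (1, 2) for t in (1, 2, 3) for i in range(5)]


def local_split(tasks, n_test_per_template=1):
    """Assumed local setting: every template appears in training,
    and a few tasks from each template are held out for testing."""
    by_template = defaultdict(list)
    for task in tasks:
        by_template[(task[0], task[1])].append(task)
    train, test = [], []
    for key in sorted(by_template):
        group = by_template[key]
        train.extend(group[:-n_test_per_template])
        test.extend(group[-n_test_per_template:])
    return train, test


def broad_split(tasks, scenario, held_out_templates):
    """Assumed broad setting: within one scenario, whole templates
    are withheld from training and used only for testing."""
    in_scenario = [task for task in tasks if task[0] == scenario]
    train = [task for task in in_scenario if task[1] not in held_out_templates]
    test = [task for task in in_scenario if task[1] in held_out_templates]
    return train, test


local_train, local_test = local_split(tasks)
broad_train, broad_test = broad_split(tasks, scenario=1, held_out_templates={3})
```

In the local split, every test template also occurs in training; in the broad split, the test templates are disjoint from the training templates but share the scenario, and therefore the underlying physical rule.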

License


  • Unknown
