CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

7 Nov 2022 · Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties, such as mass, that the imaging pipeline cannot capture directly. However, these properties can be estimated from cues in relative object motion and in the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with three kinds of questions: counterfactual questions about the effect of actions, planning questions about how to reach a goal, and descriptive questions about the visible properties of objects. The CRIPP-VQA test set also enables evaluation under several out-of-distribution settings: videos containing objects whose masses, coefficients of friction, and initial velocities are not observed in the training distribution. Our experiments reveal a surprising and significant gap between performance on questions about implicit properties (the focus of this paper) and performance on questions about explicit properties of objects (the focus of prior work).
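The abstract states that hidden properties such as mass can be estimated from relative motion and collisions. As a minimal, hypothetical illustration of that idea (not the CRIPP-VQA method or its simulator), the sketch below assumes an isolated one-dimensional collision, so total momentum is conserved, and a flat surface where kinetic friction is the only force slowing a sliding object. The function names and the constant `G` are illustrative assumptions.

```python
# Hypothetical sketch: recover hidden physical properties from observed velocities.
# Assumptions: isolated 1-D collisions (momentum conserved) and a flat surface
# where kinetic friction is the only horizontal force acting on a sliding object.
from typing import Sequence

G = 9.81  # gravitational acceleration in m/s^2 (assumed SI units)


def mass_ratio_from_collision(u1: float, v1: float, u2: float, v2: float) -> float:
    """Return m1 / m2 from pre-collision (u) and post-collision (v) velocities.

    Momentum conservation gives m1*u1 + m2*u2 = m1*v1 + m2*v2,
    which rearranges to m1*(v1 - u1) = -m2*(v2 - u2).
    """
    dv1 = v1 - u1
    dv2 = v2 - u2
    if dv1 == 0:
        raise ValueError("object 1 did not change velocity; the ratio is undefined")
    return -dv2 / dv1


def friction_from_deceleration(speeds: Sequence[float], dt: float) -> float:
    """Estimate the kinetic friction coefficient mu from evenly sampled speeds.

    With friction as the only force, the deceleration magnitude is mu * g,
    so mu = -a / g, where a is the average acceleration over the samples.
    """
    if len(speeds) < 2:
        raise ValueError("need at least two speed samples")
    accel = (speeds[-1] - speeds[0]) / (dt * (len(speeds) - 1))
    return -accel / G


# Elastic collision with m1 = 2 and m2 = 1: object 1 moving at 3 m/s hits object 2 at rest.
print(mass_ratio_from_collision(u1=3.0, v1=1.0, u2=0.0, v2=4.0))  # 2.0
```

The point of the example is that neither function ever sees a mass or friction value directly. Both are inferred purely from how velocities change, which is the kind of cue a model must extract from video to answer questions about implicit properties.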

Datasets

Introduced in the paper: CRIPP-VQA

Results from the Paper

All results are on the CRIPP-VQA dataset; the metric is accuracy.

Task                      Model        Accuracy   Global rank
Counterfactual Planning   Aloe*+BERT   63.21      #1
Add - PO                  Aloe*+BERT   67.43      #1
Add - PQ                  Aloe*+BERT   39.71      #1
Replace - PO              Aloe*+BERT   56.76      #1
Replace - PQ              Aloe*+BERT   22.07      #1
Remove - PO               Aloe*+BERT   65.46      #1
Remove - PQ               Aloe*+BERT   33.64      #1
Descriptive               Aloe*+BERT   71.04      #1
Counterfactual Planning   Aloe*        57.18      #2
Add - PO                  Aloe*        56.55      #2
Add - PQ                  Aloe*        18.13      #3
Replace - PO              Aloe*        52.1       #3
Replace - PQ              Aloe*        9.91       #4
Remove - PO               Aloe*        62.9       #2
Remove - PQ               Aloe*        31.1       #2
Descriptive               Aloe*        68.94      #2
Counterfactual Planning   HCRN         57.02      #3
Add - PO                  HCRN         56.06      #3
Add - PQ                  HCRN         20.49      #2
Replace - PO              HCRN         55.97      #2
Replace - PQ              HCRN         19.87      #2
Remove - PO               HCRN         59.04      #3
Remove - PQ               HCRN         27.2       #3
Descriptive               HCRN         64.98      #3
Counterfactual Planning   MAC          50.24      #4
Add - PO                  MAC          49.83      #4
Add - PQ                  MAC          16.29      #4
Replace - PO              MAC          50.21      #4
Replace - PQ              MAC          17.31      #3
Remove - PO               MAC          50.68      #4
Remove - PQ               MAC          16.41      #4
Descriptive               MAC          48.72      #4
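The Add, Replace, and Remove rows report PO and PQ scores. Reading these as per-option and per-question accuracy, as in CLEVRER-style counterfactual questions with several true/false answer options, is an assumption here. The hypothetical sketch below shows how the two metrics differ under that reading: PO credits each option independently, while PQ credits a question only when every one of its options is answered correctly.

```python
# Hypothetical sketch of per-option (PO) vs per-question (PQ) accuracy, assuming
# each counterfactual question has several options, each labelled True or False.
from typing import Sequence


def per_option_accuracy(preds: Sequence[Sequence[bool]],
                        golds: Sequence[Sequence[bool]]) -> float:
    """Percentage of individual options predicted correctly (PO)."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must cover the same questions")
    correct = total = 0
    for p, g in zip(preds, golds):
        if len(p) != len(g):
            raise ValueError("each question needs one prediction per option")
        correct += sum(pi == gi for pi, gi in zip(p, g))
        total += len(g)
    return 100.0 * correct / total


def per_question_accuracy(preds: Sequence[Sequence[bool]],
                          golds: Sequence[Sequence[bool]]) -> float:
    """Percentage of questions whose options are all predicted correctly (PQ)."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must cover the same questions")
    correct = sum(list(p) == list(g) for p, g in zip(preds, golds))
    return 100.0 * correct / len(golds)


preds = [[True, False, False, True], [True, True, False, False]]
golds = [[True, False, False, True], [True, False, False, False]]
print(per_option_accuracy(preds, golds))    # 87.5
print(per_question_accuracy(preds, golds))  # 50.0
```

On this toy input a single wrong option costs the second question a quarter of its PO credit but all of its PQ credit. This is consistent with every model in the table scoring far lower on PQ than on PO.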
