no code implementations • 5 Oct 2023 • Yilue Qian, Peiyu Yu, Ying Nian Wu, Yao Su, Wei Wang, Lifeng Fan
In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions.