Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction

24 Nov 2021  ·  Wentao He, Jianfeng Ren, Ruibin Bai, Xudong Jiang ·

Raven's Progressive Matrices (RPMs) are frequently used in evaluating human's visual reasoning ability. Researchers have made considerable efforts in developing systems to automatically solve the RPM problem, often through a black-box end-to-end convolutional neural network for both visual recognition and logical reasoning tasks. Based on the two intrinsic natures of RPM problem, visual recognition and logical reasoning, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we further propose a "2+1" formulation that models human's thinking in solving RPMs and significantly reduces the model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module is capable of yielding a set of reasoning rules modeling human in solving the RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain compared with the state-of-the-art models.

PDF Abstract

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here