IPM Move Planner: Efficiently Exploiting Deep Reinforcement Learning with Monte Carlo Tree Search

CUHK Course IERG5350 2020  ·  Fan Bai, Fei Meng

The literature on scene rearrangement focuses on developing a move planner that transforms one layout into another on a limited plane. Poorly considered moving plans can cause objects to block one another and waste manpower. Recently, planning with a Deep Q-Network (DQN) based Monte Carlo Tree Search (MCTS) algorithm has attracted researchers' attention due to its powerful sequential decision-making ability. However, the deterministic value-based policy of the reinforcement learning (RL) agent cannot represent multiple optimal actions, and the DQN network significantly increases computation cost, especially when combined with MCTS. To enhance the ability of MCTS to select optimal action sequences efficiently, we develop a novel IPM Move Planner, an MCTS embedded with Proximal Policy Optimization (PPO) supervised by imitation learning (IL). Specifically, we let PPO explore optimal action sequences by efficiently exploiting good previous trajectories; the policy and value neural networks then play a crucial part in the expansion and simulation processes. The policy network, which has a residual structure, is pre-trained to imitate a teacher policy through supervised learning and is then placed into the overall framework to learn an advanced policy, yielding a complex policy with only a lightweight network. Experiments on synthetic layouts demonstrate that IPM Move Planner achieves outstanding moving performance, with shorter transportation length and a higher success rate than the state of the art.
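The abstract describes a policy/value network with a residual structure that guides the expansion and simulation steps of MCTS. The paper itself is not available here, so the following is only a minimal illustrative sketch of one common way to embed network priors into tree search (PUCT-style selection); the class names (`PolicyValueNet`, `Node`), the flat state representation, and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: a lightweight residual policy/value network guiding
# MCTS expansion and leaf evaluation. Not the authors' code.
import math
import torch
import torch.nn as nn


class PolicyValueNet(nn.Module):
    """Lightweight policy/value heads with one residual block over a flat state."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.inp = nn.Linear(state_dim, hidden)
        self.res = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor):
        h = torch.relu(self.inp(x))
        h = torch.relu(h + self.res(h))  # residual connection
        log_p = torch.log_softmax(self.policy_head(h), dim=-1)
        v = torch.tanh(self.value_head(h))
        return log_p, v


class Node:
    """One search-tree node storing the network prior and visit statistics."""

    def __init__(self, prior: float):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}  # action index -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(node: Node, c_puct: float = 1.5):
    """Select the child maximizing Q + U, where U scales with the network prior."""
    total = sum(ch.visits for ch in node.children.values())

    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q() + u

    return max(node.children.items(), key=score)


def expand_and_evaluate(node: Node, state: torch.Tensor, net: PolicyValueNet) -> float:
    """Expand a leaf with policy priors and return the value estimate in place of a rollout."""
    log_p, v = net(state.unsqueeze(0))
    for a, p in enumerate(log_p.exp().squeeze(0).tolist()):
        node.children[a] = Node(prior=p)
    return v.item()
```

Under this reading, the value head replaces random rollouts during simulation, and the pre-trained (IL-supervised) policy head biases expansion toward moves the teacher policy would take, which is what keeps the network lightweight while still guiding the search.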
