Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

NeurIPS 2018 Chen LiangMohammad NorouziJonathan BerantQuoc LeNi Lao

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks... (read more)

PDF Abstract

Evaluation Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.