Many aspects of human reasoning, including language, require learning rules from very little data.
A key challenge of existing program synthesizers is ensuring that the synthesized program generalizes well.
To stabilize this method, we adapt a policy gradient estimator to the contextual generation of categorical sequences; the estimator evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
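As an illustrative sketch (not the paper's exact estimator), the snippet below draws several rollouts for the same context and gives each one a leave-one-out baseline computed from the other rollouts, which is one common way to exploit correlated MC rollouts for variance control. The toy policy, reward, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, LENGTH, K = 5, 4, 8          # vocabulary size, sequence length, rollouts per context
logits = np.zeros((LENGTH, VOCAB))  # toy "policy": an independent categorical per position

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def sample_rollouts(k):
    """Draw k rollouts for the same context from the current policy."""
    probs = softmax(logits)
    return np.stack([[rng.choice(VOCAB, p=probs[t]) for t in range(LENGTH)] for _ in range(k)])

def reward(seq):
    """Hypothetical task reward: number of positions matching a fixed target sequence."""
    return float((seq == np.array([1, 2, 3, 4])).sum())

def policy_gradient(rollouts):
    """REINFORCE gradient where each rollout is scored against the mean reward of the
    other rollouts (a leave-one-out baseline) to reduce variance."""
    probs = softmax(logits)
    rewards = np.array([reward(s) for s in rollouts])
    grad = np.zeros_like(logits)
    for i, seq in enumerate(rollouts):
        baseline = (rewards.sum() - rewards[i]) / (len(rollouts) - 1)
        advantage = rewards[i] - baseline
        for t, a in enumerate(seq):
            grad[t] += advantage * (np.eye(VOCAB)[a] - probs[t])   # d log pi / d logits[t]
    return grad / len(rollouts)

for _ in range(200):
    logits += 0.5 * policy_gradient(sample_rollouts(K))            # plain gradient-ascent step
print(softmax(logits).argmax(axis=-1))                             # tends toward the target sequence
```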
Neural models optimized for tree-based problems are of great value in tasks like SQL query extraction and program synthesis.
Program synthesis of general-purpose source code from natural language specifications is challenging due to the need to reason about high-level patterns in the target program and low-level implementation details at the same time.
By their nature, the composition of black box models is opaque.
We present MAKESPEARE, a simple delayed-acceptance hill-climbing method that synthesizes low-level looping programs from input/output examples.
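One standard reading of delayed acceptance is the late-acceptance rule, in which a mutated candidate is compared against the cost recorded a fixed number of iterations earlier rather than only the current cost. The sketch below applies that rule to a generic mutate/cost pair; the history length, mutation operator, and toy task are placeholders, not MAKESPEARE's actual components.

```python
import random

def delayed_acceptance_hillclimb(initial, mutate, cost, history_len=50, iterations=10_000):
    """Hill climbing with a delayed (late) acceptance rule: a mutation is kept if it is
    no worse than the cost recorded `history_len` steps ago, which lets the search
    cross mild uphill regions without a temperature schedule."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_len            # circular buffer of past costs
    for i in range(iterations):
        candidate = mutate(current)
        candidate_cost = cost(candidate)
        slot = i % history_len
        if candidate_cost <= history[slot] or candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
        history[slot] = current_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

# Toy usage: recover a hidden bit string by minimizing Hamming distance.
random.seed(0)
target = [1, 0, 1, 1, 0, 0, 1, 0]

def flip_one_bit(s):
    i = random.randrange(len(s))
    return s[:i] + [s[i] ^ 1] + s[i + 1:]

print(delayed_acceptance_hillclimb([0] * len(target), flip_one_bit,
                                   lambda s: sum(a != b for a, b in zip(s, target))))
```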
We consider the problem of automatically generating code from sample input-output pairs.
Synthesizing programs from input/output examples is a classic problem in artificial intelligence.
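For concreteness, a minimal baseline for this classic setting is size-ordered enumerative search: generate candidate programs in order of size and return the first one consistent with every input/output example. The tiny arithmetic DSL below is invented purely for illustration.

```python
import itertools

# A tiny, made-up DSL: each program maps an integer input x to an integer output.
LEAVES = [("x", lambda x: x), ("1", lambda x: 1), ("2", lambda x: 2)]
OPS = [("+", lambda a, b: a + b), ("*", lambda a, b: a * b), ("-", lambda a, b: a - b)]

def enumerate_programs(max_leaves):
    """Yield (source, function) pairs in order of expression size (number of leaves)."""
    by_size = {1: list(LEAVES)}
    yield from by_size[1]
    for size in range(2, max_leaves + 1):
        by_size[size] = []
        for left in range(1, size):
            for (ls, lf), (rs, rf) in itertools.product(by_size[left], by_size[size - left]):
                for op_name, op in OPS:
                    fn = (lambda lf, rf, op: lambda x: op(lf(x), rf(x)))(lf, rf, op)
                    by_size[size].append((f"({ls} {op_name} {rs})", fn))
        yield from by_size[size]

def synthesize(examples, max_leaves=4):
    """Return the smallest enumerated program consistent with every input/output pair."""
    for src, fn in enumerate_programs(max_leaves):
        if all(fn(x) == y for x, y in examples):
            return src
    return None

print(synthesize([(1, 3), (2, 5), (3, 7)]))   # finds some expression equivalent to 2*x + 1
```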
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimates.
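A simplified sketch of the buffer idea (not the full MAPO algorithm): trajectories already in the memory are enumerated exactly and weighted by their total probability mass under the current policy, while everything outside the memory is estimated from on-policy samples, and newly found promising samples are added to the memory. The toy bandit-style task, threshold, and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PROGRAMS = 20
logits = np.zeros(N_PROGRAMS)                   # toy policy: a single categorical over programs
true_reward = rng.random(N_PROGRAMS)            # hypothetical reward for each program
memory = {int(true_reward.argmax())}            # buffer seeded with one promising trajectory

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def buffered_policy_gradient(n_samples=16):
    """Mix an exact expectation over the memory buffer, weighted by its probability
    mass under the current policy, with on-policy samples that fall outside it."""
    probs = softmax(logits)
    pi_mem = probs[list(memory)].sum()          # total probability mass inside the buffer

    grad = np.zeros_like(logits)
    for a in memory:                            # exact, low-variance term over the buffer
        grad += probs[a] * true_reward[a] * (np.eye(N_PROGRAMS)[a] - probs)

    outside = [int(a) for a in rng.choice(N_PROGRAMS, size=n_samples, p=probs) if a not in memory]
    for a in outside:                           # sampled term for everything outside the buffer
        if true_reward[a] > 0.9:                # grow the buffer with promising new finds
            memory.add(a)
        grad += (1 - pi_mem) * true_reward[a] * (np.eye(N_PROGRAMS)[a] - probs) / len(outside)
    return grad

for _ in range(300):
    logits += 1.0 * buffered_policy_gradient()
print(softmax(logits).argmax(), int(true_reward.argmax()))  # policy should concentrate on a high-reward program
```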