Efficient exploration is essential for reinforcement learning in large state spaces.
We propose and address a novel few-shot RL problem in which each task is characterized by a subtask graph describing a set of subtasks and their dependencies, both of which are unknown to the agent.
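To make the structure concrete, here is a minimal Python sketch of a subtask graph as a dependency structure over subtasks; the `SubtaskGraph` class, its precondition encoding, and the crafting-style example are illustrative assumptions, not the paper's actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class SubtaskGraph:
    """Illustrative subtask graph: nodes are subtasks, edges are preconditions.

    Hypothetical representation; the actual encoding is paper-specific.
    """
    subtasks: list[str]
    # preconditions[s] lists the subtasks that must be completed before s
    preconditions: dict[str, list[str]] = field(default_factory=dict)

    def eligible(self, completed: set[str]) -> set[str]:
        """Subtasks not yet done whose preconditions are all satisfied."""
        return {
            s for s in self.subtasks
            if s not in completed
            and all(p in completed for p in self.preconditions.get(s, []))
        }

# Hypothetical example: 'pickaxe' depends on 'wood' and 'stone'; in the
# few-shot setting the agent must infer this graph from interaction.
graph = SubtaskGraph(
    subtasks=["wood", "stone", "pickaxe"],
    preconditions={"pickaxe": ["wood", "stone"]},
)
print(graph.eligible({"wood"}))           # {'stone'}
print(graph.eligible({"wood", "stone"}))  # {'pickaxe'}
```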
Each random draw from our generative model is a neural network that instantiates the dynamics function; multiple draws therefore approximate the posterior, and the variance of future predictions under this posterior serves as an intrinsic reward for exploration.
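A minimal sketch of this ensemble-style intrinsic reward, assuming each posterior draw is an independently initialized forward model; the two-layer numpy networks and the `intrinsic_reward` helper are illustrative stand-ins for the actual generative model and architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 32

def draw_dynamics_model():
    """One 'draw' from the generative model: a randomly initialized
    two-layer network mapping (state, action) -> predicted next state."""
    w1 = rng.normal(0, 0.5, (STATE_DIM + ACTION_DIM, HIDDEN))
    w2 = rng.normal(0, 0.5, (HIDDEN, STATE_DIM))
    def predict(state, action):
        x = np.concatenate([state, action])
        return np.tanh(x @ w1) @ w2
    return predict

# Multiple draws jointly approximate the posterior over dynamics functions.
ensemble = [draw_dynamics_model() for _ in range(8)]

def intrinsic_reward(state, action):
    """Variance of the ensemble's next-state predictions, averaged over
    state dimensions; large where the posterior is uncertain."""
    preds = np.stack([f(state, action) for f in ensemble])  # (n_models, STATE_DIM)
    return preds.var(axis=0).mean()

s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
print(f"intrinsic reward: {intrinsic_reward(s, a):.4f}")
```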
Configuration spaces for computer systems can be challenging for both traditional and automatic tuning strategies.
Our learning algorithm, Adaptive Value-function Elimination (AVE), is inspired by OLIVE, the policy elimination algorithm proposed by Jiang et al. (2017).
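To illustrate the elimination idea that OLIVE-style algorithms share, here is a schematic Python sketch under strong simplifying assumptions (a finite candidate class of tabular Q-functions and a fixed batch of transitions); the `elimination_round` function and its mean-absolute-residual test are illustrative, not the AVE or OLIVE specification.

```python
import numpy as np

def elimination_round(candidates, transitions, n_actions, gamma=0.99, tol=0.1):
    """One schematic round of value-function elimination.

    candidates:  candidate Q-functions, each a dict (state, action) -> value
    transitions: (state, action, reward, next_state) tuples from roll-outs
    Keeps candidates whose Bellman residual on the data is small. For
    simplicity this uses the mean absolute residual; OLIVE's actual test
    uses the signed average Bellman error under specific roll-in distributions.
    """
    survivors = []
    for q in candidates:
        residuals = [
            abs(q.get((s, a), 0.0)
                - (r + gamma * max(q.get((s2, b), 0.0) for b in range(n_actions))))
            for (s, a, r, s2) in transitions
        ]
        if np.mean(residuals) <= tol:
            survivors.append(q)
    return survivors

# Toy usage: one state, two actions, reward 1 for action 0 (gamma=0 for brevity).
candidates = [
    {(0, 0): 1.0, (0, 1): 0.0},  # consistent with the data: survives
    {(0, 0): 0.0, (0, 1): 1.0},  # inconsistent: eliminated
]
transitions = [(0, 0, 1.0, 0), (0, 1, 0.0, 0)]
print(len(elimination_round(candidates, transitions, n_actions=2, gamma=0.0)))  # 1
```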
Recently, hybrid metaheuristics have become a trend in operations research.