And second, in model-based RL where agents aim to learn about the environment they are operating in, the model-learning part can be naturally phrased as an inference problem over the process that governs environment dynamics.
GPflux is compatible with and built on top of the Keras deep learning ecosystem.
This paves the way for new research directions, e.g., investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.
In this context, a convenient choice for approximate inference is variational inference (VI), where the problem of Bayesian inference is cast as an optimization problem -- namely, to maximize a lower bound of the log marginal likelihood.
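Concretely (as a standard form, not taken verbatim from the text above), for a latent function $f$ with prior $p(f)$ and likelihood $p(y \mid f)$, the bound in question is the evidence lower bound (ELBO), maximized over a variational distribution $q(f)$:

```latex
\log p(y) \;\ge\; \mathbb{E}_{q(f)}\!\left[\log p(y \mid f)\right] \;-\; \mathrm{KL}\!\left[\,q(f)\,\|\,p(f)\,\right]
```

The gap between the two sides is $\mathrm{KL}[\,q(f)\,\|\,p(f \mid y)\,]$, so maximizing the bound over $q$ simultaneously tightens it and drives $q(f)$ towards the true posterior.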
While this was initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla soft Q-learning (SQL) in RL for high-dimensional domains with discrete actions and function approximators.
Ensembling NNs provides an easily implementable, scalable method for uncertainty quantification; however, it has been criticised for not being Bayesian.
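A minimal sketch of the ensembling idea, using bootstrap-resampled linear regressors in place of neural networks to stay self-contained (the model class, member count, and function names here are illustrative assumptions, not from the text):

```python
import numpy as np

def fit_linear(X, y):
    # Least-squares fit with a bias term (stand-in for training one member).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def ensemble_predict(X_train, y_train, X_test, n_members=5, seed=0):
    """Ensemble-style uncertainty: train each member on a bootstrap
    resample, then report the mean and spread of their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X_train), size=len(X_train))
        w = fit_linear(X_train[idx], y_train[idx])
        Xb = np.hstack([X_test, np.ones((len(X_test), 1))])
        preds.append(Xb @ w)
    preds = np.stack(preds)
    # Disagreement between members serves as the uncertainty estimate.
    return preds.mean(axis=0), preds.std(axis=0)
```

The per-member standard deviation is a heuristic uncertainty measure, which is precisely the point of the criticism: it is not derived from a posterior distribution.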
Here we consider perception and action as two serial information channels with limited information-processing capacity.
Within the context of video games, the notion of perfectly rational agents can be undesirable, as it leads to uninteresting situations where humans face tough adversarial decision-makers.
Reinforcement learning is concerned with identifying reward-maximizing behaviour policies in environments that are initially unknown.
As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning.
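The first of these limit cases, standard value iteration with a known model, can be sketched as follows (the array layout and function name are assumptions for illustration, not the paper's formulation):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration with a known model.
    P: (A, S, S) transition probabilities P[a, s, s'].
    R: (A, S) expected immediate rewards R[a, s].
    Returns the converged state values and a greedy policy."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * P @ V          # (A, S): one-step lookahead
        V_new = Q.max(axis=0)          # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

The other limit cases replace the known `P` with a posterior over models (Bayesian MDP planning) or a worst case over a model set (robust planning), leaving the backup structure intact.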
Bounded rational decision-makers transform sensory input into motor output under limited computational resources.