We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss.
Ranked #8 on Atari Games
no code implementations • 7 Feb 2020 • Danilo J. Rezende, Ivo Danihelka, George Papamakarios, Nan Rosemary Ke, Ray Jiang, Theophane Weber, Karol Gregor, Hamza Merzic, Fabio Viola, Jane Wang, Jovana Mitrovic, Frederic Besse, Ioannis Antonoglou, Lars Buesing
In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions.
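One simple way a learned model of observations and rewards can be used for planning is random shooting: sample candidate action sequences, roll each through the model, and execute the first action of the best sequence. This is an illustrative sketch only, not the specific planning method of any paper listed here; `model(state, action)` is a hypothetical interface assumed to return `(next_state, reward)`.

```python
import random

def plan_action(model, state, actions, horizon=5, n_rollouts=100, rng=None):
    """Random-shooting planner over a learned model.

    Simulates `n_rollouts` random action sequences of length `horizon`
    through `model` and returns the first action of the sequence with
    the highest total simulated reward."""
    rng = rng or random.Random(0)
    best_return, best_first = float("-inf"), None
    for _ in range(n_rollouts):
        seq = [rng.choice(actions) for _ in range(horizon)]
        s, ret = state, 0.0
        for a in seq:
            s, r = model(s, a)  # model predicts next observation and reward
            ret += r
        if ret > best_return:
            best_return, best_first = ret, seq[0]
    return best_first
```

On a toy chain where the reward is `-|state|`, the planner picks the action that moves toward zero.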
13 code implementations • 26 Aug 2019 • Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences.
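In one dimension, the squared Cramér distance between two distributions is the integral of the squared difference of their CDFs. A minimal sketch for empirical distributions (O(n·m), for illustration only, not the paper's implementation):

```python
def cramer_distance_1d(xs, ys):
    """Squared Cramér distance between two empirical 1-D distributions:
    the integral of (F(t) - G(t))^2 over t, where F and G are the
    empirical CDFs of the samples xs and ys."""
    points = sorted(set(xs) | set(ys))  # all CDF jump locations
    total = 0.0
    # Between consecutive jump points the CDFs are constant, so the
    # integral is a sum of (F - G)^2 times the interval width.
    for left, right in zip(points, points[1:]):
        f = sum(1 for x in xs if x <= left) / len(xs)
        g = sum(1 for y in ys if y <= left) / len(ys)
        total += (f - g) ** 2 * (right - left)
    return total
```

For point masses at 0 and 1 the distance is 1, and it is 0 for identical samples.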
We train a generator by maximum likelihood, and we also train the same generator architecture as a Wasserstein GAN.
SAM learns with comparable data efficiency to existing models on a range of synthetic tasks and one-shot Omniglot character recognition, and can scale to tasks requiring 100,000s of time steps and memories.
Ranked #6 on Question Answering on bAbi (Mean Error Rate metric)
The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth.
Ranked #1 on Video Prediction on KTH (Cond metric)
We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs).
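A standard way to trade compute for memory in BPTT, in the spirit of the approach above though not its exact dynamic-programming scheme, is checkpointing: store only every k-th hidden state during the forward pass and recompute the states inside each segment during the backward pass. A toy sketch on a scalar linear recurrence (the recurrence and loss are hypothetical stand-ins for an RNN):

```python
def bptt_checkpointed(w, xs, h0=0.0, every=4):
    """Toy checkpointed BPTT for the recurrence h[t+1] = w*h[t] + x[t],
    with loss = h[T]. Only every `every`-th state is stored in the
    forward pass (O(T/every) memory); segment interiors are recomputed
    during the backward pass. Returns (loss, dloss/dw)."""
    # Forward pass: keep checkpoints only.
    checkpoints = {0: h0}
    h = h0
    for t, x in enumerate(xs):
        h = w * h + x
        if (t + 1) % every == 0:
            checkpoints[t + 1] = h
    loss = h
    # Backward pass: walk segments in reverse, recomputing states.
    grad_h, grad_w = 1.0, 0.0
    T = len(xs)
    n_segs = (T + every - 1) // every
    for k in range(n_segs - 1, -1, -1):
        start, end = k * every, min((k + 1) * every, T)
        # Recompute h[start .. end-1] from the stored checkpoint.
        hs = [checkpoints[start]]
        for t in range(start, end - 1):
            hs.append(w * hs[-1] + xs[t])
        # Backprop through steps end-1 .. start:
        # dL/dw += dL/dh[t+1] * h[t];  dL/dh[t] = w * dL/dh[t+1].
        for t in range(end - 1, start - 1, -1):
            grad_w += grad_h * hs[t - start]
            grad_h *= w
    return loss, grad_w
```

The gradient matches full BPTT regardless of the checkpoint interval; only the memory/recompute trade-off changes.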
We introduce a simple recurrent variational auto-encoder architecture that significantly improves image modeling.
Ranked #55 on Image Generation on CIFAR-10 (bits/dimension metric)
In particular, humans have a capacity for one-shot generalization: the ability to encounter a new concept, understand its structure, and then generate compelling alternative variations of it.
This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images.
This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation.
Ranked #61 on Image Generation on CIFAR-10 (bits/dimension metric)