Search Results for author: Theophane Weber

Found 26 papers, 9 papers with code

Learning to Induce Causal Structure

no code implementations 11 Apr 2022 Nan Rosemary Ke, Silvia Chiappa, Jane Wang, Anirudh Goyal, Jorg Bornschein, Melanie Rey, Theophane Weber, Matthew Botvinick, Michael Mozer, Danilo Jimenez Rezende

The fundamental challenge in causal induction is to infer the underlying graph structure given observational and/or interventional data.

Synthetic Returns for Long-Term Credit Assignment

2 code implementations 24 Feb 2021 David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song

We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two.
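A toy sketch of the state-associative idea (illustrative assumptions, not the paper's implementation): regress an episode's delayed reward onto a sum of per-state contributions, so that the learned contribution of an early state can be credited at the moment it is visited. The key-to-door setup, feature layout, and linear model here are all hypothetical.

```python
import numpy as np

# Toy key-to-door episodes: the terminal reward depends only on whether the
# "key" state (feature 0) was visited at t=0. We fit r ≈ sum_t w . phi(s_t),
# so w . phi(s) acts as a synthetic return credited when s is visited.
rng = np.random.default_rng(0)
n_episodes, T, d = 500, 10, 4

X = np.zeros((n_episodes, d))   # per-episode sum of state features
r = np.zeros(n_episodes)        # delayed terminal reward
for i in range(n_episodes):
    picked_key = rng.random() < 0.5
    feats = rng.normal(size=(T, d))
    feats[:, 0] = 0.0
    if picked_key:
        feats[0, 0] = 1.0       # "key" event at t=0
    X[i] = feats.sum(axis=0)
    r[i] = 1.0 if picked_key else 0.0   # reward arrives only at episode end

# Least squares over summed features: r ≈ X w
w, *_ = np.linalg.lstsq(X, r, rcond=None)

# The key state's learned contribution recovers its share of the delayed
# reward, so credit flows to t=0 rather than waiting until t=T.
key_state = np.zeros(d)
key_state[0] = 1.0
print(round(float(w @ key_state), 2))   # → 1.0
```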

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

no code implementations3 Oct 2020 Peter Karkus, Mehdi Mirza, Arthur Guez, Andrew Jaegle, Timothy Lillicrap, Lars Buesing, Nicolas Heess, Theophane Weber

We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles, similar to classic robot architectures.

Reinforcement Learning (RL)

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

no code implementations15 Oct 2019 Lars Buesing, Nicolas Heess, Theophane Weber

A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models.

Decision Making · Decision Making Under Uncertainty

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

no code implementations ICLR 2019 Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess

In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data.
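The contrast can be made concrete with a minimal structural-causal-model sketch (illustrative assumptions only, not CF-GPS itself): instead of re-weighting observed returns, a counterfactual evaluation infers the episode's latent noise from the observed outcome and replays the same episode under an alternative action.

```python
# Hypothetical one-step model: outcome = action + noise.
# Having observed (action=1.0, outcome=2.3), abduction recovers the latent
# noise, and the same episode is replayed under an alternative action --
# rather than re-weighting samples as importance sampling would.
observed_action, observed_outcome = 1.0, 2.3
noise = observed_outcome - observed_action   # abduction: infer latent noise
counterfactual = 2.0 + noise                 # same noise, alternative action 2.0
print(round(counterfactual, 1))              # → 3.3
```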

Temporal Difference Variational Auto-Encoder

1 code implementation ICLR 2019 Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber

To act and plan in complex environments, we posit that agents should have a mental simulator of the world with three characteristics: (a) it should build an abstract state representing the condition of the world; (b) it should form a belief which represents uncertainty on the world; (c) it should go beyond simple step-by-step simulation, and exhibit temporal abstraction.

Reinforcement Learning (RL)

Visual Interaction Networks: Learning a Physics Simulator from Video

no code implementations NeurIPS 2017 Nicholas Watters, Daniel Zoran, Theophane Weber, Peter Battaglia, Razvan Pascanu, Andrea Tacchetti

We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations.

Decision Making

Visual Interaction Networks

3 code implementations5 Jun 2017 Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran

We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories spanning hundreds of time steps on a wide range of physical systems.

Decision Making

Deep Reinforcement Learning in Large Discrete Action Spaces

2 code implementations24 Dec 2015 Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, Ben Coppin

Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems.

Recommendation Systems · Reinforcement Learning (RL) +1

Gradient Estimation Using Stochastic Computation Graphs

1 code implementation NeurIPS 2015 John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world.
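A minimal numeric sketch of one estimator this framework unifies, the score-function ("REINFORCE") gradient of an expectation (the distribution and objective below are my own toy choices, not the paper's): for x ~ N(theta, 1) and f(x) = x², the true gradient of E[f(x)] with respect to theta is 2·theta.

```python
import numpy as np

# Score-function estimate of d/dtheta E_{x ~ N(theta, 1)}[x^2].
# For a unit-variance Gaussian, grad_theta log p(x) = (x - theta), so the
# estimator averages f(x) * (x - theta) over samples; the exact answer is
# d/dtheta (theta^2 + 1) = 2 * theta.
rng = np.random.default_rng(0)
theta, n = 1.5, 200_000

x = rng.normal(theta, 1.0, size=n)
grad_est = np.mean(x**2 * (x - theta))

print(round(float(grad_est), 1), 2 * theta)   # estimate vs. true value 3.0
```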

Automated Variational Inference in Probabilistic Programming

no code implementations7 Jan 2013 David Wingate, Theophane Weber

We present a new algorithm for approximate inference in probabilistic programs, based on a stochastic gradient for variational programs.
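A hedged sketch of the general idea, stochastic-gradient maximization of the ELBO (the target, variational family, and reparameterized gradient below are my own simplifications, not the paper's algorithm): fit q(z) = N(mu, 1) to a Gaussian target by following noisy gradient estimates.

```python
import numpy as np

# Fit q(z) = N(mu, 1) to the target p(z) ∝ exp(-0.5 * (z - 3)^2) by
# stochastic gradient ascent on the ELBO with reparameterization z = mu + eps.
# Since log q(mu + eps) is constant in mu here, the per-sample gradient of
# [log p(z) - log q(z)] reduces to d/dz log p(z) = -(z - 3).
rng = np.random.default_rng(0)
mu, lr = 0.0, 0.05

for _ in range(2000):
    eps = rng.normal(size=32)
    z = mu + eps
    grad = np.mean(-(z - 3.0))   # Monte Carlo ELBO gradient estimate
    mu += lr * grad

print(round(mu, 1))   # mu converges toward the target mean 3.0
```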

Probabilistic Programming Variational Inference
