Search Results for author: Nando de Freitas

Found 69 papers, 28 papers with code

Active Offline Policy Selection

1 code implementation NeurIPS 2021 Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

To overcome this problem, we introduce \emph{active offline policy selection} - a novel sequential decision approach that combines logged data with online interaction to identify the best policy.

On Instrumental Variable Regression for Deep Offline Policy Evaluation

1 code implementation21 May 2021 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

Regularized Behavior Value Estimation

no code implementations17 Mar 2021 Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

Due to bootstrapping, these errors get amplified during training and can lead to divergence, thereby crippling learning.

Offline RL

Addressing Extrapolation Error in Deep Offline Reinforcement Learning

no code implementations1 Jan 2021 Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to *grow unbounded*, thereby crippling learning.

Offline RL

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning

1 code implementation NeurIPS 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Offline RL

Offline Learning from Demonstrations and Unlabeled Experience

no code implementations27 Nov 2020 Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations.

Continuous Control Imitation Learning

Large-scale multilingual audio visual dubbing

no code implementations6 Nov 2020 Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Yu Zhang, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, Nando de Freitas

The visual content is translated by synthesizing lip movements for the speaker to match the translated audio, creating a seamless audiovisual experience in the target language.

Translation

Learning Deep Features in Instrumental Variable Regression

1 code implementation ICLR 2021 Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton

We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear.

Learning Compositional Neural Programs for Continuous Control

no code implementations27 Jul 2020 Thomas Pierrot, Nicolas Perrin, Feryal Behbahani, Alexandre Laterre, Olivier Sigaud, Karim Beguir, Nando de Freitas

Third, the self-models are harnessed to learn recursive compositional programs with multiple levels of abstraction.

Continuous Control

Hyperparameter Selection for Offline Reinforcement Learning

no code implementations17 Jul 2020 Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

Therefore, in this work, we focus on \textit{offline hyperparameter selection}, i. e. methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data.

Offline RL

Critic Regularized Regression

3 code implementations NeurIPS 2020 Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +1

Acme: A Research Framework for Distributed Reinforcement Learning

2 code implementations1 Jun 2020 Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

DQN Replay Dataset

Task-Relevant Adversarial Imitation Learning

no code implementations2 Oct 2019 Konrad Zolna, Scott Reed, Alexander Novikov, Sergio Gomez Colmenarejo, David Budden, Serkan Cabi, Misha Denil, Nando de Freitas, Ziyu Wang

We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels.

Imitation Learning

Scaling data-driven robotics with reward sketching and batch reinforcement learning

1 code implementation26 Sep 2019 Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Zolna, Yusuf Aytar, David Budden, Mel Vecerik, Oleg Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyu Wang

We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions.

Modular Meta-Learning with Shrinkage

no code implementations NeurIPS 2020 Yutian Chen, Abram L. Friesen, Feryal Behbahani, Arnaud Doucet, David Budden, Matthew W. Hoffman, Nando de Freitas

Many real-world problems, including multi-speaker text-to-speech synthesis, can greatly benefit from the ability to meta-learn large models with only a few task-specific components.

Image Classification Meta-Learning +2

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

1 code implementation ICLR 2020 Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

Multi-agent Reinforcement Learning

Bayesian Optimization in AlphaGo

no code implementations17 Dec 2018 Yutian Chen, Aja Huang, Ziyu Wang, Ioannis Antonoglou, Julian Schrittwieser, David Silver, Nando de Freitas

During the development of AlphaGo, its many hyper-parameters were tuned with Bayesian optimization multiple times.

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

Multi-agent Reinforcement Learning

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

no code implementations ICLR 2019 Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden, Nando de Freitas

MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators.

Large-Scale Visual Speech Recognition

no code implementations ICLR 2019 Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas

To achieve this, we constructed the largest existing visual speech recognition dataset, consisting of pairs of text and video clips of faces speaking (3, 886 hours of video).

Ranked #6 on Lipreading on LRS3-TED (using extra training data)

Lipreading Visual Speech Recognition

Playing hard exploration games by watching YouTube

1 code implementation NeurIPS 2018 Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator.

Montezuma's Revenge

Hyperbolic Attention Networks

no code implementations ICLR 2019 Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure.

Machine Translation Question Answering +2

Learning Awareness Models

no code implementations ICLR 2018 Brandon Amos, Laurent Dinh, Serkan Cabi, Thomas Rothörl, Sergio Gómez Colmenarejo, Alistair Muldal, Tom Erez, Yuval Tassa, Nando de Freitas, Misha Denil

We show that models trained to predict proprioceptive information about the agent's body come to represent objects in the external world.

Compositional Obverter Communication Learning From Raw Visual Input

2 code implementations ICLR 2018 Edward Choi, Angeliki Lazaridou, Nando de Freitas

Previously, it has been shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e. g. hand- engineered features).

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

1 code implementation ICLR 2018 Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, Nicolas Heess

We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent.

Imitation Learning

Robust Imitation of Diverse Behaviors

no code implementations NeurIPS 2017 Ziyu Wang, Josh Merel, Scott Reed, Greg Wayne, Nando de Freitas, Nicolas Heess

Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train.

Imitation Learning

Programmable Agents

no code implementations20 Jun 2017 Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, Nando de Freitas

We build deep RL agents that execute declarative programs expressed in formal language.

Learned Optimizers that Scale and Generalize

1 code implementation ICML 2017 Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks.

Parallel Multiscale Autoregressive Density Estimation

no code implementations ICML 2017 Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas

Our new PixelCNN model achieves competitive density estimation and orders of magnitude speedup - O(log N) sampling instead of O(N) - enabling the practical generation of 512x512 images.

Conditional Image Generation Density Estimation +2

Learning to Perform Physics Experiments via Deep Reinforcement Learning

no code implementations6 Nov 2016 Misha Denil, Pulkit Agrawal, Tejas D. Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas

When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way.

Sample Efficient Actor-Critic with Experience Replay

8 code implementations3 Nov 2016 Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems.

Continuous Control

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

no code implementations8 Feb 2016 Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks.

Dueling Network Architectures for Deep Reinforcement Learning

65 code implementations20 Nov 2015 Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

In recent years there have been many successes of using deep representations in reinforcement learning.

Atari Games

Neural Programmer-Interpreters

2 code implementations19 Nov 2015 Scott Reed, Nando de Freitas

We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs.

ACDC: A Structured Efficient Linear Layer

2 code implementations18 Nov 2015 Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas

Finally, this paper also provides a connection between structured linear transforms used in deep learning and the field of Fourier optics, illustrating how ACDC could in principle be implemented with lenses and diffractive elements.

Unbounded Bayesian Optimization via Regularization

no code implementations14 Aug 2015 Bobak Shahriari, Alexandre Bouchard-Côté, Nando de Freitas

Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning.

Deep Fried Convnets

1 code implementation ICCV 2015 Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang

The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters.

Image Classification

Extraction of Salient Sentences from Labelled Documents

2 code implementations21 Dec 2014 Misha Denil, Alban Demiraj, Nando de Freitas

We present a hierarchical convolutional document model with an architecture designed to support introspection of the document structure.

Deep Multi-Instance Transfer Learning

no code implementations12 Nov 2014 Dimitrios Kotzias, Misha Denil, Phil Blunsom, Nando de Freitas

We present a new approach for transferring knowledge from groups to individuals that comprise them.

Transfer Learning

Heteroscedastic Treed Bayesian Optimisation

no code implementations27 Oct 2014 John-Alexander M. Assael, Ziyu Wang, Bobak Shahriari, Nando de Freitas

At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions.

Bayesian Optimisation

Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters

1 code implementation30 Jun 2014 Ziyu Wang, Nando de Freitas

Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models.

Bayesian Optimisation Gaussian Processes

An Entropy Search Portfolio for Bayesian Optimization

no code implementations18 Jun 2014 Bobak Shahriari, Ziyu Wang, Matthew W. Hoffman, Alexandre Bouchard-Côté, Nando de Freitas

How- ever, the performance of a Bayesian optimization method very much depends on its exploration strategy, i. e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance.

Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network

no code implementations15 Jun 2014 Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas

Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval.

Feature Engineering Information Retrieval

Distributed Parameter Estimation in Probabilistic Graphical Models

no code implementations NeurIPS 2014 Yariv Dror Mizrahi, Misha Denil, Nando de Freitas

This paper presents foundational theoretical results on distributed parameter estimation for undirected probabilistic graphical models.

A Deep Architecture for Semantic Parsing

no code implementations WS 2014 Edward Grefenstette, Phil Blunsom, Nando de Freitas, Karl Moritz Hermann

Many successful approaches to semantic parsing build on top of the syntactic analysis of text, and make use of distributional representations or statistical models to match parses to ontology-specific queries.

Semantic Parsing

Bayesian Multi-Scale Optimistic Optimization

no code implementations27 Feb 2014 Ziyu Wang, Babak Shakibi, Lin Jin, Nando de Freitas

In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions.

Gaussian Processes

Narrowing the Gap: Random Forests In Theory and In Practice

no code implementations4 Oct 2013 Misha Denil, David Matheson, Nando de Freitas

Despite widespread interest and practical use, the theoretical properties of random forests are still not well understood.

Linear and Parallel Learning of Markov Random Fields

no code implementations29 Aug 2013 Yariv Dror Mizrahi, Misha Denil, Nando de Freitas

We introduce a new embarrassingly parallel parameter learning algorithm for Markov random fields with untied parameters which is efficient for a large class of practical models.

Predicting Parameters in Deep Learning

no code implementations NeurIPS 2013 Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas

We demonstrate that there is significant redundancy in the parameterization of several deep learning models.

Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

no code implementations27 Mar 2013 Matthew W. Hoffman, Bobak Shahriari, Nando de Freitas

This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature.

Consistency of Online Random Forests

1 code implementation20 Feb 2013 Misha Denil, David Matheson, Nando de Freitas

As a testament to their success, the theory of random forests has long been outpaced by their application in practice.

Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (2012)

no code implementations19 Jan 2013 Nando de Freitas, Kevin Murphy

This is the Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, which was held on Catalina Island, CA August 14-18 2012.

Variational MCMC

no code implementations10 Jan 2013 Nando de Freitas, Pedro Hojen-Sorensen, Michael. I. Jordan, Stuart Russell

One of these algorithms is a mixture of two MCMC kernels: a random walk Metropolis kernel and a blockMetropolis-Hastings (MH) kernel with a variational approximation as proposaldistribution.

Bayesian Optimization in a Billion Dimensions via Random Embeddings

1 code implementation9 Jan 2013 Ziyu Wang, Frank Hutter, Masrour Zoghi, David Matheson, Nando de Freitas

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration.

Cannot find the paper you are looking for? You can Submit a new open access paper.