Search Results for author: Tom Le Paine

Found 17 papers, 14 papers with code

Active Offline Policy Selection

1 code implementation NeurIPS 2021 Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

We use multiple benchmarks, including real-world robotics, with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.

On Instrumental Variable Regression for Deep Offline Policy Evaluation

1 code implementation21 May 2021 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

no code implementations ICLR 2021 Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics.

Continuous Control Data Augmentation

Benchmarks for Deep Off-Policy Evaluation

3 code implementations ICLR 2021 Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.

Continuous Control Decision Making +1

Hyperparameter Selection for Offline Reinforcement Learning

no code implementations17 Jul 2020 Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

Therefore, in this work, we focus on \textit{offline hyperparameter selection}, i. e. methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data.

Offline RL reinforcement-learning

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +2

Acme: A Research Framework for Distributed Reinforcement Learning

2 code implementations1 Jun 2020 Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

DQN Replay Dataset reinforcement-learning

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

1 code implementation ICLR 2020 Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

no code implementations ICLR 2019 Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden, Nando de Freitas

MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators.

Playing hard exploration games by watching YouTube

1 code implementation NeurIPS 2018 Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator.

Montezuma's Revenge

Fast Wavenet Generation Algorithm

6 code implementations29 Nov 2016 Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang

This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet.

Seq-NMS for Video Object Detection

1 code implementation26 Feb 2016 Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang

Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip.

Frame General Classification +3

How Deep Neural Networks Can Improve Emotion Recognition on Video Data

1 code implementation24 Feb 2016 Pooya Khorrami, Tom Le Paine, Kevin Brady, Charlie Dagli, Thomas S. Huang

In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance.

Emotion Recognition

Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition?

1 code implementation10 Oct 2015 Pooya Khorrami, Tom Le Paine, Thomas S. Huang

Despite being the appearance-based classifier of choice in recent years, relatively few works have examined how much convolutional neural networks (CNNs) can improve performance on accepted expression recognition benchmarks and, more importantly, examine what it is they actually learn.

An Analysis of Unsupervised Pre-training in Light of Recent Advances

2 code implementations20 Dec 2014 Tom Le Paine, Pooya Khorrami, Wei Han, Thomas S. Huang

We discover unsupervised pre-training, as expected, helps when the ratio of unsupervised to supervised samples is high, and surprisingly, hurts when the ratio is low.

Data Augmentation Image Classification +2

Cannot find the paper you are looking for? You can Submit a new open access paper.