Search Results for author: Daniel J. Mankowitz

Found 23 papers, 5 papers with code

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

1 code implementation ICLR 2022 Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.

Offline RL · reinforcement-learning
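The objective described in the abstract is the standard constrained-MDP formulation, which can be written as follows (the notation is the conventional one and is assumed here rather than copied from the paper):

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c_k(s_t, a_t) \right] \le \hat{c}_k,
\qquad k = 1, \dots, K
```

In the offline setting, both expectations must be estimated from the fixed dataset alone, which is what the stationary distribution correction in COptiDICE addresses.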

MuZero with Self-competition for Rate Control in VP9 Video Compression

no code implementations 14 Feb 2022 Amol Mandhane, Anton Zhernov, Maribeth Rauh, Chenjie Gu, Miaosen Wang, Flora Xue, Wendy Shang, Derek Pang, Rene Claus, Ching-Han Chiang, Cheng Chen, Jingning Han, Angie Chen, Daniel J. Mankowitz, Jackson Broshear, Julian Schrittwieser, Thomas Hubert, Oriol Vinyals, Timothy Mann

Specifically, we target the problem of learning a rate control policy to select the quantization parameters (QP) in the encoding process of libvpx, an open source VP9 video compression library widely used by popular video-on-demand (VOD) services.

Decision Making · Quantization +1

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning

1 code implementation NeurIPS 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Offline RL · reinforcement-learning
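As a usage illustration, the RL Unplugged datasets are distributed through TensorFlow Datasets; a minimal loading sketch follows (the exact dataset name below is an assumption, so check the RL Unplugged repository for the registered names):

```python
# Minimal sketch: reading an offline RL dataset through TensorFlow Datasets.
# The name 'rlu_dm_control_suite/cartpole_swingup' is an assumption; consult
# the RL Unplugged repository for the exact registered dataset names.
import tensorflow_datasets as tfds

ds = tfds.load('rlu_dm_control_suite/cartpole_swingup', split='train')
for episode in ds.take(1):
    # Each element is one episode; its 'steps' field is a nested dataset of
    # transitions with fields such as 'observation', 'action' and 'reward'.
    for step in episode['steps'].take(3):
        print(step['reward'])
```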

Balancing Constraints and Rewards with Meta-Gradient D4PG

no code implementations ICLR 2021 Dan A. Calian, Daniel J. Mankowitz, Tom Zahavy, Zhongwen Xu, Junhyuk Oh, Nir Levine, Timothy Mann

Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints.

reinforcement-learning
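A common baseline for balancing the two signals is a Lagrangian relaxation, sketched below in Python. This illustrates the general recipe only, not the paper's meta-gradient D4PG update, and the budget and step size are made-up values:

```python
# Generic Lagrangian-relaxation sketch for constrained RL:
# maximize E[reward] subject to E[cost] <= budget.
budget = 10.0   # illustrative cost budget (assumption)
lam = 0.0       # Lagrange multiplier, kept non-negative
lam_lr = 1e-3   # multiplier step size (assumption)

def penalized_reward(reward, cost, lam):
    # The agent optimizes this shaped reward with any standard RL algorithm.
    return reward - lam * cost

def update_multiplier(episode_cost, lam):
    # Dual ascent: raise lam when the constraint is violated, lower it otherwise.
    return max(0.0, lam + lam_lr * (episode_cost - budget))
```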

An empirical investigation of the challenges of real-world reinforcement learning

1 code implementation 24 Mar 2020 Gabriel Dulac-Arnold, Nir Levine, Daniel J. Mankowitz, Jerry Li, Cosmin Paduraru, Sven Gowal, Todd Hester

We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems.

Continuous Control · reinforcement-learning

Robust Reinforcement Learning for Continuous Control with Model Misspecification

no code implementations ICLR 2020 Daniel J. Mankowitz, Nir Levine, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller

We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms.

Continuous Control · reinforcement-learning
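The robust formulation referred to above is conventionally written as a max-min problem over an uncertainty set of transition models (standard notation, assumed rather than quoted from the paper):

```latex
\max_{\pi} \; \min_{p \in \mathcal{P}} \;
\mathbb{E}_{p,\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
```

Here $\mathcal{P}$ is the set of plausible dynamics (e.g., perturbed simulator parameters), and the policy is evaluated against the worst case in that set.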

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

no code implementations 23 May 2019 Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

Decision Making · Imitation Learning +2

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations NeurIPS 2018 Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

reinforcement-learning · text-based games
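The effect of action elimination at decision time can be sketched as a mask applied before greedy action selection (a minimal illustration only; the learned elimination signal itself is omitted here and replaced by a fixed boolean mask):

```python
import numpy as np

def masked_greedy_action(q_values, eliminated):
    # q_values: per-action Q estimates; eliminated: boolean mask over actions
    # predicted to be redundant or irrelevant in the current state.
    masked_q = np.where(eliminated, -np.inf, q_values)
    return int(np.argmax(masked_q))

q = np.array([0.1, 0.9, 0.4, 0.7])
elim = np.array([False, True, False, False])   # action 1 eliminated
print(masked_greedy_action(q, elim))           # -> 3, best remaining action
```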

Soft-Robust Actor-Critic Policy-Gradient

no code implementations 11 Mar 2018 Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies.

reinforcement-learning
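In standard notation (assumed, not quoted from the paper), the soft-robust criterion replaces the worst case over the uncertainty set $\mathcal{P}$ with an average under a distribution $w$ over it:

```latex
\max_{\pi} \; \mathbb{E}_{p \sim w}\!\left[
\mathbb{E}_{p,\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\right]
```

Averaging over models rather than minimizing over them is what avoids the conservativeness mentioned in the abstract.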

Learning Robust Options

no code implementations 9 Feb 2018 Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

Situationally Aware Options

no code implementations 20 Nov 2017 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

Shallow Updates for Deep Reinforcement Learning

no code implementations NeurIPS 2017 Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games · Feature Engineering +1
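The hybrid update can be illustrated with a closed-form refit of the network's final linear layer on fixed penultimate-layer features (a minimal NumPy sketch with made-up shapes and a plain ridge term; the paper instead regularizes the solution toward the network's current weights):

```python
import numpy as np

# Minimal sketch of the LS-DQN idea: hold the deep network's penultimate-layer
# features fixed and refit the last linear layer in closed form.
rng = np.random.default_rng(0)
features = rng.normal(size=(1024, 512))   # phi(s) for a batch of states (assumption)
targets = rng.normal(size=(1024, 4))      # per-action regression targets (assumption)

ridge = 1.0                               # illustrative regularizer
A = features.T @ features + ridge * np.eye(features.shape[1])
b = features.T @ targets
w = np.linalg.solve(A, b)                 # closed-form least-squares weights
q_values = features @ w                   # Q-values under the refit linear head
```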

Adaptive Skills, Adaptive Partitions (ASAP)

no code implementations NeurIPS 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Situational Awareness by Risk-Conscious Skills

no code implementations 10 Oct 2016 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

Hierarchical Reinforcement Learning

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

no code implementations 25 Apr 2016 Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.
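Skill distillation follows the general policy-distillation recipe of training a single student network to match each teacher skill's output distribution; a minimal sketch of such a loss (the temperature, names, and shapes are illustrative assumptions):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) over action distributions; the temperature
    # softens the teacher's outputs, as is usual in distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p * (np.log(p + 1e-8) - np.log(q + 1e-8)), axis=-1)))
```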

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

no code implementations 10 Feb 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.

Adaptive Skills, Adaptive Partitions (ASAP)

no code implementations 10 Feb 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

CFORB: Circular FREAK-ORB Visual Odometry

no code implementations 17 Jun 2015 Daniel J. Mankowitz, Ehud Rivlin

CFORB has also been run in an indoor environment and achieved an average translational error of $3.70\%$.

Visual Odometry

Bootstrapping Skills

no code implementations 11 Jun 2015 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.
