Search Results for author: Ofir Nachum

Found 48 papers, 20 papers with code

Model Selection in Batch Policy Optimization

no code implementations23 Dec 2021 Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai

We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage.

Model Selection

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

no code implementations29 Nov 2021 Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.

Contrastive Learning Decision Making +2

TRAIL: Near-Optimal Imitation Learning with Suboptimal Data

1 code implementation27 Oct 2021 Mengjiao Yang, Sergey Levine, Ofir Nachum

In this work, we answer this question affirmatively and present training objectives that use offline datasets to learn a factored transition model whose structure enables the extraction of a latent action space.

Imitation Learning

Policy Gradients Incorporating the Future

no code implementations4 Aug 2021 David Venuto, Elaine Lau, Doina Precup, Ofir Nachum

Reasoning about the future -- understanding how decisions in the present time affect outcomes in the future -- is one of the central challenges for reinforcement learning (RL), especially in highly-stochastic or partially observable environments.

Offline RL

SparseDice: Imitation Learning for Temporally Sparse Data via Regularization

no code implementations ICML Workshop URL 2021 Alberto Camacho, Izzeddin Gur, Marcin Lukasz Moczulski, Ofir Nachum, Aleksandra Faust

We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to the whole trajectories).

Imitation Learning

Provable Representation Learning for Imitation with Contrastive Fourier Features

1 code implementation NeurIPS 2021 Ofir Nachum, Mengjiao Yang

In imitation learning, it is common to learn a behavior policy to match an unknown target policy via max-likelihood training on a collected set of target demonstrations.

Atari Games Contrastive Learning +2

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

no code implementations ICLR 2021 Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics.

Continuous Control Data Augmentation

Benchmarks for Deep Off-Policy Evaluation

3 code implementations ICLR 2021 Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.

Continuous Control Decision Making +1

Near Optimal Policy Optimization via REPS

no code implementations NeurIPS 2021 Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum

Since its introduction a decade ago, \emph{relative entropy policy search} (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms.

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

1 code implementation14 Mar 2021 Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data.

Offline RL

Representation Matters: Offline Pretraining for Sequential Decision Making

no code implementations ICLR Workshop SSL-RL 2021 Mengjiao Yang, Ofir Nachum

The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms.

Decision Making Imitation Learning +1

Offline Policy Selection under Uncertainty

1 code implementation12 Dec 2020 Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans

More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

no code implementations ICLR 2021 Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited.

Few-Shot Imitation Learning Imitation Learning +1

CoinDICE: Off-Policy Confidence Interval Estimation

no code implementations NeurIPS 2020 Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation

no code implementations27 Jul 2020 Ilya Kostrikov, Ofir Nachum

In reinforcement learning, it is typical to use the empirically observed transitions and rewards to estimate the value of a policy via either model-based or Q-fitting approaches.

Continuous Control

Off-Policy Evaluation via the Regularized Lagrangian

no code implementations NeurIPS 2020 Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +1

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

1 code implementation ICLR 2021 Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN) that can effectively optimize a policy offline using 10-20 times fewer data than prior works.

Offline RL

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

3 code implementations15 Apr 2020 Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine

In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.

Offline RL

BRPO: Batch Residual Policy Optimization

no code implementations8 Feb 2020 Sungryull Sohn, Yin-Lam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e. g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state.

Reinforcement Learning via Fenchel-Rockafellar Duality

1 code implementation7 Jan 2020 Ofir Nachum, Bo Dai

We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality.

Imitation Learning via Off-Policy Distribution Matching

2 code implementations ICLR 2020 Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective.

Imitation Learning

AlgaeDICE: Policy Gradient from Arbitrary Experience

no code implementations4 Dec 2019 Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.

Behavior Regularized Offline Reinforcement Learning

1 code implementation26 Nov 2019 Yifan Wu, George Tucker, Ofir Nachum

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.

Continuous Control Offline RL

Group-based Fair Learning Leads to Counter-intuitive Predictions

no code implementations4 Oct 2019 Ofir Nachum, Heinrich Jiang

A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations.


Safe Policy Learning for Continuous Control

no code implementations25 Sep 2019 Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i. e.,~policies that keep the agent in desirable situations, both during training and at convergence.

Continuous Control

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

no code implementations23 Sep 2019 Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, Sergey Levine

Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks.

Hierarchical Reinforcement Learning

Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real

no code implementations13 Aug 2019 Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, Vikash Kumar

Our method hinges on the use of hierarchical sim2real -- a simulated environment is used to learn low-level goal-reaching skills, which are then used as the action space for a high-level RL controller, also trained in simulation.

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

2 code implementations NeurIPS 2019 Ofir Nachum, Yin-Lam Chow, Bo Dai, Lihong Li

In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.

DeepMDP: Learning Continuous Latent Space Models for Representation Learning

no code implementations6 Jun 2019 Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment.

Representation Learning

Stochastic Learning of Additive Second-Order Penalties with Applications to Fairness

no code implementations ICLR 2019 Heinrich Jiang, Yifan Wu, Ofir Nachum

In non-convex settings, the resulting problem may be difficult to solve as the Lagrangian is not guaranteed to have a deterministic saddle-point equilibrium.


Lyapunov-based Safe Policy Optimization for Continuous Control

no code implementations28 Jan 2019 Yin-Lam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them.

Continuous Control Robot Navigation

Identifying and Correcting Label Bias in Machine Learning

no code implementations15 Jan 2019 Heinrich Jiang, Ofir Nachum

We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.


The Laplacian in RL: Learning Representations with Efficient Approximations

no code implementations ICLR 2019 Yifan Wu, George Tucker, Ofir Nachum

In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context.

Representation Learning

Lyapunov-based Safe Policy Optimization

no code implementations27 Sep 2018 Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh, Edgar Guzman-Duenez

In many reinforcement learning applications, it is crucial that the agent interacts with the environment only through safe policies, i. e.,~policies that do not take the agent to certain undesirable situations.

Data-Efficient Hierarchical Reinforcement Learning

10 code implementations NeurIPS 2018 Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine

In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.

Hierarchical Reinforcement Learning

A Lyapunov-based Approach to Safe Reinforcement Learning

no code implementations NeurIPS 2018 Yin-Lam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints.

Decision Making Safe Reinforcement Learning

Smoothed Action Value Functions for Learning Gaussian Policies

no code implementations ICML 2018 Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i. e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.

Continuous Control Q-Learning

Path Consistency Learning in Tsallis Entropy Regularized MDPs

no code implementations ICML 2018 Ofir Nachum, Yin-Lam Chow, Mohammad Ghavamzadeh

In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called {\em sparse PCL}, for the sparse ERL problem that can work with both on-policy and off-policy data.

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control

1 code implementation ICLR 2018 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.

Continuous Control

Bridging the Gap Between Value and Policy Based Reinforcement Learning

1 code implementation NeurIPS 2017 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.


Improving Policy Gradient by Exploring Under-appreciated Rewards

no code implementations28 Nov 2016 Ofir Nachum, Mohammad Norouzi, Dale Schuurmans

We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions.

Cannot find the paper you are looking for? You can Submit a new open access paper.