1 code implementation • 26 May 2023 • David Brandfonbrener, Ofir Nachum, Joan Bruna
In recent years, domains such as natural language processing and image recognition have popularized the paradigm of using large datasets to pretrain representations that can be effectively transferred to downstream tasks.
no code implementations • 24 May 2023 • Ken Caluwaerts, Atil Iscen, J. Chase Kew, Wenhao Yu, Tingnan Zhang, Daniel Freeman, Kuang-Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, Nathan Batchelor, Steven Bohez, Federico Casarini, Jose Enrique Chen, Omar Cortes, Erwin Coumans, Adil Dostmohamed, Gabriel Dulac-Arnold, Alejandro Escontrela, Erik Frey, Roland Hafner, Deepali Jain, Bauyrjan Jyenis, Yuheng Kuang, Edward Lee, Linda Luu, Ofir Nachum, Ken Oslund, Jason Powell, Diego Reyes, Francesco Romano, Feresteh Sadeghi, Ron Sloat, Baruch Tabanpour, Daniel Zheng, Michael Neunert, Raia Hadsell, Nicolas Heess, Francesco Nori, Jeff Seto, Carolina Parada, Vikas Sindhwani, Vincent Vanhoucke, Jie Tan
In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived environment and robot states.
no code implementations • 19 May 2023 • Hiroki Furuta, Ofir Nachum, Kuang-Huei Lee, Yutaka Matsuo, Shixiang Shane Gu, Izzeddin Gur
The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data.
no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans
In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
no code implementations • 31 Jan 2023 • Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel
The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks.
1 code implementation • 13 Dec 2022 • Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance.
no code implementations • 23 Nov 2022 • David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum
Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications.
no code implementations • 3 Nov 2022 • Bogdan Mazoure, Benjamin Eysenbach, Ofir Nachum, Jonathan Tompson
In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.
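A rough sketch of how a contrastive, multi-step dynamics model of this kind might be trained (the architecture and names below are illustrative assumptions, not the paper's exact CVL objective): a scorer classifies whether a future state really follows a given state-action pair, with other states in the batch acting as negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveDynamicsScorer(nn.Module):
    """Scores (state, action, future_state) compatibility; hypothetical architecture."""
    def __init__(self, state_dim, action_dim, embed_dim=128):
        super().__init__()
        self.sa_encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
        self.s_encoder = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, state, action, future_state):
        q = self.sa_encoder(torch.cat([state, action], dim=-1))  # (B, D)
        k = self.s_encoder(future_state)                          # (B, D)
        return q @ k.t()                                          # (B, B) pairwise logits

def infonce_loss(model, state, action, future_state):
    # Diagonal entries are positives (true observed futures); off-diagonals are negatives.
    logits = model(state, action, future_state)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```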
no code implementations • 3 Nov 2022 • Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
1 code implementation • 24 Oct 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.
no code implementations • 8 Oct 2022 • Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust
We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages.
no code implementations • 27 Jul 2022 • Kuang-Huei Lee, Ofir Nachum, Tingnan Zhang, Sergio Guadarrama, Jie Tan, Wenhao Yu
Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time.
no code implementations • 24 Jun 2022 • Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett
In contrast with previous work that has studied multi-task RL in other function approximation models, we show that, given a bilinear optimization oracle and finite state and action spaces, there exists a computationally efficient algorithm for multi-task MatrixRL via a reduction to quadratic programming.
no code implementations • 31 May 2022 • Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier
Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge.
1 code implementation • 30 May 2022 • Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch
Specifically, we show that a single transformer-based model - with a single set of weights - trained purely offline can play a suite of up to 46 Atari games simultaneously at close-to-human performance.
1 code implementation • 27 May 2022 • Seyed Kamyar Seyed Ghasemipour, Shixiang Shane Gu, Ofir Nachum
Motivated by the success of ensembles for uncertainty estimation in supervised learning, we take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL).
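As a rough illustration of ensemble-based pessimism of the kind discussed here (a sketch under generic assumptions, not MSG's exact update rule), one can maintain several independently trained Q-functions and act or bootstrap against a lower-confidence-bound of their predictions:

```python
import torch

def pessimistic_value(q_ensemble, state, action, beta=1.0):
    """Lower-confidence-bound value from an ensemble of Q-networks.

    q_ensemble: iterable of callables mapping (state, action) -> Q-value tensor (B,).
    beta: pessimism coefficient; larger values penalize ensemble disagreement more.
    Illustrative only; the paper's MSG algorithm differs in how targets are formed.
    """
    qs = torch.stack([q(state, action) for q in q_ensemble], dim=0)  # (E, B)
    return qs.mean(dim=0) - beta * qs.std(dim=0)                      # (B,)
```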
1 code implementation • 22 May 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.
no code implementations • 28 Jan 2022 • Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu
In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy.
no code implementations • 23 Dec 2021 • Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai
We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage.
no code implementations • 29 Nov 2021 • Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
1 code implementation • ICLR 2022 • Mengjiao Yang, Sergey Levine, Ofir Nachum
In this work, we answer this question affirmatively and present training objectives that use offline datasets to learn a factored transition model whose structure enables the extraction of a latent action space.
no code implementations • 29 Sep 2021 • Seyed Kamyar Seyed Ghasemipour, Shixiang Shane Gu, Ofir Nachum
Motivated by the success of MSG, we investigate whether efficient approximations to ensembles can be as effective.
no code implementations • 29 Sep 2021 • Anurag Ajay, Ge Yang, Ofir Nachum, Pulkit Agrawal
Deep Reinforcement Learning (RL) agents have achieved superhuman performance on several video game suites.
no code implementations • 29 Sep 2021 • Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu
In this work, we analyze the effectiveness of the Bellman equation as a proxy objective for value prediction accuracy in off-policy evaluation.
no code implementations • 29 Sep 2021 • Izzeddin Gur, Ofir Nachum, Aleksandra Faust
We formalize our approach as offline targeted environment design (OTED), which automatically learns a distribution over simulator parameters to match a provided offline dataset, and then uses the learned simulator to train an RL agent in a standard online fashion.
no code implementations • ICLR 2022 • David Venuto, Elaine Lau, Doina Precup, Ofir Nachum
Reasoning about the future -- understanding how decisions in the present time affect outcomes in the future -- is one of the central challenges for reinforcement learning (RL), especially in highly stochastic or partially observable environments.
no code implementations • ICML Workshop URL 2021 • Alberto Camacho, Izzeddin Gur, Marcin Lukasz Moczulski, Ofir Nachum, Aleksandra Faust
We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to whole trajectories).
1 code implementation • NeurIPS 2021 • Ofir Nachum, Mengjiao Yang
In imitation learning, it is common to learn a behavior policy to match an unknown target policy via max-likelihood training on a collected set of target demonstrations.
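The max-likelihood training referenced here amounts to minimizing the negative log-likelihood of demonstrated actions (behavioral cloning). A minimal sketch, assuming the policy network returns a torch distribution over actions (that interface is an assumption of this sketch, not the paper's):

```python
import torch

def behavioral_cloning_loss(policy, states, expert_actions):
    """Negative log-likelihood of expert actions under the learned policy.

    policy(states) is assumed to return a torch.distributions.Distribution
    over actions.
    """
    dist = policy(states)
    return -dist.log_prob(expert_actions).mean()
```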
no code implementations • ICLR 2021 • Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi
This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics.
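Written out, the conditional-independence modeling assumption described above is (schematic notation):

```latex
% Each dimension of the next state (and the reward) is modeled independently
% given the current state and action:
p(s', r \mid s, a) \;=\; p(r \mid s, a) \prod_{i=1}^{d} p\big(s'_i \,\big|\, s, a\big).
```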
3 code implementations • ICLR 2021 • Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.
1 code implementation • 23 Mar 2021 • Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, Shixiang Shane Gu
Progress in deep reinforcement learning (RL) research is largely enabled by benchmark task environments.
no code implementations • NeurIPS 2021 • Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum
Since its introduction a decade ago, relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms.
1 code implementation • 14 Mar 2021 • Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum
Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data.
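A generic behavior-regularized actor update of the kind described here, as a sketch only: the divergence penalty below is a simple sample-based KL surrogate against a fitted behavior model, not this paper's specific critic-based penalty, and the names and interfaces are illustrative assumptions.

```python
import torch

def behavior_regularized_actor_loss(policy, q_fn, behavior_policy, states, alpha=0.1):
    """Actor loss = -Q(s, a ~ pi) + alpha * divergence(pi, behavior) at sampled states.

    Assumes policy(states) and behavior_policy(states) return torch distributions.
    """
    dist = policy(states)
    actions = dist.rsample()                               # reparameterized sample
    q_term = q_fn(states, actions).mean()
    log_pi = dist.log_prob(actions)
    log_beta = behavior_policy(states).log_prob(actions)   # fitted behavior model (assumed)
    divergence = (log_pi - log_beta).mean()                # single-sample KL estimate
    return -q_term + alpha * divergence
```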
no code implementations • ICLR Workshop SSL-RL 2021 • Mengjiao Yang, Ofir Nachum
The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms.
1 code implementation • 12 Dec 2020 • Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans
More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.
no code implementations • ICLR 2021 • Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum
Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited.
no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.
no code implementations • 27 Jul 2020 • Ilya Kostrikov, Ofir Nachum
In reinforcement learning, it is typical to use the empirically observed transitions and rewards to estimate the value of a policy via either model-based or Q-fitting approaches.
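The "Q-fitting" approach mentioned above is typically fitted Q-evaluation: repeatedly regressing a Q-network toward a bootstrapped target built from observed rewards and the evaluated policy's actions. A minimal sketch (function and variable names are illustrative, not the paper's):

```python
import torch

def fitted_q_evaluation_step(q_net, target_q_net, target_policy, batch, gamma=0.99):
    """One regression step of fitted Q-evaluation on logged transitions."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = target_policy(s_next)  # action the evaluated policy would take
        td_target = r + gamma * (1.0 - done) * target_q_net(s_next, a_next)
    loss = torch.nn.functional.mse_loss(q_net(s, a), td_target)
    return loss
```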
no code implementations • NeurIPS 2020 • Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.
2 code implementations • 24 Jun 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
1 code implementation • ICLR 2021 • Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that can effectively optimize a policy offline using 10-20 times less data than prior works.
6 code implementations • 15 Apr 2020 • Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
no code implementations • 8 Feb 2020 • Sungryull Sohn, Yin-Lam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier
In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state.
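Written out, the standard state-independent closeness constraint this work examines looks roughly like the following (schematic notation, not the paper's):

```latex
% A single divergence budget epsilon shared across all states, relating the
% learned policy pi to the behavior policy pi_b:
\max_{\pi} \; J(\pi)
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\big(\pi(\cdot \mid s) \,\|\, \pi_b(\cdot \mid s)\big) \le \epsilon
\;\; \text{for all } s.
```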
1 code implementation • 7 Jan 2020 • Ofir Nachum, Bo Dai
We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality.
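For reference, one standard statement of Fenchel-Rockafellar duality (up to sign conventions, and assuming suitable convexity and constraint qualifications) is:

```latex
% For proper, convex, lower semi-continuous f and g, a linear operator A, and
% convex conjugates f^*, g^*, strong duality gives
\min_{x} \; f(x) + g(Ax)
\;=\;
\max_{y} \; -f^{*}(-A^{\top} y) - g^{*}(y),
% provided an appropriate constraint qualification holds.
```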
2 code implementations • ICLR 2020 • Ilya Kostrikov, Ofir Nachum, Jonathan Tompson
In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective.
no code implementations • 4 Dec 2019 • Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans
In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.
1 code implementation • 26 Nov 2019 • Yifan Wu, George Tucker, Ofir Nachum
In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.
no code implementations • 4 Oct 2019 • Ofir Nachum, Heinrich Jiang
A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations.
no code implementations • 25 Sep 2019 • Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence.
no code implementations • 23 Sep 2019 • Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, Sergey Levine
Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks.
no code implementations • 13 Aug 2019 • Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, Vikash Kumar
Our method hinges on the use of hierarchical sim2real -- a simulated environment is used to learn low-level goal-reaching skills, which are then used as the action space for a high-level RL controller, also trained in simulation.
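The two-level control structure described here can be sketched as follows: the high-level policy outputs a goal, which a pretrained low-level goal-reaching skill converts into motor commands. All class and parameter names below are illustrative assumptions, not the paper's interfaces.

```python
class HierarchicalController:
    """High-level policy picks goals; a pretrained low-level skill reaches them.

    Illustrative sketch of the control flow only; the actual policies, goal
    space, and re-planning frequency are specific to the paper.
    """
    def __init__(self, high_level_policy, low_level_skill, goal_horizon=10):
        self.high_level_policy = high_level_policy  # obs -> goal
        self.low_level_skill = low_level_skill      # (obs, goal) -> motor command
        self.goal_horizon = goal_horizon
        self._steps_since_goal = 0
        self._current_goal = None

    def act(self, observation):
        # Re-sample the goal every `goal_horizon` steps; the goal is the
        # high-level "action", executed by the low-level goal-reaching skill.
        if self._current_goal is None or self._steps_since_goal >= self.goal_horizon:
            self._current_goal = self.high_level_policy(observation)
            self._steps_since_goal = 0
        self._steps_since_goal += 1
        return self.low_level_skill(observation, self._current_goal)
```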
2 code implementations • NeurIPS 2019 • Ofir Nachum, Yin-Lam Chow, Bo Dai, Lihong Li
In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.
no code implementations • 6 Jun 2019 • Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare
We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment.
no code implementations • ICLR 2019 • Heinrich Jiang, Yifan Wu, Ofir Nachum
In non-convex settings, the resulting problem may be difficult to solve as the Lagrangian is not guaranteed to have a deterministic saddle-point equilibrium.
1 code implementation • 28 Jan 2019 • Yin-Lam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them.
no code implementations • 15 Jan 2019 • Heinrich Jiang, Ofir Nachum
We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.
no code implementations • ICLR 2019 • Yifan Wu, George Tucker, Ofir Nachum
In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context.
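One common way to phrase such an objective is as a graph-drawing loss over sampled transitions, with an orthonormality constraint standing in for an explicit eigendecomposition (a schematic form, not necessarily the paper's exact objective):

```latex
% Schematic spectral objective over learned state features f_1, ..., f_k,
% estimated from sampled transitions (s, s') rather than the full graph:
\min_{f_1,\dots,f_k} \;
\sum_{i=1}^{k} \mathbb{E}_{(s, s')}\!\left[ \big(f_i(s) - f_i(s')\big)^2 \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s}\!\left[ f_i(s)\, f_j(s) \right] = \delta_{ij},
% with the constraint typically enforced via a penalty term in practice.
```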
7 code implementations • ICLR 2019 • Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine
We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning.
no code implementations • 27 Sep 2018 • Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh, Edgar Guzman-Duenez
In many reinforcement learning applications, it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to certain undesirable situations.
11 code implementations • NeurIPS 2018 • Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine
In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.
1 code implementation • NeurIPS 2018 • Yin-Lam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints.
no code implementations • ICML 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.
1 code implementation • 28 Feb 2018 • Deirdre Quillen, Eric Jang, Ofir Nachum, Chelsea Finn, Julian Ibarz, Sergey Levine
In this paper, we explore deep reinforcement learning algorithms for vision-based robotic grasping.
no code implementations • ICML 2018 • Ofir Nachum, Yin-Lam Chow, Mohammad Ghavamzadeh
In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data.
no code implementations • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value used in SARSA.
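The smoothed action value referred to here can be written as the usual expected Q-value averaged under Gaussian noise around the nominal action (schematic notation; sigma denotes the smoothing scale):

```latex
% Gaussian-smoothed action value:
\tilde{Q}(s, a) \;=\;
\mathbb{E}_{\tilde{a} \sim \mathcal{N}(a,\, \sigma^{2} I)}\big[ Q(s, \tilde{a}) \big].
```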
3 code implementations • CVPR 2018 • Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, Edward Choi
We present MorphNet, an approach to automate the design of neural network structures.
1 code implementation • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.
1 code implementation • 9 Mar 2017 • Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio
We present a large-scale life-long memory module for use in deep learning.
1 code implementation • NeurIPS 2017 • Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.
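The single-step form of this consistency relation can be stated roughly as follows (schematic, assuming deterministic dynamics and entropy-regularization temperature tau); PCL minimizes the squared violation of this identity, and its multi-step generalization, along sampled trajectories.

```latex
% Single-step softmax temporal consistency: the optimal soft value V* and
% optimal policy pi* satisfy, for a transition (s, a) -> s',
V^{*}(s) - \gamma V^{*}(s') \;=\; r(s, a) - \tau \log \pi^{*}(a \mid s).
```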
no code implementations • 28 Nov 2016 • Ofir Nachum, Mohammad Norouzi, Dale Schuurmans
We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions.