Search Results for author: Ofir Nachum

Found 69 papers, 27 papers with code

Improving Policy Gradient by Exploring Under-appreciated Rewards

no code implementations 28 Nov 2016 Ofir Nachum, Mohammad Norouzi, Dale Schuurmans

We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions.

Reinforcement Learning (RL)

Bridging the Gap Between Value and Policy Based Reinforcement Learning

1 code implementation NeurIPS 2017 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.

Q-Learning reinforcement-learning +1
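
To make the softmax consistency idea above concrete, here is a minimal numpy sketch of the multi-step path consistency error that the resulting PCL-style training minimizes, written from the paper's description; the function and argument names are illustrative, not taken from any released code.

```python
# Soft path consistency over a sub-trajectory s_t, ..., s_{t+d}:
#   C = -V(s_t) + gamma^d * V(s_{t+d}) + sum_i gamma^i * (r_i - tau * log pi(a_i | s_i))
# Training drives C toward zero (e.g., by minimizing 0.5 * C**2) with respect to
# both the value and policy parameters.
import numpy as np

def path_consistency_error(v_start, v_end, rewards, log_pis, gamma=0.99, tau=0.01):
    rewards, log_pis = np.asarray(rewards), np.asarray(log_pis)
    d = len(rewards)
    discounts = gamma ** np.arange(d)
    soft_return = np.sum(discounts * (rewards - tau * log_pis))
    return -v_start + gamma ** d * v_end + soft_return
```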

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control

1 code implementation ICLR 2018 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.

Continuous Control Reinforcement Learning (RL)

Path Consistency Learning in Tsallis Entropy Regularized MDPs

no code implementations ICML 2018 Ofir Nachum, Yin-Lam Chow, Mohammad Ghavamzadeh

In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data.

Smoothed Action Value Functions for Learning Gaussian Policies

no code implementations ICML 2018 Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.

Continuous Control Q-Learning +1

A Lyapunov-based Approach to Safe Reinforcement Learning

1 code implementation NeurIPS 2018 Yin-Lam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints.

Decision Making reinforcement-learning +2
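
For orientation, the sketch below shows only the generic primal-dual (Lagrangian) treatment of a constrained MDP with a single cost constraint E[cost] <= d, which is the baseline that this Lyapunov-based approach improves upon; it is not the paper's Lyapunov construction, and all networks and names are hypothetical.

```python
# Generic CMDP Lagrangian baseline (not the Lyapunov method itself): the policy
# trades off reward Q against cost Q, and the multiplier grows whenever the
# estimated cost exceeds the allowed limit.
import torch

def cmdp_lagrangian_losses(q_reward, q_cost, policy, states, log_lam, cost_limit):
    actions = policy(states)                       # hypothetical differentiable policy
    lam = log_lam.exp()                            # keep the multiplier non-negative
    actor_loss = (-q_reward(states, actions)
                  + lam.detach() * q_cost(states, actions)).mean()
    constraint_gap = q_cost(states, actions).detach().mean() - cost_limit
    lam_loss = -(lam * constraint_gap)             # gradient descent here ascends lam when violated
    return actor_loss, lam_loss
```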

Data-Efficient Hierarchical Reinforcement Learning

12 code implementations NeurIPS 2018 Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine

In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.

Hierarchical Reinforcement Learning reinforcement-learning +1
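
The paper's data-efficient scheme builds on directional goals: the higher level proposes a desired change in state, and the lower level is rewarded for realizing that change. A small sketch of that reward and the matching goal transition, as I read the paper (names are illustrative, not from the released code):

```python
import numpy as np

def low_level_reward(state, goal, next_state):
    # Negative distance between the commanded target (state + goal) and where we ended up.
    return -np.linalg.norm(state + goal - next_state)

def goal_transition(state, goal, next_state):
    # Re-express the same absolute target relative to the new state,
    # so the goal stays meaningful between high-level decisions.
    return state + goal - next_state
```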

Lyapunov-based Safe Policy Optimization

no code implementations 27 Sep 2018 Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh, Edgar Guzman-Duenez

In many reinforcement learning applications, it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to certain undesirable situations.

The Laplacian in RL: Learning Representations with Efficient Approximations

no code implementations ICLR 2019 Yifan Wu, George Tucker, Ofir Nachum

In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context.

Reinforcement Learning (RL) Representation Learning

Identifying and Correcting Label Bias in Machine Learning

no code implementations 15 Jan 2019 Heinrich Jiang, Ofir Nachum

We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.

BIG-bench Machine Learning Fairness

Lyapunov-based Safe Policy Optimization for Continuous Control

1 code implementation 28 Jan 2019 Yin-Lam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them.

Continuous Control Robot Navigation

Stochastic Learning of Additive Second-Order Penalties with Applications to Fairness

no code implementations ICLR 2019 Heinrich Jiang, Yifan Wu, Ofir Nachum

In non-convex settings, the resulting problem may be difficult to solve as the Lagrangian is not guaranteed to have a deterministic saddle-point equilibrium.

Fairness

DeepMDP: Learning Continuous Latent Space Models for Representation Learning

no code implementations 6 Jun 2019 Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment.

Reinforcement Learning (RL) Representation Learning

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

2 code implementations NeurIPS 2019 Ofir Nachum, Yin-Lam Chow, Bo Dai, Lihong Li

In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.
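
A hedged sketch of the behavior-agnostic objective in its quadratic-penalty form, written from the paper's description; nu_net, policy, and the batch layout are stand-ins, not the released implementation.

```python
import torch

def dualdice_loss(nu_net, policy, batch, gamma=0.99):
    s, a, s_next, s0 = batch["s"], batch["a"], batch["s_next"], batch["s0"]
    a_next, a0 = policy.sample(s_next), policy.sample(s0)   # actions from the *target* policy
    residual = nu_net(s, a) - gamma * nu_net(s_next, a_next)
    # min_nu  E_dataset[residual^2 / 2]  -  (1 - gamma) * E_init[nu(s0, a0)];
    # at the optimum, residual estimates the correction d_pi(s, a) / d_D(s, a)
    # on dataset samples, without ever needing the behavior policy.
    return 0.5 * (residual ** 2).mean() - (1 - gamma) * nu_net(s0, a0).mean()
```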

Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real

no code implementations 13 Aug 2019 Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, Vikash Kumar

Our method hinges on the use of hierarchical sim2real -- a simulated environment is used to learn low-level goal-reaching skills, which are then used as the action space for a high-level RL controller, also trained in simulation.

Reinforcement Learning (RL)

Safe Policy Learning for Continuous Control

no code implementations 25 Sep 2019 Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence.

Continuous Control

Group-based Fair Learning Leads to Counter-intuitive Predictions

no code implementations 4 Oct 2019 Ofir Nachum, Heinrich Jiang

A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations.

Fairness

Behavior Regularized Offline Reinforcement Learning

1 code implementation 26 Nov 2019 Yifan Wu, George Tucker, Ofir Nachum

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.

Continuous Control Offline RL +2
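
The core idea of the framework can be illustrated by an actor update that maximizes Q while penalizing divergence from the estimated behavior policy. A minimal sketch under assumed interfaces (sample-based KL penalty; none of these names come from the paper's code):

```python
import torch

def behavior_regularized_actor_loss(q_net, policy, behavior_policy, states, alpha=1.0):
    actions, log_pi = policy.rsample_with_log_prob(states)   # reparameterized sample
    log_pi_b = behavior_policy.log_prob(states, actions)
    kl_sample = log_pi - log_pi_b                            # single-sample KL(pi || pi_b)
    # Maximize Q under the learned policy while staying close to the data-generating policy.
    return (-q_net(states, actions) + alpha * kl_sample).mean()
```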

AlgaeDICE: Policy Gradient from Arbitrary Experience

no code implementations 4 Dec 2019 Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.

Reinforcement Learning (RL)

Imitation Learning via Off-Policy Distribution Matching

3 code implementations ICLR 2020 Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective.

Imitation Learning Reinforcement Learning (RL)

Reinforcement Learning via Fenchel-Rockafellar Duality

1 code implementation 7 Jan 2020 Ofir Nachum, Bo Dai

We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality.

reinforcement-learning Reinforcement Learning (RL)
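
As a quick refresher on the central object of that tutorial: the Fenchel conjugate is f*(y) = sup_x (x*y - f(x)), and for f(x) = x^2/2 the conjugate is again y^2/2. The brute-force check below confirms this numerically; it is only an illustration of the definition, not code from the paper.

```python
import numpy as np

f = lambda x: 0.5 * x ** 2
xs = np.linspace(-10.0, 10.0, 200001)          # dense grid over which to take the sup

def conjugate(y):
    return np.max(xs * y - f(xs))

for y in (-2.0, 0.5, 3.0):
    print(y, conjugate(y), 0.5 * y ** 2)       # last two columns agree up to grid error
```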

BRPO: Batch Residual Policy Optimization

no code implementations 8 Feb 2020 Sungryull Sohn, Yin-Lam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state.

reinforcement-learning Reinforcement Learning (RL)

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

7 code implementations 15 Apr 2020 Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine

In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.

D4RL Offline RL +2
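
For reference, the usual loading pattern from the D4RL README (assuming the d4rl and gym packages and a working MuJoCo install for this particular task):

```python
import gym
import d4rl  # noqa: F401 -- importing registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()                    # dict of numpy arrays keyed by field name
print(dataset["observations"].shape, dataset["actions"].shape)

# Convenience view with explicit next_observations, handy for Q-learning methods:
qdata = d4rl.qlearning_dataset(env)
```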

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

1 code implementation ICLR 2021 Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN) that can effectively optimize a policy offline using 10-20 times fewer data than prior works.

Offline RL reinforcement-learning +1

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations 24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +3

Off-Policy Evaluation via the Regularized Lagrangian

no code implementations NeurIPS 2020 Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.

Off-policy evaluation

Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation

no code implementations 27 Jul 2020 Ilya Kostrikov, Ofir Nachum

In reinforcement learning, it is typical to use the empirically observed transitions and rewards to estimate the value of a policy via either model-based or Q-fitting approaches.

Continuous Control Off-policy evaluation

CoinDICE: Off-Policy Confidence Interval Estimation

no code implementations NeurIPS 2020 Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

Off-policy evaluation

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

no code implementations ICLR 2021 Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited.

Few-Shot Imitation Learning Imitation Learning +3

Offline Policy Selection under Uncertainty

1 code implementation 12 Dec 2020 Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans

More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.

Representation Matters: Offline Pretraining for Sequential Decision Making

no code implementations ICLR Workshop SSL-RL 2021 Mengjiao Yang, Ofir Nachum

The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms.

Imitation Learning Offline RL +1

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

2 code implementations 14 Mar 2021 Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data.

Offline RL reinforcement-learning +1

Near Optimal Policy Optimization via REPS

no code implementations NeurIPS 2021 Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum

Since its introduction a decade ago, relative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains, not to mention providing algorithmic components used by many recently proposed reinforcement learning (RL) algorithms.

Reinforcement Learning (RL)

Benchmarks for Deep Off-Policy Evaluation

3 code implementations ICLR 2021 Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.

Benchmarking Continuous Control +3

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

no code implementations ICLR 2021 Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action, and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics.

Continuous Control Data Augmentation +1

Provable Representation Learning for Imitation with Contrastive Fourier Features

1 code implementation NeurIPS 2021 Ofir Nachum, Mengjiao Yang

In imitation learning, it is common to learn a behavior policy to match an unknown target policy via max-likelihood training on a collected set of target demonstrations.

Atari Games Contrastive Learning +2

SparseDice: Imitation Learning for Temporally Sparse Data via Regularization

no code implementations ICML Workshop URL 2021 Alberto Camacho, Izzeddin Gur, Marcin Lukasz Moczulski, Ofir Nachum, Aleksandra Faust

We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to whole trajectories).

Imitation Learning

Policy Gradients Incorporating the Future

no code implementations ICLR 2022 David Venuto, Elaine Lau, Doina Precup, Ofir Nachum

Reasoning about the future -- understanding how decisions in the present affect outcomes in the future -- is one of the central challenges for reinforcement learning (RL), especially in highly stochastic or partially observable environments.

Offline RL Reinforcement Learning (RL)

Understanding the Generalization Gap in Visual Reinforcement Learning

no code implementations 29 Sep 2021 Anurag Ajay, Ge Yang, Ofir Nachum, Pulkit Agrawal

Deep Reinforcement Learning (RL) agents have achieved superhuman performance on several video game suites.

Data Augmentation reinforcement-learning +1

Why Should I Trust You, Bellman? Evaluating the Bellman Objective with Off-Policy Data

no code implementations 29 Sep 2021 Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu

In this work, we analyze the effectiveness of the Bellman equation as a proxy objective for value prediction accuracy in off-policy evaluation.

Off-policy evaluation Value prediction

Targeted Environment Design from Offline Data

no code implementations 29 Sep 2021 Izzeddin Gur, Ofir Nachum, Aleksandra Faust

We formalize our approach as offline targeted environment design (OTED), which automatically learns a distribution over simulator parameters to match a provided offline dataset, and then uses the learned simulator to train an RL agent in standard online fashion.

Offline RL Reinforcement Learning (RL)

TRAIL: Near-Optimal Imitation Learning with Suboptimal Data

1 code implementation ICLR 2022 Mengjiao Yang, Sergey Levine, Ofir Nachum

In this work, we answer this question affirmatively and present training objectives that use offline datasets to learn a factored transition model whose structure enables the extraction of a latent action space.

Imitation Learning

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

no code implementations 29 Nov 2021 Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.

Contrastive Learning Decision Making +5

Model Selection in Batch Policy Optimization

no code implementations 23 Dec 2021 Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai

We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage.

Model Selection

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

no code implementations 28 Jan 2022 Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy.

Value prediction
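
A standard illustration of the gap this paper studies: a value estimate that is wrong by a constant c everywhere has value error c but Bellman error only (1 - gamma) * c, so a tiny Bellman residual need not mean accurate values. The toy MDP below is made up purely for the demonstration.

```python
import numpy as np

gamma, c = 0.99, 10.0
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                     # transitions under the evaluated policy
r = np.array([1.0, 0.0])

v_true = np.linalg.solve(np.eye(2) - gamma * P, r)    # solves V = r + gamma * P @ V
v_hat = v_true + c                                     # uniformly over-estimated values

bellman_error = np.abs(v_hat - (r + gamma * P @ v_hat)).max()   # ~ (1 - gamma) * c = 0.1
value_error = np.abs(v_hat - v_true).max()                       # = c = 10.0
print(bellman_error, value_error)
```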

Chain of Thought Imitation with Procedure Cloning

1 code implementation 22 May 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.

Imitation Learning Robot Manipulation

Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters

2 code implementations 27 May 2022 Seyed Kamyar Seyed Ghasemipour, Shixiang Shane Gu, Ofir Nachum

Motivated by the success of ensembles for uncertainty estimation in supervised learning, we take a renewed look at how ensembles of Q-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL).

D4RL Offline RL +1

Multi-Game Decision Transformers

1 code implementation 30 May 2022 Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch

Specifically, we show that a single transformer-based model - with a single set of weights - trained purely offline can play a suite of up to 46 Atari games simultaneously at close-to-human performance.

Atari Games Offline RL
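
Return-conditioned sequence modeling of this kind trains a causal transformer on interleaved (return-to-go, observation, action) tokens. The sketch below shows only that sequence construction in its simplest form; the multi-game model additionally discretizes image observations and conditions on expert-level returns, and all names here are illustrative.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    rtg, acc = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        rtg[t] = acc
    return rtg

def build_sequence(observations, actions, rewards):
    rtg = returns_to_go(rewards)
    # One (return-to-go, observation, action) triple per timestep; a causal
    # transformer is trained to predict each action from the tokens before it.
    return [(rtg[t], observations[t], actions[t]) for t in range(len(rewards))]
```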

A Mixture-of-Expert Approach to RL-based Dialogue Management

no code implementations 31 May 2022 Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier

Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge.

Attribute Dialogue Management +3

Joint Representation Training in Sequential Tasks with Shared Structure

no code implementations 24 Jun 2022 Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett

In contrast with previous work that has studied multi-task RL in other function approximation models, we show that in the presence of a bilinear optimization oracle and finite state-action spaces there exists a computationally efficient algorithm for multi-task MatrixRL via a reduction to quadratic programming.

Multi-Armed Bandits Reinforcement Learning (RL)

PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

no code implementations 27 Jul 2022 Kuang-Huei Lee, Ofir Nachum, Tingnan Zhang, Sergio Guadarrama, Jie Tan, Wenhao Yu

Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time.

Representation Learning

Understanding HTML with Large Language Models

no code implementations 8 Oct 2022 Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust

We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages.

Autonomous Web Navigation Retrieval

Dichotomy of Control: Separating What You Can Control from What You Cannot

1 code implementation 24 Oct 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.

Reinforcement Learning (RL)

Oracle Inequalities for Model Selection in Offline Reinforcement Learning

no code implementations 3 Nov 2022 Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.

Model Selection Offline RL +2

Contrastive Value Learning: Implicit Models for Simple Offline RL

no code implementations 3 Nov 2022 Bogdan Mazoure, Benjamin Eysenbach, Ofir Nachum, Jonathan Tompson

In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.

Continuous Control Model-based Reinforcement Learning +2

Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

no code implementations 23 Nov 2022 David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications.

Decision Making

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations 7 Mar 2023 Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving Decision Making +1

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

no code implementations 19 May 2023 Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, Shixiang Shane Gu, Izzeddin Gur

The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data.

Autonomous Web Navigation Instruction Following +1
