no code implementations • 30 Oct 2024 • Edward S. Hu, Kwangjun Ahn, Qinghua Liu, Haoran Xu, Manan Tomar, Ada Langford, Dinesh Jayaraman, Alex Lamb, John Langford
We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previous token for the suffix.
1 code implementation • 25 Jun 2024 • Manan Tomar, Philippe Hansen-Estruch, Philip Bachman, Alex Lamb, John Langford, Matthew E. Taylor, Sergey Levine
We introduce a new family of video prediction models designed to support downstream control tasks.
2 code implementations • 25 Jun 2024 • Philippe Hansen-Estruch, Sriram Vishwanath, Amy Zhang, Manan Tomar
At the core of both successful generative and self-supervised representation learning models lies a reconstruction objective that incorporates some form of image corruption.
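A minimal sketch of such an objective under one illustrative corruption (random pixel masking); the encoder/decoder shapes and the choice of masking over additive noise are assumptions, not the paper's specific design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def corrupted_reconstruction_loss(encoder, decoder, images, mask_prob=0.5):
    # Corrupt by randomly zeroing pixels, then reconstruct the original.
    mask = (torch.rand_like(images) > mask_prob).float()
    recon = decoder(encoder(images * mask))
    return F.mse_loss(recon, images)

# Toy encoder/decoder on 8x8 single-channel "images" (illustrative sizes).
enc = nn.Sequential(nn.Flatten(), nn.Linear(64, 16), nn.ReLU())
dec = nn.Sequential(nn.Linear(16, 64), nn.Unflatten(1, (1, 8, 8)))
loss = corrupted_reconstruction_loss(enc, dec, torch.rand(4, 1, 8, 8))
```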
2 code implementations • 16 May 2024 • Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton
We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average.
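A minimal sketch of the idea in a tabular TD(0) update, where `r_bar` tracks the empirical average reward and is subtracted before bootstrapping; the step sizes and tabular setting are illustrative.

```python
import numpy as np

def centered_td_update(V, s, r, s_next, r_bar, alpha=0.1, eta=0.01, gamma=0.99):
    """One tabular TD(0) step with reward centering (sketch)."""
    r_bar += eta * (r - r_bar)                        # running average of observed rewards
    td_error = (r - r_bar) + gamma * V[s_next] - V[s] # bootstrap on the centered reward
    V[s] += alpha * td_error
    return r_bar

V = np.zeros(10)   # toy value table
r_bar = 0.0
r_bar = centered_td_update(V, s=0, r=1.0, s_next=1, r_bar=r_bar)
```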
no code implementations • 22 Sep 2023 • Chethan Bhateja, Derek Guo, Dibya Ghosh, Anikait Singh, Manan Tomar, Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar
Our system, called V-PTR, combines the benefits of pre-training on video data with robotic offline RL approaches that train on diverse robot data, resulting in value functions and policies for manipulation tasks that perform better, act robustly, and generalize broadly.
no code implementations • NeurIPS 2023 • Manan Tomar, Riashat Islam, Matthew E. Taylor, Sergey Levine, Philip Bachman
We propose "information gating" as a way to learn parsimonious representations that identify the minimal information required for a task.
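A minimal sketch of one way to instantiate gating, assuming a simple multiplicative mask with a magnitude penalty; the paper's actual objective may differ, and `gate_net`, `task_net`, and the penalty form are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gated_task_loss(gate_net, task_net, x, y, sparsity_coef=1e-3):
    g = torch.sigmoid(gate_net(x))   # per-feature gate in [0, 1]
    pred = task_net(x * g)           # the task head sees only the gated input
    # Penalizing gate magnitude pushes features the task does not need toward zero.
    return F.mse_loss(pred, y) + sparsity_coef * g.mean()

gate_net = nn.Linear(32, 32)
task_net = nn.Linear(32, 1)
loss = gated_task_loss(gate_net, task_net, torch.randn(8, 32), torch.randn(8, 1))
```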
no code implementations • 28 Dec 2022 • Riashat Islam, Hongyu Zang, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess, Alex Lamb
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations.
2 code implementations • 31 Oct 2022 • Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex, time-dependent process, which is prevalent in practical applications.
no code implementations • 15 Nov 2021 • Manan Tomar, Utkarsh A. Mishra, Amy Zhang, Matthew E. Taylor
A wide range of methods have been proposed to enable efficient learning, leading to sample complexities similar to those in the full state setting.
no code implementations • 29 Sep 2021 • Manan Tomar, Amy Zhang, Matthew E. Taylor
The common representation acts as an implicit invariance objective that helps avoid the different spurious correlations captured by individual predictors.
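A minimal sketch of the setup being described, assuming several predictors over one shared encoder; the sizes and module choices are illustrative.

```python
import torch
import torch.nn as nn

class SharedRepresentation(nn.Module):
    """Sketch: predictors share one encoder, so features serving only a
    single predictor's spurious shortcut get less support than features
    useful to all of them."""
    def __init__(self, obs_dim, rep_dim, n_heads):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, rep_dim), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(rep_dim, 1) for _ in range(n_heads))

    def forward(self, x):
        z = self.encoder(x)               # common representation
        return [head(z) for head in self.heads]

model = SharedRepresentation(obs_dim=16, rep_dim=8, n_heads=3)
preds = model(torch.randn(4, 16))
```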
no code implementations • ICLR Workshop SSL-RL 2021 • Manan Tomar, Amy Zhang, Roberto Calandra, Matthew E. Taylor, Joelle Pineau
Unlike previous forms of state abstraction, a model-invariance state abstraction leverages causal sparsity over state variables.
1 code implementation • ICLR 2022 • Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh
Overall, MDPO is derived from mirror descent (MD) principles, offers a unified view of a number of popular RL algorithms, and performs better than or on par with TRPO, PPO, and SAC on a number of continuous control tasks.
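A minimal sketch of an on-policy mirror-descent-style loss of the kind MDPO uses: an importance-weighted advantage term plus a KL term that keeps the new policy close to the previous one. The fixed `kl_coef` is a simplification; the paper anneals this step size over iterations.

```python
import torch

def mdpo_style_loss(logp_new, logp_old, advantages, kl_coef=0.1):
    """Sketch of a KL-regularized policy objective to minimize
    (logp_old and advantages are assumed detached)."""
    ratio = torch.exp(logp_new - logp_old)            # pi_new / pi_old per sample
    # Importance-sampled estimate of KL(pi_new || pi_old) from pi_old's samples.
    kl_est = (ratio * (logp_new - logp_old)).mean()
    return -(ratio * advantages).mean() + kl_coef * kl_est
```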
no code implementations • ICML 2020 • Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh
We derive model-free RL algorithms based on $\kappa$-PI and $\kappa$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO.
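A minimal sketch of the surrogate construction behind this: the $\kappa$-greedy policy with respect to a value estimate $V$ is the optimal policy of a surrogate MDP with discount $\kappa\gamma$ and a reward shaped by $V$, which any standard RL method can then optimize. The function and argument names are illustrative.

```python
def kappa_surrogate(r, v_next, kappa=0.5, gamma=0.99):
    """Shaped reward and discount of the kappa surrogate MDP (sketch)."""
    shaped_reward = r + (1.0 - kappa) * gamma * v_next  # mixes in the value estimate
    surrogate_discount = kappa * gamma                  # kappa shrinks the horizon
    return shaped_reward, surrogate_discount
```

Setting `kappa=1` recovers the original discounted problem, while `kappa=0` reduces to standard one-step greedy improvement with respect to `v_next`.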
no code implementations • 25 Sep 2019 • Yonathan Efroni, Manan Tomar, Mohammad Ghavamzadeh
In this work, we explore the benefits of multi-step greedy policies in model-free RL when employed in the framework of multi-step Dynamic Programming (DP): multi-step Policy and Value Iteration.
no code implementations • 17 May 2019 • Manan Tomar, Akhil Sathuluri, Balaraman Ravindran
Shaping has been shown to be a powerful tool for learning complex tasks in humans and animals, as compared to learning in a randomized fashion.
1 code implementation • 14 May 2019 • Rahul Ramesh, Manan Tomar, Balaraman Ravindran
This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states.
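A minimal sketch of what such an option might look like, assuming discrete states: a policy trained to reach a given landmark and a termination test at that landmark. The class and its fields are hypothetical, not the paper's API.

```python
class LandmarkOption:
    """Sketch: an option that navigates to a single landmark state."""
    def __init__(self, landmark, policy):
        self.landmark = landmark
        self.policy = policy   # e.g., trained with reward 1 at the landmark, 0 elsewhere

    def act(self, state):
        return self.policy(state)

    def terminated(self, state):
        return state == self.landmark
```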