no code implementations • 14 Apr 2024 • Dipendra Misra, Aldo Pacchiano, Robert E. Schapire
We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction.
2 code implementations • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset resets: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.
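The dataset-reset idea above can be illustrated in a few lines. This is a minimal sketch, not the paper's implementation: the toy environment, the policy, and the 50/50 reset probability are all hypothetical stand-ins for DR-PO's actual components.

```python
import random

def rollout(env_step, policy, start_state, horizon=5):
    """Roll out `policy` from `start_state`; return (visited states, total reward)."""
    s, total = start_state, 0.0
    states = [s]
    for _ in range(horizon):
        a = policy(s)
        s, r = env_step(s, a)
        states.append(s)
        total += r
    return states, total

def dataset_reset_rollouts(env_step, policy, offline_states, n=10, reset_prob=0.5):
    """Collect n rollouts; with probability reset_prob, start from a state
    sampled from the offline preference dataset (the "dataset reset"),
    otherwise from the initial state distribution (toy: always state 0)."""
    out = []
    for _ in range(n):
        if offline_states and random.random() < reset_prob:
            start = random.choice(offline_states)  # reset to an informative state
        else:
            start = 0
        out.append(rollout(env_step, policy, start))
    return out
```

The point of the reset is that rollouts then begin from states the labelers already marked as good, so the optimizer spends its interaction budget near preferred behaviour rather than rediscovering it from scratch.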
no code implementations • 20 Mar 2024 • Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford
We study two settings: one where there is i.i.d. noise in the observation, and a more challenging one that also includes exogenous noise, i.e., non-i.i.d. noise that is temporally correlated, such as the motion of people or cars in the background.
no code implementations • 12 Feb 2024 • Victor Zhong, Dipendra Misra, Xingdi Yuan, Marc-Alexandre Côté
We introduce Language Feedback Models (LFMs) that identify desirable behaviour (actions that help achieve tasks specified in the instruction) for imitation learning in instruction following.
1 code implementation • 21 Dec 2023 • Pratyusha Sharma, Jordan T. Ash, Dipendra Misra
Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning.
no code implementations • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.
1 code implementation • 20 Jun 2023 • Jonathan D. Chang, Kiante Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun
In particular, we extend RL algorithms to allow them to interact with a dynamic black-box guide LLM and propose RL with guided feedback (RLGF), a suite of RL algorithms for LLM fine-tuning.
1 code implementation • 14 Nov 2022 • Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman
Modern decision-making systems, from robots to web recommendation engines, are expected to adapt to user preferences, changing circumstances, or even new tasks.
1 code implementation • 31 Oct 2022 • Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications.
1 code implementation • 26 Oct 2022 • Andrew Bennett, Dipendra Misra, Nathan Kallus
Many existing approaches to safe RL rely on receiving numeric safety feedback, but in many cases this feedback can only take binary values; that is, whether an action in a given state is safe or unsafe.
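The binary-feedback constraint above can be made concrete with a toy sketch. This is not the paper's algorithm, just a hedged illustration of the setting: with only safe/unsafe labels we can estimate P(safe | state, action) and gate actions on it, rather than trading off a numeric safety cost.

```python
from collections import defaultdict

def train_safety_classifier(transitions):
    """Fit a deliberately simple safety estimate from binary feedback.
    `transitions` is a list of (state, action, is_safe) with is_safe in {0, 1}.
    Returns a function estimating P(safe | state, action) by empirical frequency."""
    counts = defaultdict(lambda: [0, 0])  # (state, action) -> [safe count, total count]
    for s, a, ok in transitions:
        counts[(s, a)][0] += ok
        counts[(s, a)][1] += 1
    def p_safe(s, a):
        safe, total = counts.get((s, a), (0, 0))
        return safe / total if total else 0.5  # unseen pairs: uninformative prior
    return p_safe

def safe_actions(state, actions, p_safe, threshold=0.9):
    """Restrict the agent to actions the classifier deems sufficiently safe."""
    return [a for a in actions if p_safe(state, a) >= threshold]
```

A real safe-RL method would of course need generalization across states and guarantees on the filtered policy; the sketch only shows what "feedback can only take binary values" forces the learner to work with.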
no code implementations • 17 Jul 2022 • Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford
In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information.
no code implementations • 9 Jun 2022 • Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford
In real-world reinforcement learning applications, the learner's observation space is almost always high-dimensional, with both relevant and irrelevant information about the task at hand.
no code implementations • 27 May 2022 • Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire
We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan.
no code implementations • 28 Feb 2022 • Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy
Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.
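The objective described above is commonly instantiated as an InfoNCE-style loss; here is a minimal pure-Python sketch (generic formulation, not tied to any one paper's setup), where the "positive" is another view of the same input and the "negatives" are views of different inputs.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the positive view toward the
    anchor, push negative views away. Returns -log softmax probability
    assigned to the positive among {positive} + negatives."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    logits = [dot(anchor, positive) / temperature] + [
        dot(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]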
no code implementations • ICLR 2022 (first posted 17 Oct 2021) • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.
no code implementations • 18 Jun 2021 • Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra
We focus on disambiguating the role of one of these parameters: the number of negative examples.
no code implementations • 21 May 2021 • Andrew Bennett, Dipendra Misra, Nga Than
Topic models are widely used in studying social phenomena.
1 code implementation • 13 Feb 2021 • Khanh Nguyen, Dipendra Misra, Robert Schapire, Miro Dudík, Patrick Shafto
We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities.
Tasks: General Reinforcement Learning, Grounded Language Learning, +2
no code implementations • ICLR 2021 • Dipendra Misra, Qinghua Liu, Chi Jin, John Langford
We propose a novel setting for reinforcement learning that combines two common real-world difficulties: presence of observations (such as camera images) and factored states (such as location of objects).
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.
no code implementations • ICML 2020 • Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford
We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space.
no code implementations • 30 May 2019 • Kavosh Asadi, Dipendra Misra, Seungchan Kim, Michel L. Littman
In this paper, we address the compounding-error problem by introducing a multi-step model that directly outputs the outcome of executing a sequence of actions.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning, +1
4 code implementations • CVPR 2019 • Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task.
Ranked #10 on Vision and Language Navigation (Touchdown Dataset)
no code implementations • 21 Nov 2018 • Aaron Walsman, Yonatan Bisk, Saadia Gabriel, Dipendra Misra, Yoav Artzi, Yejin Choi, Dieter Fox
Building perceptual systems for robotics which perform well under tight computational budgets requires novel architectures which rethink the traditional computer vision pipeline.
1 code implementation • 10 Nov 2018 • Valts Blukis, Dipendra Misra, Ross A. Knepper, Yoav Artzi
We propose an approach for mapping natural language instructions and raw observations to continuous control of a quadcopter drone.
no code implementations • 31 Oct 2018 • Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman
When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning, +1
no code implementations • EMNLP 2018 • Dipendra Misra, Ming-Wei Chang, Xiaodong He, Wen-tau Yih
Semantic parsing from denotations faces two key challenges in model training: (1) given only the denotations (e.g., answers), search for good candidate semantic parses, and (2) choose the best model update algorithm.
5 code implementations • EMNLP 2018 • Dipendra Misra, Andrew Bennett, Valts Blukis, Eyvind Niklasson, Max Shatkhin, Yoav Artzi
We propose to decompose instruction execution to goal prediction and action generation.
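The decomposition into goal prediction and action generation can be sketched on a 1-D toy world. Both models below are hypothetical placeholders (the paper's actual components are learned neural models operating on images and text); the sketch only shows the two-stage control flow.

```python
def execute_instruction(instruction, start_pos, goal_model, action_model, max_steps=20):
    """Two-stage instruction execution on a 1-D toy world:
    (1) goal prediction: map the instruction to a target position;
    (2) action generation: step toward the predicted goal."""
    goal = goal_model(instruction)
    pos, trace = start_pos, [start_pos]
    for _ in range(max_steps):
        if pos == goal:
            break
        pos += action_model(pos, goal)  # toy actions: +1 or -1
        trace.append(pos)
    return trace
```

The appeal of the split is modularity: the goal predictor handles language grounding once, and the action generator reduces to goal-conditioned control.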
no code implementations • 1 Jun 2018 • Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman
Learning a generative model is a key component of model-based reinforcement learning.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning, +1
1 code implementation • ICML 2018 • Kavosh Asadi, Dipendra Misra, Michael L. Littman
We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Lipschitz.
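The flavor of such a Lipschitz argument can be sketched in one line; this is the standard derivation under generic assumptions (deterministic $K_T$-Lipschitz dynamics, $K_R$-Lipschitz reward, $\gamma K_T < 1$), with generic notation that is not necessarily the paper's:

```latex
% Unrolling the value function through a K_T-Lipschitz model,
% the per-step sensitivity grows geometrically and sums to:
\[
  |V(s) - V(s')| \;\le\; K_R \sum_{t=0}^{\infty} (\gamma K_T)^t \, d(s, s')
  \;=\; \frac{K_R}{1 - \gamma K_T}\, d(s, s').
\]
```

That is, a Lipschitz model propagates state perturbations in a controlled way, so the induced value estimate inherits a Lipschitz constant of its own.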
Tasks: Model-based Reinforcement Learning, Reinforcement Learning, +1
2 code implementations • 23 Jan 2018 • Claudia Yan, Dipendra Misra, Andrew Bennett, Aaron Walsman, Yonatan Bisk, Yoav Artzi
We present CHALET, a 3D house simulator with support for navigation and manipulation.
1 code implementation • EMNLP 2017 • Dipendra Misra, John Langford, Yoav Artzi
We propose to directly map raw visual observations and text input to actions for instruction execution.