no code implementations • 11 Jun 2024 • Yuda Song, J. Andrew Bagnell, Aarti Singh
Under the admissibility assumptions -- that the offline data could actually be produced by the policy class we consider -- we propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model.
no code implementations • 3 Jun 2024 • Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun
The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work because both start from the same offline preference dataset.
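As a rough illustration of the offline contrastive family, here is a minimal sketch of the DPO objective on a single preference pair; the sequence log-probabilities and the `beta` temperature below are placeholder inputs, not details taken from the paper.

```python
import math

def dpo_loss(logp_pi_chosen, logp_pi_rejected,
             logp_ref_chosen, logp_ref_rejected, beta=0.1):
    """Minimal sketch of the DPO objective for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the trained policy (pi) and the frozen reference
    model (ref); beta is the KL-regularization temperature."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_pi_chosen - logp_ref_chosen)
                     - (logp_pi_rejected - logp_ref_rejected))
    # Negative log-sigmoid of the margin (binary logistic loss).
    return math.log(1.0 + math.exp(-margin))

# Example: the policy slightly prefers the chosen response relative to the reference.
print(dpo_loss(-12.0, -15.0, -12.5, -14.0))
```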
3 code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.
1 code implementation • 13 Feb 2024 • Juntao Ren, Gokul Swamy, Zhiwei Steven Wu, J. Andrew Bagnell, Sanjiban Choudhury
In this work, we propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
no code implementations • 4 Feb 2024 • David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury
Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations.
1 code implementation • 26 Mar 2023 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory.
1 code implementation • 1 Mar 2023 • Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation.
1 code implementation • 13 Oct 2022 • Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
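A minimal sketch of the hybrid-data idea (not the paper's specific algorithm): each update draws a batch that mixes offline transitions with freshly collected online ones; the `offline_frac` parameter and list-based buffers are illustrative.

```python
import random

def sample_hybrid_batch(offline_data, online_buffer, batch_size=256, offline_frac=0.5):
    """Sketch of hybrid RL data mixing: draw a batch that is part offline
    transitions, part online replay transitions, then shuffle. Both inputs
    are assumed to be plain lists of transition tuples."""
    n_off = int(batch_size * offline_frac)
    n_on = batch_size - n_off
    batch = random.sample(offline_data, min(n_off, len(offline_data)))
    batch += random.sample(online_buffer, min(n_on, len(online_buffer)))
    random.shuffle(batch)
    return batch  # feed to any off-policy update, e.g. a fitted Q-iteration step
```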
no code implementations • 19 Aug 2022 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR).
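For concreteness, a textbook (not the paper's) way of enforcing one such restriction is two-stage least squares for instrumental variable regression, sketched on synthetic data below; the variable names are illustrative.

```python
import numpy as np

# Sketch: two-stage least squares (2SLS), the standard way of imposing the
# conditional moment restriction E[y - f(x) | z] = 0 for linear f.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                                 # instrument
confounder = rng.normal(size=n)                        # unobserved confounder
x = z + confounder + 0.1 * rng.normal(size=n)          # endogenous regressor
y = 2.0 * x + confounder + 0.1 * rng.normal(size=n)    # true effect of x is 2.0

# Stage 1: project x onto the instrument.
x_hat = np.polyval(np.polyfit(z, x, 1), z)
# Stage 2: regress y on the projected regressor.
beta = np.polyfit(x_hat, y, 1)[0]
print(beta)  # close to 2.0, whereas a naive regression of y on x is biased upward
```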
1 code implementation • 3 Aug 2022 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
We consider imitation learning problems where the learner's ability to mimic the expert increases throughout the course of an episode as more information is revealed.
1 code implementation • 30 May 2022 • Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left( \min\left( H^{3/2}/N,\; H/\sqrt{N} \right) \right)$ dependency, under significantly weaker assumptions compared to prior work.
1 code implementation • 2 Feb 2022 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions.
1 code implementation • 17 Nov 2021 • Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell
However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often perform poorly.
no code implementations • 5 Oct 2021 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of outperforming standard behavioral cloning.
3 code implementations • 4 Mar 2021 • Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching.
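A minimal sketch of the moment-matching perspective, under the simplifying assumption of a unit-norm linear class of moment functions (chosen here for illustration only, not as the paper's construction):

```python
import numpy as np

def moment_gap(learner_feats, expert_feats):
    """Sketch of the moment-matching view of imitation: with a class of
    unit-norm linear moment functions f_w(s, a) = w . phi(s, a), the
    worst-case moment mismatch reduces to the distance between the
    empirical mean feature vectors of learner and expert.

    learner_feats, expert_feats: arrays of shape (num_samples, num_moments);
    rows are feature vectors phi(s, a) of sampled state-action pairs."""
    return np.linalg.norm(learner_feats.mean(axis=0) - expert_feats.mean(axis=0))

rng = np.random.default_rng(0)
print(moment_gap(rng.normal(0.1, 1.0, (500, 8)), rng.normal(0.0, 1.0, (500, 8))))
```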
no code implementations • 4 Feb 2021 • Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, J. Andrew Bagnell
The learner often comes to rely on features that are strongly predictive of decisions, but are subject to strong covariate shift.
1 code implementation • 21 Sep 2020 • Anirudh Vemula, J. Andrew Bagnell, Maxim Likhachev
In this paper we propose CMAX++, an approach that leverages real-world experience to improve the quality of resulting plans over successive repetitions of a robotic task.
1 code implementation • 31 Mar 2020 • Anirudh Vemula, Wen Sun, J. Andrew Bagnell
Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains.
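A hedged sketch of what parameter-space exploration with black-box optimization can look like, in the spirit of random-search methods rather than the paper's exact algorithm; `rollout_return` is a hypothetical simulator call that returns an episode return for a given parameter vector.

```python
import numpy as np

def random_search_step(theta, rollout_return, step_size=0.02, noise_std=0.03, num_dirs=8):
    """One illustrative iteration of parameter-space exploration: perturb the
    policy parameters in random directions, evaluate returns, and move along
    the return-weighted directions."""
    rng = np.random.default_rng()
    deltas = rng.normal(size=(num_dirs, theta.size))
    grad = np.zeros_like(theta)
    for d in deltas:
        r_plus = rollout_return(theta + noise_std * d)   # hypothetical rollout call
        r_minus = rollout_return(theta - noise_std * d)
        grad += (r_plus - r_minus) * d
    return theta + step_size * grad / (num_dirs * noise_std)
```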
1 code implementation • 31 Mar 2020 • Anirudh Vemula, J. Andrew Bagnell
TRON achieves this by exploiting the structure of the objective to adaptively smooth the cost function, resulting in a sequence of objectives that can be efficiently optimized.
Robotics • Systems and Control
1 code implementation • 9 Mar 2020 • Anirudh Vemula, Yash Oza, J. Andrew Bagnell, Maxim Likhachev
In this paper, we propose CMAX, an approach for interleaving planning and execution.
1 code implementation • 27 May 2019 • Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell
We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner.
1 code implementation • 31 Jan 2019 • Anirudh Vemula, Wen Sun, J. Andrew Bagnell
Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem.
no code implementations • 16 Nov 2018 • Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters
This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning.
no code implementations • ICLR 2018 • Wen Sun, J. Andrew Bagnell, Byron Boots
In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle.
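As background for the reward-shaping idea, a minimal sketch of potential-based shaping with an oracle value estimate (a standard construction, not necessarily the paper's exact scheme):

```python
def shaped_reward(reward, value_oracle, state, next_state, gamma=0.99):
    """Sketch of potential-based reward shaping: add the discounted temporal
    difference of an oracle value estimate (e.g., one distilled from
    demonstrations) to the environment reward. Shaping of this form leaves
    the set of optimal policies unchanged."""
    return reward + gamma * value_oracle(next_state) - value_oracle(state)
```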
no code implementations • NeurIPS 2018 • Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]).
no code implementations • ICLR 2018 • Hanzhang Hu, Debadeepta Dey, Martial Hebert, J. Andrew Bagnell
We present an approach for anytime predictions in deep neural networks (DNNs).
1 code implementation • ICLR 2018 • Hanzhang Hu, Debadeepta Dey, Allison Del Giorno, Martial Hebert, J. Andrew Bagnell
Skip connections are increasingly utilized by deep neural networks to improve accuracy and cost-efficiency.
no code implementations • NeurIPS 2017 • Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell
We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.
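A rough sketch, assuming a GRU backbone and a linear decoding head, of how such a predictive-state auxiliary loss could be wired up; the dimensions and horizon `k_future` are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class RNNWithPSD(nn.Module):
    """Sketch of a Predictive-State Decoder head: a standard GRU whose hidden
    state also feeds a small decoder trained to predict the next k
    observations, adding supervision to the internal state representation."""

    def __init__(self, obs_dim=16, hidden_dim=64, k_future=3):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, k_future * obs_dim)  # PSD head
        self.k_future = k_future

    def psd_loss(self, obs):
        # obs: (batch, T, obs_dim); predict the next k observations from each hidden state.
        hidden, _ = self.rnn(obs)
        T = obs.shape[1] - self.k_future
        preds = self.decoder(hidden[:, :T])
        targets = torch.cat(
            [obs[:, t + 1:t + 1 + self.k_future].flatten(1).unsqueeze(1) for t in range(T)],
            dim=1)
        # Auxiliary loss, added to the main task loss during training.
        return nn.functional.mse_loss(preds, targets)
```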
no code implementations • 13 Sep 2017 • Allison Del Giorno, J. Andrew Bagnell, Martial Hebert
We demonstrate that this method is able to remove uninformative parts of the feature space for the anomaly detection setting.
no code implementations • 22 Aug 2017 • Hanzhang Hu, Debadeepta Dey, Martial Hebert, J. Andrew Bagnell
Experimentally, the adaptive weights induce more competitive anytime predictions on multiple recognition datasets and models than non-adaptive approaches, including weighting all losses equally.
no code implementations • ICML 2017 • Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
We demonstrate that AggreVaTeD -- a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bagnell, 2014) -- can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique.
no code implementations • 1 Mar 2017 • Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell
To generalize from batch to online, we first introduce the definition of an online weak learning edge, with which, for strongly convex and smooth loss functions, we present an algorithm, Streaming Gradient Boosting (SGB), that enjoys exponential shrinkage guarantees in the number of weak learners.
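A loose sketch of streaming gradient boosting with linear weak learners and squared loss, meant only to illustrate the flavor of online boosting updates; it is not the paper's SGB algorithm and carries none of its guarantees.

```python
import numpy as np

class StreamingBoostedRegressor:
    """Illustrative online gradient boosting with linear weak learners."""

    def __init__(self, n_learners=10, dim=5, lr=0.05, shrinkage=0.5):
        self.W = np.zeros((n_learners, dim))  # one linear weak learner per row
        self.lr, self.shrinkage = lr, shrinkage

    def predict(self, x):
        return self.shrinkage * (self.W @ x).sum()

    def update(self, x, y):
        """Process one streaming example: each weak learner takes a small
        online step toward the residual left by the learners before it."""
        partial = 0.0
        for i in range(len(self.W)):
            residual = y - partial               # negative gradient of squared loss
            self.W[i] += self.lr * residual * x  # online least-squares step
            partial += self.shrinkage * (self.W[i] @ x)
        return partial                           # ensemble prediction for this example
```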
no code implementations • 28 Sep 2016 • Allison Del Giorno, J. Andrew Bagnell, Martial Hebert
We address an anomaly detection setting in which training sequences are unavailable and anomalies are scored independently of temporal ordering.
no code implementations • 1 Aug 2016 • Shreyansh Daftry, J. Andrew Bagnell, Martial Hebert
The ability to transfer knowledge gained in previous tasks into new contexts is one of the most important mechanisms of human learning.
no code implementations • 28 Jul 2016 • Shreyansh Daftry, Sam Zeng, J. Andrew Bagnell, Martial Hebert
As robots aspire for long-term autonomous operations in complex dynamic environments, the ability to reliably take mission-critical decisions in ambiguous situations becomes critical.
no code implementations • 30 Dec 2015 • Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell
Latent state space models are a fundamental and widely used tool for modeling dynamical systems.
no code implementations • ICCV 2015 • Debadeepta Dey, Varun Ramakrishna, Martial Hebert, J. Andrew Bagnell
We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation.
no code implementations • 28 Nov 2014 • Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling
We propose a novel online learning method for minimizing regret in large extensive-form games.
no code implementations • 24 Nov 2014 • Debadeepta Dey, Kumar Shaurya Shankar, Sam Zeng, Rupesh Mehta, M. Talha Agcayazi, Christopher Eriksen, Shreyansh Daftry, Martial Hebert, J. Andrew Bagnell
Cameras provide a rich source of information while being passive, cheap and lightweight for small and medium Unmanned Aerial Vehicles (UAVs).
no code implementations • 18 Nov 2014 • Kevin Waugh, J. Andrew Bagnell
The task of computing approximate Nash equilibria in large zero-sum extensive-form games has received a tremendous amount of attention due mainly to the Annual Computer Poker Competition.
no code implementations • 27 Oct 2014 • Nicholas Rhinehart, Jiaji Zhou, Martial Hebert, J. Andrew Bagnell
We present an efficient algorithm with provable performance for building a high-quality list of detections from any candidate set of region-based proposals.
no code implementations • 19 Sep 2014 • Hanzhang Hu, Alexander Grubb, J. Andrew Bagnell, Martial Hebert
We theoretically guarantee that our algorithms achieve near-optimal linear predictions at each budget when a feature group is chosen.
no code implementations • 23 Jun 2014 • Stephane Ross, J. Andrew Bagnell
Recent work has demonstrated that problems -- particularly imitation learning and structured prediction -- where a learner's predictions influence the input-distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning.
no code implementations • 24 Feb 2014 • Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, J. Andrew Bagnell, Siddhartha Srinivasa
Instead of minimizing uncertainty per se, we consider a set of overlapping decision regions of these hypotheses.
no code implementations • 2 Dec 2013 • Alexander Grubb, Daniel Munoz, J. Andrew Bagnell, Martial Hebert
Structured prediction plays a central role in machine learning applications from computational biology to computer vision.
no code implementations • 16 Aug 2013 • Jiaji Zhou, Stephane Ross, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell
We study the problem of predicting a set or list of options under knapsack constraint.
no code implementations • 15 Aug 2013 • Kevin Waugh, Brian D. Ziebart, J. Andrew Bagnell
Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task.
no code implementations • 11 May 2013 • Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell
Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options.
3 code implementations • 2 Nov 2010 • Stephane Ross, Geoffrey J. Gordon, J. Andrew Bagnell
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning.
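The snippet below sketches the interactive data-aggregation loop (DAgger-style) that this line of work introduces; `env`, `expert_action`, and `fit_classifier` are placeholders for the user's environment, expert oracle, and supervised learner, and the `env.step` interface is assumed.

```python
def dagger(env, expert_action, fit_classifier, n_iters=10, horizon=100):
    """Sketch of DAgger-style training: roll out the current policy, relabel
    the visited states with expert actions, aggregate, and retrain."""
    dataset = []             # aggregated (state, expert action) pairs
    policy = expert_action   # the first iteration can simply execute the expert
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert_action(state)))  # expert relabels visited states
            state, done = env.step(policy(state))          # assumed (state, done) interface
            if done:
                break
        policy = fit_classifier(dataset)  # supervised learning on the aggregate
    return policy
```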