Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of out-performing standard behavioral cloning.
We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching.
The learner often comes to rely on features that are strongly predictive of decisions, but are subject to strong covariate shift.
In this paper we propose CMAX++, an approach that leverages real-world experience to improve the quality of resulting plans over successive repetitions of a robotic task.
TRON achieves this by exploiting the structure of the objective to adaptively smooth the cost function, resulting in a sequence of objectives that can be efficiently optimized.
Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains.
We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner.
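For reference, an Integral Probability Metric between two distributions is commonly defined as the largest gap in expectation over a class of test functions. The symbols P, Q, and the function class F below are notation introduced here for illustration and are not taken from the abstract.

```latex
d_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}} \left| \, \mathbb{E}_{x \sim P}\big[f(x)\big] - \mathbb{E}_{x \sim Q}\big[f(x)\big] \, \right|
```

In the setting described above, P and Q would play the roles of the expert's and the learner's observation distributions at a given time step.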
Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem.
This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning.
In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle.
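One standard way to shape rewards with an oracle is potential-based shaping, where an oracle's value estimate is used as the potential. The sketch below is a minimal illustration assuming that form; the names `shaped_reward`, `potential`, and `gamma` are hypothetical and not drawn from the paper.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).

    `potential` stands in for an oracle's value estimate (an assumption
    made for this sketch).
    """
    return reward + gamma * potential(next_state) - potential(state)


if __name__ == "__main__":
    # Toy potential: the state itself, read as a number.
    phi = lambda s: float(s)
    print(shaped_reward(1.0, state=2, next_state=3, potential=phi, gamma=0.5))
```

Because the shaping term telescopes along a trajectory, this form changes the per-step learning signal without changing which policies are optimal in the original problem.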
Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt, AlphaGo-Zero).
We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations.
We demonstrate that this method is able to remove uninformative parts of the feature space for the anomaly detection setting.
Experimentally, the adaptive weights induce more competitive anytime predictions on multiple recognition datasets and models than non-adaptive approaches, including weighting all losses equally.
We demonstrate that AggreVaTeD, a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bagnell, 2014), can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique.
To generalize from the batch to the online setting, we first define an online weak learning edge. Using this definition, for strongly convex and smooth loss functions we present Streaming Gradient Boosting (SGB), an algorithm with exponential shrinkage guarantees in the number of weak learners.
We address an anomaly detection setting in which training sequences are unavailable and anomalies are scored independently of temporal ordering.
The ability to transfer knowledge gained in previous tasks into new contexts is one of the most important mechanisms of human learning.
As robots aspire to long-term autonomous operation in complex dynamic environments, the ability to reliably make mission-critical decisions in ambiguous situations becomes essential.
We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation.
Cameras provide a rich source of information while being passive, cheap and lightweight for small and medium Unmanned Aerial Vehicles (UAVs).
The task of computing approximate Nash equilibria in large zero-sum extensive-form games has received a tremendous amount of attention due mainly to the Annual Computer Poker Competition.
We present an efficient algorithm with provable performance for building a high-quality list of detections from any candidate set of region-based proposals.
We theoretically guarantee that our algorithms achieve near-optimal linear predictions at each budget when a feature group is chosen.
Recent work has demonstrated that problems where a learner's predictions influence the input distribution it is tested on, particularly imitation learning and structured prediction, can be naturally addressed by an interactive approach and analyzed using no-regret online learning.
Instead of minimizing uncertainty per se, we consider a set of overlapping decision regions of these hypotheses.
Structured prediction plays a central role in machine learning applications from computational biology to computer vision.
We study the problem of predicting a set or list of options under a knapsack constraint.
Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options.
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumption.