Across applications spanning supervised classification and sequential control, deep learning has been reported to find "shortcut" solutions that fail catastrophically under minor changes in the data distribution.
Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets.
We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning (IL) algorithm derived via state-occupancy matching.
no code implementations • 19 Jan 2022 • Joshua T. Vogelstein, Timothy Verstynen, Konrad P. Kording, Leyla Isik, John W. Krakauer, Ralph Etienne-Cummings, Elizabeth L. Ogburn, Carey E. Priebe, Randal Burns, Kwame Kutten, James J. Knierim, James B. Potash, Thomas Hartung, Lena Smirnova, Paul Worley, Alena Savonenko, Ian Phillips, Michael I. Miller, Rene Vidal, Jeremias Sulam, Adam Charles, Noah J. Cowan, Maxim Bichuch, Archana Venkataraman, Chen Li, Nitish Thakor, Justus M Kebschull, Marilyn Albert, Jinchong Xu, Marshall Hussain Shuler, Brian Caffo, Tilak Ratnanather, Ali Geisa, Seung-Eon Roh, Eva Yezerets, Meghana Madhyastha, Javier J. How, Tyler M. Tomita, Jayanta Dey, Ningyuan Huang, Jong M. Shin, Kaleab Alemayehu Kinfu, Pratik Chaudhari, Ben Baker, Anna Schapiro, Dinesh Jayaraman, Eric Eaton, Michael Platt, Lyle Ungar, Leila Wehbe, Adam Kepecs, Amy Christensen, Onyema Osuagwu, Bing Brunton, Brett Mensh, Alysson R. Muotri, Gabriel Silva, Francesca Puppo, Florian Engert, Elizabeth Hillman, Julia Brown, Chris White, Weiwei Yang
We call this 'retrospective learning'.
Further, CAP adaptively tunes this penalty during training using true cost feedback from the environment.
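As a rough illustration of what adaptively tuning a penalty from cost feedback can look like, the sketch below raises a penalty weight when the observed cost exceeds a budget and lowers it otherwise. The proportional update rule, the names `update_penalty`, `kappa`, `cost_budget`, and `step_size`, and the example costs are all assumptions made for illustration; the snippet above does not specify the rule CAP actually uses.

```python
def update_penalty(kappa, observed_cost, cost_budget, step_size=0.1):
    """Hypothetical proportional update of a cost-penalty weight.

    The penalty grows when the true cost observed in the environment
    exceeds the budget and shrinks when it falls below it. The weight
    is clipped at zero so it never rewards incurring cost.
    """
    kappa = kappa + step_size * (observed_cost - cost_budget)
    return max(0.0, kappa)


# Illustrative training loop: the episode costs are made-up numbers.
kappa = 1.0
for episode_cost in [30.0, 28.0, 20.0, 15.0]:
    kappa = update_penalty(kappa, episode_cost, cost_budget=25.0)
```

The clipping at zero is a design choice of this sketch: without it, a long run of safe episodes would turn the penalty into a bonus.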
When operating in partially observed settings, it is important for a control policy to fuse information from a history of observations.
Our experiments on tabletop manipulation tasks in simulation and on real robots demonstrate that these plug-in improvements dramatically boost the transferability of visuomotor controllers, even permitting zero-shot transfer onto new robots for the very first time.
We prove that CODAC learns a conservative return distribution: in particular, for finite MDPs, CODAC converges to a uniform lower bound on the quantiles of the return distribution. Our proof relies on a novel analysis of the distributional Bellman operator.
The difficulty of optimal control problems has classically been characterized in terms of system properties such as minimum eigenvalues of controllability/observability gramians.
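To make the classical characterization concrete, the sketch below computes a finite-horizon controllability Gramian for a discrete-time linear system x_{k+1} = A x_k + B u_k and reads off its minimum eigenvalue. The double-integrator system, the horizon values, and the helper names are assumptions chosen for illustration, and the closed-form eigenvalue helper only handles symmetric 2x2 matrices.

```python
import math


def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def controllability_gramian(A, B, horizon):
    """W = sum_{k=0}^{horizon-1} A^k B B^T (A^T)^k."""
    n = len(A)
    W = [[0.0] * n for _ in range(n)]
    AkB = B  # holds A^k B, starting at k = 0
    for _ in range(horizon):
        term = matmul(AkB, transpose(AkB))
        W = [[W[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        AkB = matmul(A, AkB)
    return W


def min_eig_sym_2x2(W):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, d = W[0][0], W[0][1], W[1][1]
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b ** 2)


# Assumed example: a discrete-time double integrator.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
lam_one_step = min_eig_sym_2x2(controllability_gramian(A, B, 1))
lam_two_steps = min_eig_sym_2x2(controllability_gramian(A, B, 2))
```

A minimum eigenvalue near zero indicates a state direction that is expensive or impossible to reach within the horizon; here one step is not enough to reach every state, while two steps are.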
We propose Likelihood-Based Diverse Sampling (LDS), a method for improving the quality and the diversity of trajectory samples from a pre-trained flow model.
Imitation learning trains policies to map from input observations to the actions that an expert would choose.
Scaling model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unknown dynamics remains an open problem.
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous, imperiling the RL agent, other agents, and the environment.
In this work we propose a framework for visual prediction and planning that is able to overcome both of these limitations.
no code implementations • 29 May 2020 • Mike Lambeta, Po-Wei Chou, Stephen Tian, Brian Yang, Benjamin Maloon, Victoria Rose Most, Dave Stroud, Raymond Santos, Ahmad Byagowi, Gregg Kammerer, Dinesh Jayaraman, Roberto Calandra
Despite decades of research, general purpose in-hand manipulation remains one of the unsolved challenges of robotics.
Existing approaches for visuomotor robotic control typically require characterizing the robot in advance by calibrating the camera or performing system identification.
Every living organism struggles against disruptive environmental forces to carve out and maintain an orderly niche.
We study the problem of safe adaptation: given a model trained on a variety of past experiences for some task, can this model learn to perform that task in a new situation while avoiding catastrophic failure?
All living organisms struggle against the forces of nature to carve out niches where they can maintain relative stasis.
Prior work on video generation largely focuses on prediction models that only observe frames from the beginning of the video.
Standard computer vision systems assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is a major challenge in itself.
Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment.
We envision REPLAB as a framework for reproducible research across manipulation tasks, and as a step in this direction, we define a template for a grasping benchmark consisting of a task definition, evaluation protocol, performance measures, and a dataset of 92k grasp attempts.
Touch sensing is widely acknowledged to be important for dexterous robotic manipulation, but exploiting tactile sensing for continuous, non-prehensile manipulation is challenging.
This model, a deep multimodal convolutional network, predicts the outcome of a candidate grasp adjustment; a grasp is then executed by iteratively selecting the most promising actions.
We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation.
It is common to implicitly assume access to intelligently captured inputs (e.g., photos from a human photographer), yet autonomously capturing good observations is itself a major challenge.
AutoCam leverages NFOV web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories.
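The dynamic-programming step can be sketched as a Viterbi-style search over per-frame glimpse scores. This is a simplified illustration, not the AutoCam implementation: candidate glimpses are reduced to integer positions along one axis, `scores` stands in for the learned per-glimpse scores, and the hard `max_jump` smoothness constraint is an assumption.

```python
def best_trajectory(scores, max_jump=1):
    """Pick one glimpse position per time step, maximizing total score.

    scores[t][p] is the (assumed) score of candidate position p at time t.
    Consecutive positions may differ by at most max_jump, which stands in
    for a smooth, human-like camera motion constraint.
    """
    num_positions = len(scores[0])
    best = [list(scores[0])]  # best[t][p]: best total score ending at p
    back = []                 # back[t-1][p]: predecessor of p at time t
    for t in range(1, len(scores)):
        row, ptr = [], []
        for p in range(num_positions):
            lo = max(0, p - max_jump)
            hi = min(num_positions - 1, p + max_jump)
            q = max(range(lo, hi + 1), key=lambda j: best[t - 1][j])
            row.append(best[t - 1][q] + scores[t][p])
            ptr.append(q)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final position.
    p = max(range(num_positions), key=lambda j: best[-1][j])
    path = [p]
    for ptr in reversed(back):
        p = ptr[p]
        path.append(p)
    path.reverse()
    return path


# Made-up scores: position 0 looks best at first, position 2 afterwards.
example_scores = [[1, 0, 0], [0, 0, 5], [0, 0, 5]]
example_path = best_trajectory(example_scores, max_jump=1)
```

In this example a greedy per-frame choice would start at position 0 and then need a jump of two positions, which the constraint forbids; the dynamic program instead gives up the first frame's score to stay within reach of the higher-scoring later glimpses.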
Compared to existing temporal coherence methods, our idea has the advantage of lightweight preprocessing of the unlabeled video (no tracking required) while still being able to extract object-level regions from which to learn invariances.
To verify this hypothesis, we attempt to induce this capacity in our active recognition pipeline, by simultaneously learning to forecast the effects of the agent's motions on its internal representation of the environment conditional on all past views.
While this standard approach captures the fact that high-level visual signals change slowly over time, it fails to capture *how* the visual content changes.
Understanding how images of objects and scenes behave in response to specific ego-motions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images.
Existing methods to learn visual attributes are prone to learning the wrong thing, namely, properties that are correlated with the attribute of interest among training samples.