no code implementations • 11 Sep 2023 • Priya Sundaresan, Jiajun Wu, Dorsa Sadigh
A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up and feed a range of food items.
no code implementations • 5 Sep 2023 • Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, Dorsa Sadigh
However, current VLMs are limited in their understanding of the physical concepts (e.g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects.
no code implementations • 3 Sep 2023 • Jennifer Grannen, Yilin Wu, Brandon Vu, Dorsa Sadigh
We counteract this challenge by drawing inspiration from humans to propose a novel role assignment framework: a stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
no code implementations • 10 Jul 2023 • Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to richer spatial patterns found in the Abstract Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art.
no code implementations • 7 Jul 2023 • Jonathan Yang, Dorsa Sadigh, Chelsea Finn
Reusing large datasets is crucial to scale vision-based robotic manipulators to everyday scenarios due to the high cost of collecting robotic datasets.
no code implementations • 4 Jul 2023 • Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar
Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions.
no code implementations • 29 Jun 2023 • Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh, Jeannette Bohg
While natural language offers a convenient shared interface for humans and robots, enabling robots to interpret and follow language commands remains a longstanding challenge in manipulation.
no code implementations • 14 Jun 2023 • Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia
However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot.
no code implementations • 14 Jun 2023 • Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh
We additionally illustrate our approach with a robot on 2 carefully designed surfaces.
1 code implementation • 12 Jun 2023 • Megha Srivastava, Noah Goodman, Dorsa Sadigh
AI assistance continues to help advance applications in education, from language learning to intelligent tutoring systems, yet current methods for providing students feedback are still quite limited.
no code implementations • 4 Jun 2023 • Suneel Belkhale, Yuchen Cui, Dorsa Sadigh
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high quality dataset encourages the policy to stay in distribution at test time.
no code implementations • 30 May 2023 • Kanishk Gandhi, Dorsa Sadigh, Noah D. Goodman
Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining.
1 code implementation • 25 May 2023 • Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari
Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)?
1 code implementation • 24 May 2023 • Joey Hejna, Dorsa Sadigh
Using this insight, we completely eliminate the need for a learned reward function.
no code implementations • 26 Apr 2023 • Joey Hejna, Jensen Gao, Dorsa Sadigh
To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data.
no code implementations • 18 Apr 2023 • Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors).
no code implementations • 13 Apr 2023 • Hengyuan Hu, Dorsa Sadigh
One of the fundamental quests of AI is to produce agents that coordinate well with humans.
Multi-agent Reinforcement Learning
reinforcement-learning
1 code implementation • 27 Feb 2023 • Minae Kwon, Sang Michael Xie, Kalesha Bullard, Dorsa Sadigh
During training, the LLM evaluates an RL agent's behavior against the desired behavior described by the prompt and outputs a corresponding reward signal.
no code implementations • 27 Feb 2023 • Vivek Myers, Erdem Biyik, Dorsa Sadigh
Robot policies need to adapt to human preferences and/or new environments.
2 code implementations • 24 Feb 2023 • Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite.
1 code implementation • 7 Feb 2023 • Andy Shih, Dorsa Sadigh, Stefano Ermon
LHTS is compatible with all likelihood-based models, and optimizes for the long-horizon likelihood of samples.
1 code implementation • 6 Jan 2023 • Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot: language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot.
no code implementations • 6 Dec 2022 • Joey Hejna, Dorsa Sadigh
Contrary to most works that focus on query selection to minimize the amount of data required for learning reward functions, we take an opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning.
no code implementations • 26 Nov 2022 • Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh
Acquiring food items with a fork poses an immense challenge to a robot-assisted feeding system, due to the wide range of material properties and visual appearances present across food groups.
no code implementations • 26 Nov 2022 • Jennifer Grannen, Yilin Wu, Suneel Belkhale, Dorsa Sadigh
In order to acquire foods with such diverse properties, we propose stabilizing food items during scooping using a second arm, for example, by pushing peas against the spoon with a flat surface to prevent dispersion.
1 code implementation • 25 Nov 2022 • Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, Dorsa Sadigh
In this paper, we focus on the problem of assistive teaching of motor control tasks such as parking a car or landing an aircraft.
no code implementations • 4 Nov 2022 • Mengxi Li, Rika Antonova, Dorsa Sadigh, Jeannette Bohg
We demonstrate the effectiveness of our method for designing new tools in several scenarios, such as winding ropes, flipping a box and pushing peas onto a scoop in simulation.
no code implementations • 14 Oct 2022 • Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation.
no code implementations • 16 Sep 2022 • Yilun Hao, Ruinan Wang, Zhangjie Cao, Zihan Wang, Yuchen Cui, Dorsa Sadigh
Specifically, we design a masked policy network with a binary mask to block certain modalities.
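The masking idea in the sentence above can be illustrated with a minimal sketch. The excerpt does not describe the network, so everything here is an assumption for illustration: the modality names, the helper `apply_modality_mask`, and the single linear layer standing in for the policy network are all hypothetical.

```python
def apply_modality_mask(observation, mask):
    """Zero out every feature of a modality whose mask bit is 0.

    observation: dict mapping a modality name to a list of floats.
    mask: dict mapping the same names to 0 (blocked) or 1 (kept).
    Modalities are concatenated in sorted-name order so the layout is stable.
    """
    masked = []
    for name in sorted(observation):
        keep = mask[name]
        masked.extend(keep * value for value in observation[name])
    return masked


def linear_policy(features, weights):
    """Hypothetical stand-in for the policy network: one linear layer."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


# Hypothetical observation with two modalities; "touch" is blocked.
observation = {"vision": [1.0, 2.0], "touch": [3.0]}
mask = {"vision": 1, "touch": 0}

masked_input = apply_modality_mask(observation, mask)
action = linear_policy(masked_input, weights=[[1.0, 1.0, 1.0]])
```

Because a blocked modality contributes only zeros, the policy output cannot depend on it, which is the property a binary modality mask is meant to provide.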
1 code implementation • 26 May 2022 • Andy Shih, Dorsa Sadigh, Stefano Ermon
Conditional inference on arbitrary subsets of variables is a core problem in probabilistic inference with important applications such as masked language modeling and image inpainting.
no code implementations • 8 Mar 2022 • Zhangjie Cao, Erdem Biyik, Guy Rosman, Dorsa Sadigh
At a certain time, to forecast a reasonable future trajectory, each agent needs to pay attention to the interactions with only a small group of most relevant agents instead of unnecessarily paying attention to all the other agents.
no code implementations • 2 Mar 2022 • Zihan Wang, Zhangjie Cao, Yilun Hao, Dorsa Sadigh
Correspondence learning is a fundamental problem in robotics, which aims to learn a mapping between state-action pairs of agents with different dynamics or embodiments.
no code implementations • 7 Feb 2022 • Zhangjie Cao, Zihan Wang, Dorsa Sadigh
Existing learning from demonstration algorithms usually assume access to expert demonstrations.
1 code implementation • 2 Feb 2022 • Mark Beliaev, Andy Shih, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani
In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
no code implementations • 5 Jan 2022 • Andy Shih, Stefano Ermon, Dorsa Sadigh
In this work, we study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time, and we must interact with and adapt to new partners at test time.
1 code implementation • 13 Dec 2021 • Bidipta Sarkar, Aditi Talati, Andy Shih, Dorsa Sadigh
We present PantheonRL, a multiagent reinforcement learning software package for dynamic training interactions such as round-robin, adaptive, and ad-hoc training.
1 code implementation • NeurIPS 2021 • Andy Shih, Dorsa Sadigh, Stefano Ermon
Probabilistic circuits (PCs) are a family of generative models which allows for the computation of exact likelihoods and marginals of its probability distributions.
1 code implementation • 5 Nov 2021 • Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh
We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration.
no code implementations • 28 Oct 2021 • Nicholas Roy, Ingmar Posner, Tim Barfoot, Philippe Beaudoin, Yoshua Bengio, Jeannette Bohg, Oliver Brock, Isabelle Depatie, Dieter Fox, Dan Koditschek, Tomas Lozano-Perez, Vikash Mansinghka, Christopher Pal, Blake Richards, Dorsa Sadigh, Stefan Schaal, Gaurav Sukhatme, Denis Therien, Marc Toussaint, Michiel Van de Panne
Machine learning has long since become a keystone technology, accelerating science and applications in a broad range of domains.
2 code implementations • 28 Oct 2021 • Zhangjie Cao, Yilun Hao, Mengxi Li, Dorsa Sadigh
The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations.
2 code implementations • NeurIPS 2021 • Songyuan Zhang, Zhangjie Cao, Dorsa Sadigh, Yanan Sui
Our results show that CAIL significantly outperforms other imitation learning methods from demonstrations with varying optimality.
no code implementations • EMNLP 2021 • Julia White, Gabriel Poesia, Robert Hawkins, Dorsa Sadigh, Noah Goodman
An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans.
no code implementations • 5 Oct 2021 • Woodrow Z. Wang, Andy Shih, Annie Xie, Dorsa Sadigh
Instead of reactively adapting to the other agent's (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent's strategy to stabilize -- which can restrain the non-stationarity caused by the other agent.
no code implementations • 2 Oct 2021 • Erdem Biyik, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
1 code implementation • 1 Oct 2021 • Nils Wilde, Erdem Biyik, Dorsa Sadigh, Stephen L. Smith
Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences.
no code implementations • 27 Sep 2021 • Vivek Myers, Erdem Biyik, Nima Anari, Dorsa Sadigh
However, expert feedback is often assumed to be drawn from an underlying unimodal reward function.
1 code implementation • 16 Aug 2021 • Erdem Biyik, Aditi Talati, Dorsa Sadigh
Reward learning is a fundamental problem in human-robot interaction to have robots that operate in alignment with what their human user wants.
3 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
no code implementations • 14 Jun 2021 • Minae Kwon, Siddharth Karamcheti, Mariano-Florentino Cuellar, Dorsa Sadigh
This trend additionally holds when comparing agents using our targeted data acquisition framework to variants of agents trained with a mix of supervised learning and reinforcement learning, or to agents using tailored reward functions that explicitly optimize for utility and Pareto-optimality.
no code implementations • 13 May 2021 • Woodrow Z. Wang, Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game.
no code implementations • 6 May 2021 • Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Traffic congestion has large economic and social costs.
1 code implementation • 2 May 2021 • Siddharth Karamcheti, Albert J. Zhai, Dylan P. Losey, Dorsa Sadigh
In this work, we develop assistive robots that condition their latent embeddings on visual inputs.
1 code implementation • ICLR 2021 • Andy Shih, Arjun Sawhney, Jovana Kondic, Stefano Ermon, Dorsa Sadigh
Humans can quickly adapt to new partners in collaborative tasks (e.g., playing basketball), because they understand which fundamental skills of the task (e.g., how to dribble, how to shoot) carry over across new partners.
1 code implementation • 10 Mar 2021 • Zhangjie Cao, Dorsa Sadigh
The proposed score enables learning from more informative demonstrations, and disregarding the less relevant demonstrations.
1 code implementation • NeurIPS 2021 • Suvir Mirchandani, Siddharth Karamcheti, Dorsa Sadigh
Building agents capable of understanding language instructions is critical to effective and robust human-AI collaboration.
no code implementations • 10 Feb 2021 • Zhangjie Cao, Minae Kwon, Dorsa Sadigh
The ability for robots to transfer their learned knowledge to new tasks -- where data is scarce -- is a fundamental challenge for successful robot learning.
Transfer Reinforcement Learning
Robotics
no code implementations • 28 Dec 2020 • Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Woodrow Z. Wang, Dorsa Sadigh, Ramtin Pedarsani
In turn, significant increases in traffic congestion are expected, since people are likely to prefer using their own vehicles or taxis as opposed to riskier and more crowded options such as the railway.
no code implementations • 12 Nov 2020 • Annie Xie, Dylan P. Losey, Ryan Tolsma, Chelsea Finn, Dorsa Sadigh
We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy.
1 code implementation • 9 Nov 2020 • Kejun Li, Maegan Tucker, Erdem Biyik, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, Aaron D. Ames
ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters.
no code implementations • EMNLP (intexsempar) 2020 • Siddharth Karamcheti, Dorsa Sadigh, Percy Liang
Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings.
no code implementations • 10 Aug 2020 • Zheqing Zhu, Erdem Biyik, Dorsa Sadigh
Multi-agent safe systems have become an increasingly important area of study as we can now easily have multiple AI-powered systems operating together.
no code implementations • 22 Jul 2020 • Mengxi Li, Dylan P. Losey, Jeannette Bohg, Dorsa Sadigh
Existing approaches to teleoperation typically assume a one-size-fits-all approach, where the designers pre-define a mapping between human inputs and robot actions, and every user must adapt to this mapping over repeated interactions.
1 code implementation • 1 Jul 2020 • Zhangjie Cao, Erdem Biyik, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh
To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
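The two-level structure described above can be sketched as a dispatcher. This is only an illustration under stated assumptions: in the described approach the low-level policies are learned by IL and the high-level policy by RL, whereas here both are hand-written stubs, and the mode names, the `gap_to_obstacle` state field, and the 10.0 threshold are hypothetical.

```python
def timid_policy(state):
    # Stub standing in for a low-level policy learned by imitation learning.
    return {"throttle": 0.2, "steer": 0.0}


def aggressive_policy(state):
    # Stub standing in for a second IL-learned driving mode.
    return {"throttle": 0.8, "steer": 0.0}


# One low-level policy per discrete driving mode (mode names are hypothetical).
LOW_LEVEL_POLICIES = {"timid": timid_policy, "aggressive": aggressive_policy}


def high_level_policy(state):
    # Stub standing in for the RL-learned switching policy: a fixed
    # threshold rule on a hypothetical distance feature.
    return "timid" if state["gap_to_obstacle"] < 10.0 else "aggressive"


def act(state):
    """Pick a driving mode, then let that mode's policy choose the control."""
    mode = high_level_policy(state)
    return mode, LOW_LEVEL_POLICIES[mode](state)


near_mode, near_control = act({"gap_to_obstacle": 4.0})
far_mode, far_control = act({"gap_to_obstacle": 25.0})
```

Keeping the mode set discrete means the high-level policy only has to solve a small selection problem, while continuous control stays inside the low-level policies.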
no code implementations • 24 Jun 2020 • Erdem Biyik, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers.
1 code implementation • 27 May 2020 • Shushman Choudhury, Jayesh K. Gupta, Mykel J. Kochenderfer, Dorsa Sadigh, Jeannette Bohg
We consider the problem of dynamically allocating tasks to multiple agents under time window constraints and task completion uncertainty.
1 code implementation • 6 May 2020 • Erdem Biyik, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh
Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.
no code implementations • EMNLP (Eval4NLP) 2020 • Kawin Ethayarajh, Dorsa Sadigh
To this end, we propose BLEU Neighbors, a nearest neighbors model for estimating language quality by using the BLEU score as a kernel function.
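A minimal sketch of the general idea, using a similarity score as the kernel of a nearest-neighbors estimator, is given below. It is not the paper's implementation: `simple_bleu` is a simplified, smoothed unigram-and-bigram variant of BLEU written for this example, and the labeled sentences, quality scores, and the choice of `k` are all hypothetical.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: add-one smoothed n-gram precision with brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = sum(cand_counts.values())
        log_precision += log((overlap + 1) / (total + 1))
    brevity = 1.0 if len(cand) >= len(ref) else exp(1 - len(ref) / len(cand))
    return brevity * exp(log_precision / max_n)


def estimate_quality(sentence, labeled, k=2):
    """Kernel-weighted average of the quality labels of the k nearest neighbors."""
    scored = sorted(
        ((simple_bleu(sentence, text), quality) for text, quality in labeled),
        reverse=True,
    )[:k]
    weight = sum(score for score, _ in scored)
    return sum(score * quality for score, quality in scored) / weight


# Hypothetical labeled pool: (sentence, human quality score in [0, 1]).
labeled = [
    ("the cat sat on the mat", 0.9),
    ("cat the mat sat on", 0.2),
    ("colorless green ideas", 0.5),
]
estimate = estimate_quality("the cat sat on a mat", labeled)
```

The estimate is pulled toward the label of whichever labeled sentence overlaps most with the query, so here it lands closer to the fluent neighbor than to the scrambled one.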
no code implementations • 19 Mar 2020 • John Mern, Dorsa Sadigh, Mykel J. Kochenderfer
We show that our proposed representation results in an input space that is a factor of $m!$ smaller for inputs of $m$ objects.
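The $m!$ factor can be checked by brute force on a toy case. The sketch below assumes, for illustration only, that each of the $m$ objects takes a distinct value from a small finite set; it counts order-sensitive inputs (tuples) against order-free inputs (sets). The function name and the toy sizes are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial


def input_space_sizes(num_values, m):
    """Count inputs of m distinct objects drawn from num_values possible values."""
    ordered = sum(1 for _ in permutations(range(num_values), m))    # order matters
    unordered = sum(1 for _ in combinations(range(num_values), m))  # order ignored
    return ordered, unordered


ordered, unordered = input_space_sizes(num_values=6, m=3)
ratio = ordered // unordered  # equals factorial(3) under these assumptions
```

Each unordered set of $m$ distinct objects corresponds to exactly $m!$ orderings, which is where the reduction factor comes from.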
no code implementations • 13 Jan 2020 • Minae Kwon, Erdem Biyik, Aditi Talati, Karan Bhasin, Dylan P. Losey, Dorsa Sadigh
Overall, we extend existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during HRI.
1 code implementation • CONLL 2020 • Robert D. Hawkins, Minae Kwon, Dorsa Sadigh, Noah D. Goodman
To communicate with new partners in new contexts, humans rapidly form new linguistic conventions.
no code implementations • 16 Oct 2019 • Dylan P. Losey, Mengxi Li, Jeannette Bohg, Dorsa Sadigh
When teams of robots collaborate to complete a task, communication is often necessary.
2 code implementations • 10 Oct 2019 • Erdem Biyik, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh
Robots can learn the right reward function by querying a human expert.
no code implementations • 20 Sep 2019 • Dylan P. Losey, Krishnan Srinivasan, Ajay Mandlekar, Animesh Garg, Dorsa Sadigh
Our insight is that we can make assistive robots easier for humans to control by leveraging latent actions.
Robotics
1 code implementation • 21 Jun 2019 • Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.
1 code implementation • 19 Jun 2019 • Erdem Biyik, Kenneth Wang, Nima Anari, Dorsa Sadigh
While active learning methods attempt to tackle this issue by labeling only the data samples that give high information, they generally suffer from large computational costs and are impractical in settings where data can be collected in parallel.
no code implementations • 13 May 2019 • Ashwini Pokle, Roberto Martín-Martín, Patrick Goebel, Vincent Chow, Hans M. Ewald, Junwei Yang, Zhenkai Wang, Amir Sadeghian, Dorsa Sadigh, Silvio Savarese, Marynel Vázquez
We present a navigation system that combines ideas from hierarchical planning and machine learning.
no code implementations • 7 May 2019 • John Mern, Dorsa Sadigh, Mykel Kochenderfer
Although deep reinforcement learning has advanced significantly over the past several years, sample efficiency remains a major challenge.
no code implementations • 1 Apr 2019 • Erdem Biyik, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models.
1 code implementation • 14 Feb 2019 • Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn
While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice, RL usually requires manual, per-task engineering of reward functions, especially in real world settings where aspects of the environment needed to compute progress are not directly accessible.
no code implementations • 13 Oct 2018 • Jaime F. Fisac, Eli Bronstein, Elis Stefansson, Dorsa Sadigh, S. Shankar Sastry, Anca D. Dragan
This mutual dependence, best captured by dynamic game theory, creates a strong coupling between the vehicle's planning and its predictions of other drivers' behavior, and constitutes an open problem with direct implications on the safety and viability of autonomous driving technology.
1 code implementation • 10 Oct 2018 • Erdem Biyik, Dorsa Sadigh
Data generation and labeling are usually an expensive part of learning for robotics.
1 code implementation • NeurIPS 2018 • Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon
Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal.
no code implementations • 27 Jun 2016 • Sanjit A. Seshia, Dorsa Sadigh, S. Shankar Sastry
Verified artificial intelligence (AI) is the goal of designing AI-based systems that have strong, ideally provable, assurances of correctness with respect to mathematically-specified requirements.
no code implementations • 25 Oct 2015 • Dorsa Sadigh, Ashish Kapoor
In this paper, we propose a new logic, Probabilistic Signal Temporal Logic (PrSTL), as an expressive language to define the stochastic properties, and enforce probabilistic guarantees on them.
no code implementations • 7 Dec 2013 • Dorsa Sadigh, Henrik Ohlsson, S. Shankar Sastry, Sanjit A. Seshia
As in robust PCA, it can be problematic to find a suitable regularization parameter.