no code implementations • 7 Nov 2024 • Yecheng Jason Ma, Joey Hejna, Ayzaan Wahid, Chuyuan Fu, Dhruv Shah, Jacky Liang, Zhuo Xu, Sean Kirmani, Peng Xu, Danny Driess, Ted Xiao, Jonathan Tompson, Osbert Bastani, Dinesh Jayaraman, Wenhao Yu, Tingnan Zhang, Dorsa Sadigh, Fei Xia
Instead, GVL poses value estimation as a temporal ordering problem over shuffled video frames; this seemingly more challenging task encourages VLMs to more fully exploit their underlying semantic and temporal grounding capabilities to differentiate frames based on their perceived task progress, consequently producing significantly better value predictions.
no code implementations • 5 Nov 2024 • Soroush Nasiriany, Sean Kirmani, Tianli Ding, Laura Smith, Yuke Zhu, Danny Driess, Dorsa Sadigh, Ted Xiao
Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language, and then conditions the policy on this affordance plan to perform manipulation.
no code implementations • 4 Nov 2024 • Jennifer Grannen, Siddharth Karamcheti, Suvir Mirchandani, Percy Liang, Dorsa Sadigh
Similarly, users teach high-level planning behaviors through spoken dialogue, using pretrained language models to synthesize behaviors such as "packing an object away" as compositions of low-level skills -- concepts that can be reused and built upon.
no code implementations • 4 Nov 2024 • Suvir Mirchandani, Suneel Belkhale, Joey Hejna, Evelyn Choi, Md Sazzad Islam, Dorsa Sadigh
Our work suggests a negative result: that scaling up autonomous data collection for learning robot policies for real-world tasks is more challenging and impractical than what is suggested in prior work.
no code implementations • 24 Sep 2024 • Homanga Bharadhwaj, Debidatta Dwibedi, Abhinav Gupta, Shubham Tulsiani, Carl Doersch, Ted Xiao, Dhruv Shah, Fei Xia, Dorsa Sadigh, Sean Kirmani
To train the policy, we use an order of magnitude less robot interaction data compared to what the video prediction model was trained on.
no code implementations • 16 Sep 2024 • Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk
MotIF assesses the success of robot motion given the image observation of the trajectory, task instruction, and motion description.
no code implementations • 29 Aug 2024 • Li-Heng Lin, Yuchen Cui, Amber Xie, Tianyu Hua, Dorsa Sadigh
We propose FlowRetrieval, an approach that leverages optical flow representations for both extracting similar motions to target tasks from prior data, and for guiding learning of a policy that can maximally benefit from such data.
1 code implementation • 26 Aug 2024 • Joey Hejna, Chethan Bhateja, Yichen Jian, Karl Pertsch, Dorsa Sadigh
Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics.
no code implementations • 10 Jul 2024 • Rajat Kumar Jenamani, Priya Sundaresan, Maram Sakr, Tapomayukh Bhattacharjee, Dorsa Sadigh
We address this with FLAIR, a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models, along with a library of parameterized skills, to plan and execute user-preferred and efficient bite sequences.
1 code implementation • 13 Jun 2024 • Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn
Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control.
no code implementations • 20 May 2024 • Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine
In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces.
1 code implementation • 7 May 2024 • Megha Srivastava, Cedric Colas, Dorsa Sadigh, Jacob Andreas
Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but often lack human-like features such as generalization, interpretability and human inter-operability.
no code implementations • 23 Mar 2024 • Allen Z. Ren, Jaden Clark, Anushri Dixit, Masha Itkina, Anirudha Majumdar, Dorsa Sadigh
We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question.
no code implementations • 8 Mar 2024 • Jensen Gao, Annie Xie, Ted Xiao, Chelsea Finn, Dorsa Sadigh
If robot policies can compose environmental factors from their data to succeed when encountering unseen factor combinations, we can exploit this to avoid collecting data for situations that composition would address.
no code implementations • 4 Mar 2024 • Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quan Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh
Predicting these language motions as an intermediate step between tasks and actions forces the policy to learn the shared structure of low-level motions across seemingly disparate tasks.
no code implementations • 24 Feb 2024 • Erdem Biyik, Nima Anari, Dorsa Sadigh
Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time.
3 code implementations • 12 Feb 2024 • Siddharth Karamcheti, Suraj Nair, Ashwin Balakrishna, Percy Liang, Thomas Kollar, Dorsa Sadigh
Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3.
no code implementations • 23 Jan 2024 • Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Edward Lee, Sergey Levine, Yao Lu, Isabel Leal, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, Peng Xu, Steve Xu, Zhuo Xu
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
no code implementations • CVPR 2024 • Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia
By training a VLM on such data, we significantly enhance its ability on both qualitative and quantitative spatial VQA.
1 code implementation • 7 Dec 2023 • Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian Ichter
For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable).
1 code implementation • 17 Nov 2023 • Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh
DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives.
no code implementations • 3 Nov 2023 • Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh
Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency.
no code implementations • NeurIPS 2023 • Bidipta Sarkar, Andy Shih, Dorsa Sadigh
Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication.
1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.
1 code implementation • 29 Sep 2023 • Tim Salzmann, Lewis Chiang, Markus Ryll, Dorsa Sadigh, Carolina Parada, Alex Bewley
Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation.
no code implementations • 11 Sep 2023 • Priya Sundaresan, Jiajun Wu, Dorsa Sadigh
A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up and feed a range of food items.
no code implementations • 5 Sep 2023 • Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, Dorsa Sadigh
We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically grounded VLMs.
no code implementations • 3 Sep 2023 • Jennifer Grannen, Yilin Wu, Brandon Vu, Dorsa Sadigh
We counteract this challenge by drawing inspiration from humans to propose a novel role assignment framework: a stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
no code implementations • 10 Jul 2023 • Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art.
no code implementations • 7 Jul 2023 • Jonathan Yang, Dorsa Sadigh, Chelsea Finn
Reusing large datasets is crucial to scale vision-based robotic manipulators to everyday scenarios due to the high cost of collecting robotic datasets.
no code implementations • 4 Jul 2023 • Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar
Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions.
no code implementations • 29 Jun 2023 • Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh, Jeannette Bohg
While natural language offers a convenient shared interface for humans and robots, enabling robots to interpret and follow language commands remains a longstanding challenge in manipulation.
no code implementations • 14 Jun 2023 • Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh
We additionally illustrate our approach with a robot on 2 carefully designed surfaces.
no code implementations • 14 Jun 2023 • Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia
However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot.
1 code implementation • 12 Jun 2023 • Megha Srivastava, Noah Goodman, Dorsa Sadigh
AI assistance continues to help advance applications in education, from language learning to intelligent tutoring systems, yet current methods for providing students feedback are still quite limited.
no code implementations • 30 May 2023 • Kanishk Gandhi, Dorsa Sadigh, Noah D. Goodman
Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining.
1 code implementation • NeurIPS 2023 • Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari
Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)?
1 code implementation • 26 Apr 2023 • Joey Hejna, Jensen Gao, Dorsa Sadigh
To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data.
no code implementations • 18 Apr 2023 • Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors).
no code implementations • 13 Apr 2023 • Hengyuan Hu, Dorsa Sadigh
One of the fundamental quests of AI is to produce agents that coordinate well with humans.
no code implementations • 27 Feb 2023 • Vivek Myers, Erdem Biyik, Dorsa Sadigh
Robot policies need to adapt to human preferences and/or new environments.
1 code implementation • 27 Feb 2023 • Minae Kwon, Sang Michael Xie, Kalesha Bullard, Dorsa Sadigh
During training, the LLM evaluates an RL agent's behavior against the desired behavior described by the prompt and outputs a corresponding reward signal.
2 code implementations • 24 Feb 2023 • Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
First, we demonstrate that existing representations yield inconsistent results across these tasks: masked autoencoding approaches pick up on low-level spatial features at the cost of high-level semantics, while contrastive learning approaches capture the opposite.
1 code implementation • 7 Feb 2023 • Andy Shih, Dorsa Sadigh, Stefano Ermon
LHTS is compatible with all likelihood-based models, and optimizes for the long horizon likelihood of samples.
1 code implementation • 6 Jan 2023 • Yuchen Cui, Siddharth Karamcheti, Raj Palleti, Nidhya Shivakumar, Percy Liang, Dorsa Sadigh
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot: language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot.
no code implementations • 6 Dec 2022 • Joey Hejna, Dorsa Sadigh
Contrary to most works that focus on query selection to minimize the amount of data required for learning reward functions, we take an opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning.
no code implementations • 26 Nov 2022 • Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh
Acquiring food items with a fork poses an immense challenge to a robot-assisted feeding system, due to the wide range of material properties and visual appearances present across food groups.
no code implementations • 26 Nov 2022 • Jennifer Grannen, Yilin Wu, Suneel Belkhale, Dorsa Sadigh
In order to acquire foods with such diverse properties, we propose stabilizing food items during scooping using a second arm, for example, by pushing peas against the spoon with a flat surface to prevent dispersion.
1 code implementation • 25 Nov 2022 • Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, Dorsa Sadigh
In this paper, we focus on the problem of assistive teaching of motor control tasks such as parking a car or landing an aircraft.
no code implementations • 4 Nov 2022 • Mengxi Li, Rika Antonova, Dorsa Sadigh, Jeannette Bohg
We demonstrate the effectiveness of our method for designing new tools in several scenarios, such as winding ropes, flipping a box and pushing peas onto a scoop in simulation.
no code implementations • 14 Oct 2022 • Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation.
no code implementations • 16 Sep 2022 • Yilun Hao, Ruinan Wang, Zhangjie Cao, Zihan Wang, Yuchen Cui, Dorsa Sadigh
Specifically, we design a masked policy network with a binary mask to block certain modalities.
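As a rough illustration of what a binary modality mask can look like, here is a minimal sketch. It assumes a toy linear policy over named modalities; the function `masked_policy`, the modality names, and the weights are all hypothetical and are not taken from the paper.

```python
def masked_policy(features, mask, weights):
    """Toy linear policy with a binary gate per modality (hypothetical sketch).

    features: dict mapping modality name -> list of floats
    mask:     dict mapping modality name -> 0 (blocked) or 1 (passed through)
    weights:  dict mapping modality name -> list of floats, same length as features
    """
    action = 0.0
    for name, values in features.items():
        gate = mask[name]  # 0 removes this modality's contribution entirely
        action += gate * sum(w * v for w, v in zip(weights[name], values))
    return action


# Assumed example inputs: two modalities, with "force" blocked by the mask.
weights = {"vision": [1.0, 0.5], "force": [2.0]}
mask = {"vision": 1, "force": 0}

action_a = masked_policy({"vision": [0.2, 0.4], "force": [1.0]}, mask, weights)
action_b = masked_policy({"vision": [0.2, 0.4], "force": [5.0]}, mask, weights)
```

Because the `force` gate is 0, `action_a` and `action_b` are identical even though the force reading differs, which is the sense in which the mask blocks a modality.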
1 code implementation • 26 May 2022 • Andy Shih, Dorsa Sadigh, Stefano Ermon
Conditional inference on arbitrary subsets of variables is a core problem in probabilistic inference with important applications such as masked language modeling and image inpainting.
no code implementations • 8 Mar 2022 • Zhangjie Cao, Erdem Biyik, Guy Rosman, Dorsa Sadigh
At a certain time, to forecast a reasonable future trajectory, each agent needs to pay attention to the interactions with only a small group of most relevant agents instead of unnecessarily paying attention to all the other agents.
no code implementations • 2 Mar 2022 • Zihan Wang, Zhangjie Cao, Yilun Hao, Dorsa Sadigh
Correspondence learning is a fundamental problem in robotics, which aims to learn a mapping between state-action pairs of agents of different dynamics or embodiments.
no code implementations • 7 Feb 2022 • Zhangjie Cao, Zihan Wang, Dorsa Sadigh
Existing learning from demonstration algorithms usually assume access to expert demonstrations.
1 code implementation • 2 Feb 2022 • Mark Beliaev, Andy Shih, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani
In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
no code implementations • 5 Jan 2022 • Andy Shih, Stefano Ermon, Dorsa Sadigh
In this work, we study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time, and we must interact with and adapt to new partners at test time.
1 code implementation • 13 Dec 2021 • Bidipta Sarkar, Aditi Talati, Andy Shih, Dorsa Sadigh
We present PantheonRL, a multiagent reinforcement learning software package for dynamic training interactions such as round-robin, adaptive, and ad-hoc training.
1 code implementation • NeurIPS 2021 • Andy Shih, Dorsa Sadigh, Stefano Ermon
Probabilistic circuits (PCs) are a family of generative models which allows for the computation of exact likelihoods and marginals of its probability distributions.
1 code implementation • 5 Nov 2021 • Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh
We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration.
2 code implementations • 28 Oct 2021 • Zhangjie Cao, Yilun Hao, Mengxi Li, Dorsa Sadigh
The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations.
no code implementations • 28 Oct 2021 • Nicholas Roy, Ingmar Posner, Tim Barfoot, Philippe Beaudoin, Yoshua Bengio, Jeannette Bohg, Oliver Brock, Isabelle Depatie, Dieter Fox, Dan Koditschek, Tomas Lozano-Perez, Vikash Mansinghka, Christopher Pal, Blake Richards, Dorsa Sadigh, Stefan Schaal, Gaurav Sukhatme, Denis Therien, Marc Toussaint, Michiel Van de Panne
Machine learning has long since become a keystone technology, accelerating science and applications in a broad range of domains.
2 code implementations • NeurIPS 2021 • Songyuan Zhang, Zhangjie Cao, Dorsa Sadigh, Yanan Sui
Our results show that CAIL significantly outperforms other imitation learning methods from demonstrations with varying optimality.
no code implementations • EMNLP 2021 • Julia White, Gabriel Poesia, Robert Hawkins, Dorsa Sadigh, Noah Goodman
An overarching goal of natural language processing is to enable machines to communicate seamlessly with humans.
no code implementations • 5 Oct 2021 • Woodrow Z. Wang, Andy Shih, Annie Xie, Dorsa Sadigh
Instead of reactively adapting to the other agent's (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent's strategy to stabilize -- which can restrain the non-stationarity caused by the other agent.
no code implementations • 2 Oct 2021 • Erdem Biyik, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
1 code implementation • 1 Oct 2021 • Nils Wilde, Erdem Biyik, Dorsa Sadigh, Stephen L. Smith
Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences.
no code implementations • 27 Sep 2021 • Vivek Myers, Erdem Biyik, Nima Anari, Dorsa Sadigh
However, expert feedback is often assumed to be drawn from an underlying unimodal reward function.
1 code implementation • 16 Aug 2021 • Erdem Biyik, Aditi Talati, Dorsa Sadigh
Reward learning is a fundamental problem in human-robot interaction to have robots that operate in alignment with what their human user wants.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
no code implementations • 14 Jun 2021 • Minae Kwon, Siddharth Karamcheti, Mariano-Florentino Cuellar, Dorsa Sadigh
This trend additionally holds when comparing agents using our targeted data acquisition framework to variants of agents trained with a mix of supervised learning and reinforcement learning, or to agents using tailored reward functions that explicitly optimize for utility and Pareto-optimality.
no code implementations • 13 May 2021 • Woodrow Z. Wang, Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game.
no code implementations • 6 May 2021 • Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Traffic congestion has large economic and social costs.
1 code implementation • 2 May 2021 • Siddharth Karamcheti, Albert J. Zhai, Dylan P. Losey, Dorsa Sadigh
In this work, we develop assistive robots that condition their latent embeddings on visual inputs.
1 code implementation • ICLR 2021 • Andy Shih, Arjun Sawhney, Jovana Kondic, Stefano Ermon, Dorsa Sadigh
Humans can quickly adapt to new partners in collaborative tasks (e.g., playing basketball), because they understand which fundamental skills of the task (e.g., how to dribble, how to shoot) carry over across new partners.
1 code implementation • NeurIPS 2021 • Suvir Mirchandani, Siddharth Karamcheti, Dorsa Sadigh
Building agents capable of understanding language instructions is critical to effective and robust human-AI collaboration.
1 code implementation • 10 Mar 2021 • Zhangjie Cao, Dorsa Sadigh
The proposed score enables learning from more informative demonstrations, and disregarding the less relevant demonstrations.
no code implementations • 10 Feb 2021 • Zhangjie Cao, Minae Kwon, Dorsa Sadigh
The ability for robots to transfer their learned knowledge to new tasks -- where data is scarce -- is a fundamental challenge for successful robot learning.
no code implementations • 28 Dec 2020 • Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Woodrow Z. Wang, Dorsa Sadigh, Ramtin Pedarsani
In turn, significant increases in traffic congestion are expected, since people are likely to prefer using their own vehicles or taxis as opposed to riskier and more crowded options such as the railway.
no code implementations • 12 Nov 2020 • Annie Xie, Dylan P. Losey, Ryan Tolsma, Chelsea Finn, Dorsa Sadigh
We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy.
1 code implementation • 9 Nov 2020 • Kejun Li, Maegan Tucker, Erdem Biyik, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, Aaron D. Ames
ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters.
no code implementations • EMNLP (intexsempar) 2020 • Siddharth Karamcheti, Dorsa Sadigh, Percy Liang
Our goal is to create an interactive natural language interface that efficiently and reliably learns from users to complete tasks in simulated robotics settings.
no code implementations • 10 Aug 2020 • Zheqing Zhu, Erdem Biyik, Dorsa Sadigh
Multi-agent safe systems have become an increasingly important area of study as we can now easily have multiple AI-powered systems operating together.
no code implementations • 22 Jul 2020 • Mengxi Li, Dylan P. Losey, Jeannette Bohg, Dorsa Sadigh
Existing approaches to teleoperation typically assume a one-size-fits-all approach, where the designers pre-define a mapping between human inputs and robot actions, and every user must adapt to this mapping over repeated interactions.
1 code implementation • 1 Jul 2020 • Zhangjie Cao, Erdem Biyik, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh
To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
no code implementations • 24 Jun 2020 • Erdem Biyik, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers.
1 code implementation • 27 May 2020 • Shushman Choudhury, Jayesh K. Gupta, Mykel J. Kochenderfer, Dorsa Sadigh, Jeannette Bohg
We consider the problem of dynamically allocating tasks to multiple agents under time window constraints and task completion uncertainty.
1 code implementation • 6 May 2020 • Erdem Biyik, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh
Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.
no code implementations • EMNLP (Eval4NLP) 2020 • Kawin Ethayarajh, Dorsa Sadigh
To this end, we propose BLEU Neighbors, a nearest neighbors model for estimating language quality by using the BLEU score as a kernel function.
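The idea of using a BLEU-style similarity as the kernel of a nearest-neighbors estimator can be sketched roughly as follows. This is an illustrative approximation only: `simple_bleu` is a simplified, add-one-smoothed stand-in for BLEU (unigrams and bigrams, single reference), and `predict_quality`, the example sentences, and their quality labels are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU-like score in [0, 1]; not the full BLEU definition."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        total = sum(cand_counts.values())
        log_precisions.append(log((overlap + 1) / (total + 1)))  # add-one smoothing
    # Brevity penalty for candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else exp(1 - len(ref) / len(cand))
    return brevity * exp(sum(log_precisions) / max_n)


def predict_quality(candidate, labeled, k=2):
    """Average the quality labels of the k labeled sentences most similar to the candidate."""
    ranked = sorted(labeled, key=lambda pair: simple_bleu(candidate, pair[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Hypothetical labeled sentences with made-up quality scores.
labeled = [
    ("the robot picks up the cup", 1.0),
    ("the robot picks the cup up", 0.9),
    ("cup cup the the robot", 0.1),
]
estimate = predict_quality("the robot picks up the cup", labeled, k=2)
```

Here the two fluent sentences are the nearest neighbors of the candidate, so the estimate is the mean of their labels; the scrambled sentence scores lower under the similarity and is ignored for `k=2`.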
no code implementations • 19 Mar 2020 • John Mern, Dorsa Sadigh, Mykel J. Kochenderfer
We show that our proposed representation results in an input space that is a factor of $m!$ smaller for inputs of $m$ objects.
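The $m!$ factor can be checked by counting: an ordered list of $m$ distinct objects has $m!$ orderings, and any order-independent representation maps all of them to a single input. The sketch below uses sorting as a stand-in canonical form; this is an assumption for illustration and not necessarily the representation proposed in the paper, and the object coordinates are made up.

```python
from itertools import permutations
from math import factorial


def canonical(objects):
    """Map an ordered sequence of objects to an order-independent form (here: sorted tuple)."""
    return tuple(sorted(objects))


# m = 3 hypothetical objects, each described by an (x, y) position.
objects = [(0.1, 0.5), (0.9, 0.2), (0.4, 0.4)]

orderings = set(permutations(objects))               # every ordered input
canonical_forms = {canonical(p) for p in orderings}  # what remains after canonicalizing

num_orderings = len(orderings)        # m! = 6 ordered inputs
num_canonical = len(canonical_forms)  # all collapse to 1 canonical input
```

For $m = 3$ the six ordered inputs collapse to one, a reduction by a factor of $3! = 6$.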
no code implementations • 13 Jan 2020 • Minae Kwon, Erdem Biyik, Aditi Talati, Karan Bhasin, Dylan P. Losey, Dorsa Sadigh
Overall, we extend existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during HRI.
1 code implementation • CONLL 2020 • Robert D. Hawkins, Minae Kwon, Dorsa Sadigh, Noah D. Goodman
To communicate with new partners in new contexts, humans rapidly form new linguistic conventions.
no code implementations • 16 Oct 2019 • Dylan P. Losey, Mengxi Li, Jeannette Bohg, Dorsa Sadigh
When teams of robots collaborate to complete a task, communication is often necessary.
2 code implementations • 10 Oct 2019 • Erdem Biyik, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh
Robots can learn the right reward function by querying a human expert.
no code implementations • 20 Sep 2019 • Dylan P. Losey, Krishnan Srinivasan, Ajay Mandlekar, Animesh Garg, Dorsa Sadigh
Our insight is that we can make assistive robots easier for humans to control by leveraging latent actions.
1 code implementation • 21 Jun 2019 • Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.
1 code implementation • 19 Jun 2019 • Erdem Biyik, Kenneth Wang, Nima Anari, Dorsa Sadigh
While active learning methods attempt to tackle this issue by labeling only the data samples that give high information, they generally suffer from large computational costs and are impractical in settings where data can be collected in parallel.
no code implementations • 13 May 2019 • Ashwini Pokle, Roberto Martín-Martín, Patrick Goebel, Vincent Chow, Hans M. Ewald, Junwei Yang, Zhenkai Wang, Amir Sadeghian, Dorsa Sadigh, Silvio Savarese, Marynel Vázquez
We present a navigation system that combines ideas from hierarchical planning and machine learning.
no code implementations • 7 May 2019 • John Mern, Dorsa Sadigh, Mykel Kochenderfer
Although deep reinforcement learning has advanced significantly over the past several years, sample efficiency remains a major challenge.
no code implementations • 1 Apr 2019 • Erdem Biyik, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models.
1 code implementation • 14 Feb 2019 • Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn
While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice, RL usually requires manual, per-task engineering of reward functions, especially in real world settings where aspects of the environment needed to compute progress are not directly accessible.
no code implementations • 13 Oct 2018 • Jaime F. Fisac, Eli Bronstein, Elis Stefansson, Dorsa Sadigh, S. Shankar Sastry, Anca D. Dragan
This mutual dependence, best captured by dynamic game theory, creates a strong coupling between the vehicle's planning and its predictions of other drivers' behavior, and constitutes an open problem with direct implications on the safety and viability of autonomous driving technology.
1 code implementation • 10 Oct 2018 • Erdem Biyik, Dorsa Sadigh
Data generation and labeling are usually an expensive part of learning for robotics.
1 code implementation • NeurIPS 2018 • Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon
Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal.
no code implementations • 27 Jun 2016 • Sanjit A. Seshia, Dorsa Sadigh, S. Shankar Sastry
Verified artificial intelligence (AI) is the goal of designing AI-based systems that have strong, ideally provable, assurances of correctness with respect to mathematically-specified requirements.
no code implementations • 25 Oct 2015 • Dorsa Sadigh, Ashish Kapoor
In this paper, we propose a new logic, Probabilistic Signal Temporal Logic (PrSTL), as an expressive language to define the stochastic properties, and enforce probabilistic guarantees on them.
no code implementations • 7 Dec 2013 • Dorsa Sadigh, Henrik Ohlsson, S. Shankar Sastry, Sanjit A. Seshia
As in robust PCA, it can be problematic to find a suitable regularization parameter.