no code implementations • 13 Jan 2025 • Joydeep Biswas, Don Fussell, Peter Stone, Kristin Patterson, Kristen Procko, Lea Sabatini, Zifan Xu
We describe the development of a one-credit course to promote AI literacy at The University of Texas at Austin.
1 code implementation • 11 Jan 2025 • Stephane Hatgis-Kessell, W. Bradley Knox, Serena Booth, Scott Niekum, Peter Stone
A preference model that poorly describes how humans generate preferences risks learning a poor approximation of the human's reward function.
no code implementations • 12 Dec 2024 • Adam Labiosa, Zhihan Wang, Siddhant Agarwal, William Cong, Geethika Hemkumar, Abhinav Narayan Harish, Benjamin Hong, Josh Kelle, Chen Li, Yuhao Li, Zisen Shao, Peter Stone, Josiah P. Hanna
Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge.
no code implementations • 7 Dec 2024 • Harshit Sikchi, Siddhant Agarwal, Pranaya Jajoo, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, Scott Niekum
In this work, we show that we can achieve a zero-shot language-to-behavior policy by first grounding the imagined sequences in real observations of an unsupervised RL agent and using a closed-form solution to imitation learning that allows the RL agent to mimic the grounded observations.
no code implementations • 29 Nov 2024 • Siddhant Agarwal, Harshit Sikchi, Peter Stone, Amy Zhang
We present Proto Successor Measure: the basis set for all possible solutions of Reinforcement Learning in a dynamical system.
1 code implementation • 12 Nov 2024 • William Yue, Bo Liu, Peter Stone
In Partially Observable Markov Decision Processes, integrating an agent's history into memory poses a significant challenge for decision-making.
no code implementations • 30 Oct 2024 • Luisa Mao, Garrett Warnell, Peter Stone, Joydeep Biswas
In autonomous robot navigation, terrain cost assignment is typically performed using a semantics-based paradigm in which terrain is first labeled using a pre-trained semantic classifier and costs are then assigned according to a user-defined mapping between label and cost.
no code implementations • 24 Oct 2024 • Zizhao Wang, Jiaheng Hu, Caleb Chuck, Stephen Chen, Roberto Martín-Martín, Amy Zhang, Scott Niekum, Peter Stone
However, in complex environments with many state factors (e.g., household environments with many objects), learning skills that cover all possible states is impossible, and naively encouraging state diversity often leads to simple skills that are not ideal for solving downstream tasks.
no code implementations • 24 Oct 2024 • Shivin Dass, Jiaheng Hu, Ben Abbatematteo, Peter Stone, Roberto Martín-Martín
Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the head of the robot to find information relevant to manipulation, or in multi-robot domains, where one scout robot may search for the information that another robot needs to make informed decisions.
no code implementations • 15 Oct 2024 • Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín
We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks.
3 code implementations • 13 Oct 2024 • Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting.
1 code implementation • 3 Oct 2024 • Alexander Levine, Peter Stone, Amy Zhang
Efroni et al. (2022b) have shown that this is possible with a sample complexity that depends only on the size of the controllable latent space, and not on the size of the noise factor.
no code implementations • 1 Oct 2024 • Bo Liu, Mao Ye, Peter Stone, Qiang Liu
A fundamental challenge in continual learning is to balance the trade-off between learning new tasks and remembering the previously acquired knowledge.
no code implementations • 29 Sep 2024 • Linji Wang, Zifan Xu, Peter Stone, Xuesu Xiao
The high cost of real-world data for robotics Reinforcement Learning (RL) leads to the wide usage of simulators.
no code implementations • 25 Sep 2024 • Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martin-Martin, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani
How can we break through the performance plateau of these models and elevate their capabilities to new heights?
no code implementations • 7 Aug 2024 • Chen Tang, Ben Abbatematteo, Jiaheng Hu, Rohan Chandra, Roberto Martín-Martín, Peter Stone
Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors.
1 code implementation • 19 Jul 2024 • Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qiang Liu
Our experimental results show that Longhorn outperforms state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks, language modeling, and vision tasks.
no code implementations • 24 Jun 2024 • Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Wei Zhan, Peter Stone, Masayoshi Tomizuka
Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions.
no code implementations • 18 Jun 2024 • Miguel Vasco, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Peter R. Wurman, Peter Stone
Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics.
no code implementations • 30 May 2024 • Yifeng Zhu, Arisrei Lim, Peter Stone, Yuke Zhu
We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos.
no code implementations • 26 May 2024 • Rohan Chandra, Haresh Karnan, Negar Mehr, Peter Stone, Joydeep Biswas
In this paper, we present a new multi-agent maximum entropy inverse reinforcement learning algorithm for real world unstructured pedestrian crowds.
no code implementations • 6 May 2024 • Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum
Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail.
1 code implementation • 16 Apr 2024 • Caroline Wang, Arrasy Rahman, Ishan Durugkar, Elad Liebman, Peter Stone
POAM is a policy-gradient multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammates by learning representations of teammate behaviors.
no code implementations • 25 Mar 2024 • Saad Abdul Ghani, Zizhao Wang, Peter Stone, Xuesu Xiao
In our new Dynamic Learning from Learned Hallucination (Dyna-LfLH), we design and learn a novel latent distribution and sample dynamic obstacles from it, so the generated training data can be used to learn a motion planner to navigate in dynamic environments.
1 code implementation • 18 Mar 2024 • Alexander Levine, Peter Stone, Amy Zhang
In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise.
no code implementations • 12 Mar 2024 • Shivin Dass, Wensi Ai, Yuqian Jiang, Samik Singh, Jiaheng Hu, Ruohan Zhang, Peter Stone, Ben Abbatematteo, Roberto Martín-Martín
This problem is more severe in mobile manipulation, where collecting demonstrations is harder than in stationary manipulation due to the lack of available and easy-to-use teleoperation interfaces.
no code implementations • 6 Mar 2024 • Zifan Xu, Amir Hossain Raj, Xuesu Xiao, Peter Stone
To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands.
no code implementations • 3 Mar 2024 • Ziping Xu, Zifan Xu, Runxuan Jiang, Peter Stone, Ambuj Tewari
Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applicability to many important Reinforcement Learning (RL) tasks.
no code implementations • 23 Jan 2024 • Zizhao Wang, Caroline Wang, Xuesu Xiao, Yuke Zhu, Peter Stone
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications.
1 code implementation • 4 Jan 2024 • William Yue, Bo Liu, Peter Stone
Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks.
no code implementations • 7 Dec 2023 • Zifan Xu, Haozhu Wang, Dmitriy Bespalov, Xian Wu, Peter Stone, Yanjun Qi
Instead, this paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales, with a latent variable called a reasoning skill.
no code implementations • 15 Nov 2023 • Sveta Paster, Kantwon Rogers, Gordon Briggs, Peter Stone, Reuth Mirsky
With the projected surge in the elderly population, service robots offer a promising avenue to enhance their well-being in elderly care homes.
no code implementations • 22 Oct 2023 • Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu
We introduce GROOT, an imitation learning method for learning robust policies with object-centric and 3D priors.
no code implementations • 10 Oct 2023 • Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang
We further introduce an entropy-regularized policy optimization objective, which we call state-MaxEnt RL (or s-MaxEnt RL), as a special case of our objective.
no code implementations • 10 Oct 2023 • Carson Stark, Bohkyung Chun, Casey Charleston, Varsha Ravi, Luis Pabon, Surya Sunkari, Tarun Mohan, Peter Stone, Justin Hart
This work introduces a robotics platform that embeds a conversational AI agent in an embodied system for natural language understanding and intelligent decision-making for service tasks, integrating task planning with human-like conversation.
1 code implementation • 3 Oct 2023 • W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum
Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return.
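The partial-return model referenced here is commonly formalized as a Bradley-Terry (logistic) model over the summed rewards of two segments. A minimal sketch of that standard formulation, with hypothetical per-step reward values (not this paper's data or code):

```python
import math

def partial_return_preference(rewards_a, rewards_b):
    # Bradley-Terry preference probability: the chance a human prefers
    # segment A over segment B, modeled as a sigmoid of the difference
    # between the segments' partial returns (summed rewards).
    ra, rb = sum(rewards_a), sum(rewards_b)
    return 1.0 / (1.0 + math.exp(rb - ra))  # sigmoid(ra - rb)

# Hypothetical per-step rewards for two trajectory segments.
p = partial_return_preference([1.0, 0.5], [0.2, 0.1])
```

Because the model depends only on summed reward, two segments with equal partial return are predicted to be preferred with probability 0.5 regardless of how that reward was accrued, which is the assumption this line of work scrutinizes.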
no code implementations • 26 Sep 2023 • Haresh Karnan, Elvin Yang, Daniel Farkash, Garrett Warnell, Joydeep Biswas, Peter Stone
Terrain awareness, i.e., the ability to identify and distinguish different types of terrain, is a critical ability that robots must have to succeed at autonomous off-road navigation.
no code implementations • 18 Sep 2023 • Haresh Karnan, Elvin Yang, Garrett Warnell, Joydeep Biswas, Peter Stone
In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain references within the inertial, proprioceptive, and tactile domain.
no code implementations • 28 Aug 2023 • Elad Liebman, Peter Stone
This research fills this gap by reporting the results of an experiment in which human participants were required to complete a task in the presence of an autonomous agent while listening to background music.
no code implementations • 18 Aug 2023 • Arrasy Rahman, Jiaxun Cui, Peter Stone
In this work, we first propose that maximizing an AHT agent's robustness requires it to emulate policies in the minimum coverage set (MCS), the set of best-response policies to any partner policies in the environment.
no code implementations • 29 Jun 2023 • Anthony Francis, Claudia Pérez-D'Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Rachid Alami, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Rohan Chandra, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Jonathan P. How, Haresh Karnan, Tsang-Wei Edward Lee, Luis J. Manso, Reuth Mirsky, Sören Pirk, Phani Teja Singamaneni, Peter Stone, Ada V. Taylor, Peter Trautman, Nathan Tsoi, Marynel Vázquez, Xuesu Xiao, Peng Xu, Naoki Yokoyama, Alexander Toshev, Roberto Martín-Martín
A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation.
no code implementations • 12 Jun 2023 • Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone
Empirical results demonstrate that RPOSST finds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
2 code implementations • NeurIPS 2023 • Bo Liu, Yihao Feng, Peter Stone, Qiang Liu
One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL).
no code implementations • 4 May 2023 • Jiaheng Hu, Peter Stone, Roberto Martín-Martín
Current approaches often segregate tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g., learning base actions for locomotion objectives and learning arm actions for manipulation).
1 code implementation • 22 Apr 2023 • Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone
LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language.
no code implementations • 18 Jan 2023 • Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, Ese Ben-Iwhiwhu, Andrew P. Brna, Ethan Brooks, Ryan C. Brown, Zachary Daniels, Anurag Daram, Fabien Delattre, Ryan Dellana, Eric Eaton, Haotian Fu, Kristen Grauman, Jesse Hostetler, Shariq Iqbal, Cassandra Kent, Nicholas Ketz, Soheil Kolouri, George Konidaris, Dhireesha Kudithipudi, Erik Learned-Miller, Seungwon Lee, Michael L. Littman, Sandeep Madireddy, Jorge A. Mendez, Eric Q. Nguyen, Christine D. Piatko, Praveen K. Pilly, Aswin Raghavan, Abrar Rahman, Santhosh Kumar Ramakrishnan, Neale Ratzlaff, Andrea Soltoggio, Peter Stone, Indranil Sur, Zhipeng Tang, Saket Tiwari, Kyle Vedder, Felix Wang, Zifan Xu, Angel Yanguas-Gil, Harel Yedidsion, Shangqun Yu, Gautam K. Vallabha
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed.
no code implementations • 16 Dec 2022 • Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor
In our setting, we assume a source of data, which we split into a train-set to learn an offline policy and a test-set to estimate a lower bound on the offline policy's value using off-policy evaluation with bootstrapping.
no code implementations • 8 Nov 2022 • Eddy Hudson, Ishan Durugkar, Garrett Warnell, Peter Stone
Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data.
no code implementations • 1 Nov 2022 • Varun Kompella, Thomas J. Walsh, Samuel Barrett, Peter Wurman, Peter Stone
Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems.
no code implementations • 31 Oct 2022 • Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller
In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.
no code implementations • 26 Oct 2022 • Caroline Wang, Garrett Warnell, Peter Stone
While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward.
1 code implementation • 20 Oct 2022 • Vaibhav Bajaj, Guni Sharon, Peter Stone
Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals.
1 code implementation • 10 Oct 2022 • Zifan Xu, Bo Liu, Xuesu Xiao, Anirudh Nair, Peter Stone
Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation.
1 code implementation • 19 Sep 2022 • Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning.
2 code implementations • 17 Aug 2022 • Bo Liu, Yihao Feng, Qiang Liu, Peter Stone
Furthermore, we introduce the metric residual network (MRN) that deliberately decomposes the action-value function Q(s, a, g) into the negated summation of a metric plus a residual asymmetric component.
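The decomposition described here can be illustrated with a toy numeric sketch (random matrices standing in for learned embeddings; this is an illustration of the structure, not the paper's architecture or training code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "learned" embeddings for the state-action input and the goal.
W_sym = rng.normal(size=(3, 4))   # feeds the symmetric metric term
W_asym = rng.normal(size=(3, 4))  # feeds the asymmetric residual term

def q_value(sa, g):
    # Q(s, a, g) = -(metric + residual): a norm between embeddings
    # (symmetric in its arguments) plus a rectified max-over-coordinates
    # residual (asymmetric), so -Q behaves like a quasimetric.
    metric = float(np.linalg.norm(W_sym @ sa - W_sym @ g))
    residual = float(np.max(np.maximum(W_asym @ sa - W_asym @ g, 0.0)))
    return -(metric + residual)

x, y = np.ones(4), np.zeros(4)
```

Both terms are non-negative, so Q is always at most zero and is exactly zero when the state-action embedding coincides with the goal embedding.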
1 code implementation • 27 Jun 2022 • Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, Peter Stone
Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states.
no code implementations • 24 Jun 2022 • James Macglashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone
These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems.
no code implementations • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi
We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting.
1 code implementation • 1 Jun 2022 • Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone
The theoretical analysis shows that under certain conditions, each agent minimizing its individual distribution mismatch allows the convergence to the joint policy that generated the target distribution.
1 code implementation • CVPR 2022 • Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu
To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios.
no code implementations • 11 Apr 2022 • Akarsh Kumar, Bo Liu, Risto Miikkulainen, Peter Stone
GESMR co-evolves a population of solutions and a population of MRs, such that each MR is assigned to a group of solutions.
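The co-evolution scheme described here can be sketched on a toy objective. The following is a simplified illustration of the idea (group-assigned mutation rates whose next generation is seeded by the most successful group), with hypothetical sizes and objective, not the paper's algorithm:

```python
import random

random.seed(0)

def fitness(x):
    # Toy objective to maximize; the optimum is at x = 3.
    return -(x - 3.0) ** 2

# Hypothetical setup: one mutation rate (MR) per group of solutions.
n_groups, group_size = 4, 5
mrs = [0.1 * (i + 1) for i in range(n_groups)]
pop = [[random.uniform(-5.0, 5.0) for _ in range(group_size)]
       for _ in range(n_groups)]

for _ in range(200):
    gains = []
    for gi in range(n_groups):
        before = max(map(fitness, pop[gi]))
        # Mutate each solution with its group's MR; keep only improvements.
        pop[gi] = [max([x, x + random.gauss(0.0, mrs[gi])], key=fitness)
                   for x in pop[gi]]
        gains.append(max(map(fitness, pop[gi])) - before)
    # Co-evolve the MRs: the group whose MR yielded the largest fitness
    # gain seeds (with perturbation) the next generation of MRs.
    best_gi = max(range(n_groups), key=gains.__getitem__)
    mrs = [max(1e-3, mrs[best_gi] * random.choice([0.5, 1.0, 2.0]))
           for _ in range(n_groups)]

best = max((x for g in pop for x in g), key=fitness)
```

Selecting MRs by the fitness gain they produced, rather than by raw fitness, is what lets the mutation-rate population track the scale of useful mutations as the search converges.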
no code implementations • 30 Mar 2022 • Haresh Karnan, Kavan Singh Sikand, Pranav Atreya, Sadegh Rabiee, Xuesu Xiao, Garrett Warnell, Peter Stone, Joydeep Biswas
In this paper, we hypothesize that to enable accurate high-speed off-road navigation using a learned IKD model, in addition to inertial information from the past, one must also anticipate the kinodynamic interactions of the vehicle with the terrain in the future.
no code implementations • 28 Mar 2022 • Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
1 code implementation • 24 Mar 2022 • Bo Liu, Qiang Liu, Peter Stone
As intelligent agents become autonomous over longer periods of time, they may eventually become lifelong counterparts to specific people.
no code implementations • 19 Feb 2022 • Shahaf S. Shperberg, Bo Liu, Peter Stone
When humans make catastrophic mistakes, they are expected to learn never to repeat them, such as a toddler who touches a hot stove and immediately learns never to do so again.
no code implementations • 16 Feb 2022 • Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht
Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination.
no code implementations • 1 Feb 2022 • Haresh Karnan, Garrett Warnell, Faraz Torabi, Peter Stone
The imitation learning research community has recently made significant progress towards the goal of enabling artificial agents to imitate behaviors from video demonstrations alone.
1 code implementation • 3 Dec 2021 • Yulin Zhang, William Macke, Jiaxun Cui, Daniel Urieli, Peter Stone
This article establishes for the first time that a multiagent driving policy can be trained in such a way that it generalizes to different traffic flows, AV penetration, and road geometries, including on multi-lane roads.
no code implementations • 25 Nov 2021 • Kingsley Nweye, Bo Liu, Peter Stone, Zoltan Nagy
Building upon prior research that highlighted the need for standardizing environments for building control research, and inspired by recently introduced challenges for real life reinforcement learning control, here we propose a non-exhaustive set of nine real world challenges for reinforcement learning control in grid-interactive buildings.
4 code implementations • NeurIPS 2021 • Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu
The goal of multi-task learning is to enable more efficient learning than single task learning by sharing model structures for a diverse set of tasks.
no code implementations • 28 Sep 2021 • Yifeng Zhu, Peter Stone, Yuke Zhu
From the task structures of multi-task demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning.
no code implementations • 13 Jul 2021 • Ruohan Zhang, Faraz Torabi, Garrett Warnell, Peter Stone
A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making.
no code implementations • 23 Jun 2021 • Reuth Mirsky, Xuesu Xiao, Justin Hart, Peter Stone
This survey aims to bridge this gap by introducing such a common language, using it to survey existing work, and highlighting open problems.
1 code implementation • 8 Jun 2021 • Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone
In this paper, we introduce for the first time a dynamic sparse training approach for deep reinforcement learning to accelerate the training process.
1 code implementation • NeurIPS 2021 • Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone
In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.
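For intuition, the Wasserstein-1 distance between two equal-size empirical distributions has a closed form in one dimension: sort both samples and average the absolute differences. A minimal sketch (hypothetical sample values, not the paper's multi-dimensional estimator):

```python
import numpy as np

def wasserstein_1d(xs, ys):
    # W1 between two equal-size empirical 1-D distributions: the optimal
    # transport plan in 1-D matches order statistics, so the distance is
    # the mean absolute difference of the sorted samples.
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))

# Hypothetical state-visitation samples vs. target-distribution samples.
visited = np.array([0.0, 1.0, 2.0])
target = np.array([1.0, 2.0, 3.0])
w1 = wasserstein_1d(visited, target)  # each sample must move by 1.0
```

In higher dimensions no such closed form exists, which is why W1-based RL objectives typically rely on the dual (Kantorovich-Rubinstein) formulation instead.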
no code implementations • 19 May 2021 • Haresh Karnan, Garrett Warnell, Xuesu Xiao, Peter Stone
Is imitation learning for vision-based autonomous navigation even possible in such scenarios?
1 code implementation • 18 May 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar
Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.
no code implementations • 8 May 2021 • Eddy Hudson, Garrett Warnell, Peter Stone
While Adversarial Imitation Learning (AIL) algorithms have recently led to state-of-the-art results on various imitation learning benchmarks, it is unclear as to what impact various design decisions have on performance.
no code implementations • 28 Apr 2021 • W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone
This article considers the problem of diagnosing certain common errors in reward design.
no code implementations • 15 Apr 2021 • Eddy Hudson, Garrett Warnell, Faraz Torabi, Peter Stone
Learning from demonstrations in the wild (e.g., YouTube videos) is a tantalizing goal in imitation learning.
no code implementations • 9 Apr 2021 • Harel Yedidsion, Shani Alkoby, Peter Stone
Chore division is a class of fair division problems in which some undesirable "resource" must be shared among a set of participants, with each participant wanting to get as little as possible.
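The objective described (everyone wants as little as possible) can be illustrated with a simple load-balancing heuristic. This is a generic greedy sketch for intuition about the problem, not the paper's mechanism:

```python
def divide_chores(costs, n_participants):
    # Greedy heuristic: assign each chore, largest cost first, to the
    # participant with the smallest accumulated cost so far
    # (longest-processing-time load balancing).
    loads = [0.0] * n_participants
    assignment = [[] for _ in range(n_participants)]
    for c in sorted(costs, reverse=True):
        i = min(range(n_participants), key=loads.__getitem__)
        loads[i] += c
        assignment[i].append(c)
    return loads, assignment

# Hypothetical chore costs split between two participants.
loads, assignment = divide_chores([4, 3, 3, 2, 2, 2], 2)
```

This heuristic balances total burden but ignores that participants may value the same chore differently, which is where proper fair-division formulations (and this paper's setting) depart from simple scheduling.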
no code implementations • 31 Mar 2021 • Faraz Torabi, Garrett Warnell, Peter Stone
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
1 code implementation • 9 Mar 2021 • Harel Yedidsion, Jennifer Suriadinata, Zifan Xu, Stefan Debruyn, Peter Stone
In this problem, the goal is to find a set of objects as quickly as possible, given probability distributions of where they may be found.
no code implementations • 1 Mar 2021 • William Macke, Reuth Mirsky, Peter Stone
We then present a novel planning algorithm for ad hoc teamwork, determining which query to ask and planning accordingly.
1 code implementation • 26 Feb 2021 • Jiaxun Cui, William Macke, Harel Yedidsion, Daniel Urieli, Peter Stone
Next, we propose a modular transfer reinforcement learning approach, and use it to scale up a multiagent driving policy to outperform human-like traffic and existing approaches in a simulated realistic scenario, which is an order of magnitude larger than past scenarios (hundreds instead of tens of vehicles).
1 code implementation • NeurIPS 2020 • Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu
We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures.
no code implementations • 1 Jan 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Anima Anandkumar
The performance of our method is comparable or even better than the setting where all players have a full view of the environment, but no coach.
no code implementations • NeurIPS 2021 • Sihang Guo, Ruohan Zhang, Bo Liu, Yifeng Zhu, Mary Hayhoe, Dana Ballard, Peter Stone
1) How similar are the visual representations learned by RL agents and humans when performing the same task?
1 code implementation • 20 Oct 2020 • Varun Kompella, Roberto Capobianco, Stacy Jong, Jonathan Browne, Spencer Fox, Lauren Meyers, Peter Wurman, Peter Stone
The year 2020 saw the virus that causes COVID-19 lead to one of the worst global pandemics in history.
1 code implementation • 29 Sep 2020 • Yunshu Du, Garrett Warnell, Assefaw Gebremedhin, Peter Stone, Matthew E. Taylor
In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy.
1 code implementation • 28 Sep 2020 • Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox
We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.
no code implementations • ICML 2020 • Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch -- not the true probability of the action under the given policy.
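The weighting issue described here can be demonstrated numerically. A minimal sketch with a hypothetical one-state batch: under a uniform policy the true value is 0.5, but the fixed batch over-represents the reward-1 action:

```python
# One state, two actions, episode ends after one step: action "a" gives
# reward 1, action "b" gives reward 0. Under the uniform target policy the
# true value is 0.5, but this batch happens to contain "a" three times.
batch_rewards = [1.0, 1.0, 1.0, 0.0]  # hypothetical batch, not drawn ~ pi

v, alpha = 0.0, 0.1
for _ in range(2000):  # repeatedly apply batch TD(0) sweeps
    delta = sum(r - v for r in batch_rewards) / len(batch_rewards)
    v += alpha * delta

# v converges to the batch mean 0.75: each action is weighted by its
# count in the batch rather than by its probability 0.5 under the policy.
```

Reweighting each update by the true action probability divided by its empirical frequency (here, 0.5 / 0.75 and 0.5 / 0.25) would restore convergence to the on-policy value.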
no code implementations • NeurIPS 2020 • Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning.
no code implementations • 3 Jul 2020 • Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone
Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy.
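The classic policy-invariant form of reward shaping, which work in this area builds on, is potential-based shaping. A minimal sketch on a toy chain with a hypothetical potential function:

```python
# Potential-based shaping: adding F(s, s') = gamma * phi(s') - phi(s) to
# the environment reward leaves optimal policies unchanged. Toy 4-state
# chain 0 -> 1 -> 2 -> 3 with a hypothetical progress-to-goal potential.
gamma = 0.9
phi = [0.0, 1.0, 2.0, 3.0]  # assumed potential, higher closer to the goal

def shaped_reward(s, s_next, r_env):
    return r_env + gamma * phi[s_next] - phi[s]

# The discounted shaping terms telescope along any trajectory, so shaped
# and unshaped returns differ only by a start-state constant.
bonus = sum(gamma ** t * shaped_reward(s, s + 1, 0.0)
            for t, s in enumerate(range(3)))
```

The telescoping check below confirms the added return equals `gamma**3 * phi[3] - phi[0]`, independent of the intermediate states, which is exactly why optimal policies are preserved.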
no code implementations • 17 Jun 2020 • Elad Liebman, Peter Stone
Computers have been used to analyze and create music since they were first introduced in the 1950s and 1960s.
no code implementations • ICML Workshop LifelongML 2020 • Sanmit Narvekar, Peter Stone
However, there is structure that can be exploited between tasks and agents, such that knowledge gained developing a curriculum for one task can be reused to speed up creating a curriculum for a new task.
no code implementations • 31 May 2020 • Rishi Shah, Yuqian Jiang, Justin Hart, Peter Stone
Coverage path planning is a well-studied problem in robotics in which a robot must plan a path that passes through every point in a given area repeatedly, usually with a uniform frequency.
no code implementations • SIGDIAL (ACL) 2020 • Keting Lu, Shiqi Zhang, Peter Stone, Xiaoping Chen
More interestingly, the robot was able to learn from navigation tasks to improve its dialog strategies.
no code implementations • 18 Apr 2020 • Shiqi Zhang, Piyush Khandelwal, Peter Stone
Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks.
no code implementations • 31 Mar 2020 • Xuesu Xiao, Bo Liu, Garrett Warnell, Jonathan Fink, Peter Stone
Existing autonomous robot navigation systems allow robots to move from one point to another in a collision-free manner.
no code implementations • 10 Mar 2020 • Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, Peter Stone
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback.
no code implementations • 21 Sep 2019 • Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone
Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment.
no code implementations • 14 Sep 2019 • Rishi Shah, Yuqian Jiang, Haresh Karnan, Gilberto Briscoe-Martinez, Dominick Mulder, Ryan Gupta, Rachel Schlossman, Marika Murphy, Justin W. Hart, Luis Sentis, Peter Stone
RoboCup@Home is an international robotics competition based on domestic tasks requiring autonomous capabilities pertaining to a large variety of AI technologies.
no code implementations • 18 Jun 2019 • Brahma S. Pavse, Faraz Torabi, Josiah P. Hanna, Garrett Warnell, Peter Stone
Augmenting reinforcement learning with imitation learning is often hailed as a method by which to improve upon learning from scratch.
no code implementations • 18 Jun 2019 • Faraz Torabi, Sean Geiger, Garrett Warnell, Peter Stone
We test our algorithm with an imitation task on a physical robot arm and its simulated counterpart in Gazebo, and show improvements in learning rate and efficiency.
no code implementations • 30 May 2019 • Faraz Torabi, Garrett Warnell, Peter Stone
Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task.
no code implementations • 22 May 2019 • Faraz Torabi, Garrett Warnell, Peter Stone
Classically, imitation learning algorithms have been developed for idealized situations, e.g., the demonstrations are often required to be collected in the exact same environment and usually include the demonstrator's actions.
no code implementations • ICLR 2019 • Ishan Durugkar, Bo Liu, Peter Stone
Temporal Difference learning with function approximation has been widely used recently and has led to several successful results.
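As a reminder of what the baseline looks like, here is vanilla linear TD(0) on a trivial two-state chain. This is the standard algorithm being analyzed, not the modification the paper proposes; the chain, step size, and discount are arbitrary choices for illustration:

```python
# Linear TD(0) on a deterministic two-state chain:
# state 0 -> state 1 with reward 0; state 1 -> state 1 with reward 1.
alpha, gamma = 0.1, 0.9
w = [0.0, 0.0]                       # one weight per (one-hot) state feature
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}

def v(s):
    return sum(wi * xi for wi, xi in zip(w, features[s]))

for _ in range(2000):
    for s, r, s2 in [(0, 0.0, 1), (1, 1.0, 1)]:
        delta = r + gamma * v(s2) - v(s)      # TD error
        for i in range(2):
            w[i] += alpha * delta * features[s][i]

# Fixed point: V(1) = 1 / (1 - gamma) = 10, V(0) = gamma * V(1) = 9.
print(round(v(1), 1))  # -> 10.0
```

With function approximation that is not one-hot, the same update can diverge, which is the kind of issue this line of work studies.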
1 code implementation • 1 Mar 2019 • Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, Raymond J. Mooney
Natural language understanding for robotics can require substantial domain- and platform-specific engineering.
no code implementations • 22 Dec 2018 • Jacob Menashe, Peter Stone
We show that the ERD presents a suite of challenges with scalable difficulty to provide a smooth learning gradient from Taxi to the Arcade Learning Environment.
1 code implementation • 1 Dec 2018 • Sanmit Narvekar, Peter Stone
Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task, by first training on a series of simpler tasks and transferring the knowledge acquired to the target task.
no code implementations • 21 Nov 2018 • Yuqian Jiang, Fangkai Yang, Shiqi Zhang, Peter Stone
In the outer loop, the plan is executed, and the robot learns from the execution experience via model-free RL, to further improve its task-motion plans.
no code implementations • 28 Sep 2018 • Keting Lu, Shiqi Zhang, Peter Stone, Xiaoping Chen
In this work, we integrate logical-probabilistic KRR with model-based RL, enabling agents to simultaneously reason with declarative knowledge and learn from interaction experiences.
1 code implementation • 15 Sep 2018 • Prabhat Nagarajan, Garrett Warnell, Peter Stone
One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance.
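One source of nondeterminism that is easy to isolate in this way is random-number seeding. The sketch below is a generic illustration of the experimental pattern (fix a seed, rerun, compare), not the paper's actual measurement protocol:

```python
import random

def run_experiment(seed):
    # A stand-in for a training run: with the RNG seeded, the run is
    # deterministic. Other nondeterminism sources (GPU kernels, environment
    # stochasticity, thread scheduling) must be controlled separately.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

# Identical seeds reproduce the run exactly; varying only the seed exposes
# the performance variance attributable to this single source.
assert run_experiment(42) == run_experiment(42)
assert run_experiment(42) != run_experiment(7)
```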
no code implementations • EMNLP 2018 • Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney
Active learning identifies data points to label that are expected to be the most useful in improving a supervised model.
no code implementations • 23 Aug 2018 • Barbara J. Grosz, Peter Stone
In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.
1 code implementation • 17 Jul 2018 • Faraz Torabi, Garrett Warnell, Peter Stone
Imitation from observation (IfO) is the problem of learning directly from state-only demonstrations without having access to the demonstrator's actions.
1 code implementation • 4 Jun 2018 • Josiah P. Hanna, Scott Niekum, Peter Stone
We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.
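The estimator being compared against is ordinary importance sampling. A minimal bandit-style sketch, with the behavior policy estimated from the logged data itself by maximum-likelihood action frequencies (the data and policies below are invented for illustration):

```python
from collections import Counter

# Logged (action, reward) pairs generated by some unknown behavior policy.
logged = [(0, 1.0), (0, 1.0), (1, 0.0), (0, 1.0), (1, 0.0), (1, 0.0)]

# Evaluation policy: always picks action 0.
pi_e = {0: 1.0, 1: 0.0}

# Estimate the behavior policy from the data (empirical action frequencies).
counts = Counter(a for a, _ in logged)
pi_b_hat = {a: c / len(logged) for a, c in counts.items()}

# Importance-sampling estimate of the evaluation policy's expected reward:
# reweight each logged reward by pi_e(a) / pi_b_hat(a).
estimate = sum(pi_e[a] / pi_b_hat[a] * r for a, r in logged) / len(logged)
print(estimate)  # -> 1.0
```

Using the estimated `pi_b_hat` instead of the true behavior policy is the twist studied in the paper; here it correctly recovers the evaluation policy's value of 1.0.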
6 code implementations • 4 May 2018 • Faraz Torabi, Garrett Warnell, Peter Stone
In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects.
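The two-phase structure can be sketched on a toy problem. Everything below (the 1-D chain dynamics, the lookup-table inverse model, the majority-vote "cloning") is a deliberately trivial illustration of the phases, not the paper's actual models:

```python
import random

# Toy 1-D chain: actions {-1, +1}, deterministic step s' = s + a.
def step(s, a):
    return s + a

# Phase 1: learn an inverse-dynamics model from the agent's OWN interaction,
# here a lookup table mapping the observed transition (s' - s) to an action.
rng = random.Random(0)
inv_dyn = {}
s = 0
for _ in range(100):
    a = rng.choice([-1, 1])
    s2 = step(s, a)
    inv_dyn[s2 - s] = a
    s = s2

# Phase 2: use the inverse model to label state-only demonstrations with
# inferred actions, then behaviorally clone those actions.
demo_states = [0, 1, 2, 3, 4]        # expert walks right; actions unobserved
inferred = [inv_dyn[s2 - s1] for s1, s2 in zip(demo_states, demo_states[1:])]

# On this toy problem, "cloning" collapses to the majority inferred action.
policy = max(set(inferred), key=inferred.count)
print(policy)  # -> 1
```

The point of the structure is that no demonstrator actions are ever needed: the agent supplies its own action labels via the inverse model.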
no code implementations • 23 Apr 2018 • Yuqian Jiang, Shiqi Zhang, Piyush Khandelwal, Peter Stone
PDDL is designed for task planning, and PDDL-based planners are widely used for a variety of planning problems.
no code implementations • ICLR 2018 • Ishan Durugkar, Peter Stone
In this work we propose a constraint on the TD update that minimizes change to the target values.
2 code implementations • 28 Sep 2017 • Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, Peter Stone
While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require large amounts of training data.
no code implementations • 27 Sep 2017 • Guni Sharon, Michael Albert, Tarun Rambha, Stephen Boyles, Peter Stone
This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing.
no code implementations • 23 Sep 2017 • Stefano V. Albrecht, Peter Stone
Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents.
2 code implementations • 15 Jul 2017 • Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu, Antonio Liotta
Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods.
1 code implementation • ICML 2017 • Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum
The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.
no code implementations • 18 Oct 2016 • Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta
Conceived in the early 1990s, Experience Replay (ER) has been shown to be a successful mechanism to allow online learning algorithms to reuse past experiences.
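The core ER mechanism is a bounded buffer of past transitions sampled uniformly at random. A minimal sketch (capacity and batch size are arbitrary illustrative values):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        # Bounded FIFO: once full, the oldest experiences are evicted.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive steps that online learners would otherwise see.
        return random.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.add((t, "action", 0.0, t + 1))   # (state, action, reward, next_state)

batch = buf.sample(8)
print(len(batch))  # -> 8
```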
no code implementations • 20 Jun 2016 • Josiah P. Hanna, Peter Stone, Scott Niekum
In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.
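The generic ingredient here is a bootstrap lower confidence bound on expected return. The sketch below resamples observed returns directly; the paper's methods instead generate bootstrap returns from learned MDP transition models, which is not reproduced here:

```python
import random

def bootstrap_lower_bound(returns, alpha=0.05, n_boot=2000, seed=0):
    # Percentile bootstrap: resample returns with replacement, collect the
    # resampled means, and take the alpha-quantile as a lower bound.
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(returns, k=len(returns))) / len(returns)
        for _ in range(n_boot)
    )
    return means[int(alpha * n_boot)]

returns = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
lb = bootstrap_lower_bound(returns)
# The lower bound sits below the sample mean (0.75 here).
print(lb <= sum(returns) / len(returns))  # -> True
```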
7 code implementations • 13 Nov 2015 • Matthew Hausknecht, Peter Stone
Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces.
5 code implementations • 23 Jul 2015 • Matthew Hausknecht, Peter Stone
Deep Reinforcement Learning has yielded proficient controllers for complex tasks.
no code implementations • 26 Feb 2015 • Elad Liebman, Benny Chor, Peter Stone
This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements.
no code implementations • 9 Jan 2014 • Elad Liebman, Maytal Saar-Tsechansky, Peter Stone
In this work we present DJ-MC, a novel reinforcement-learning framework for music recommendation that does not recommend songs individually but rather song sequences, or playlists, based on a model of preferences for both songs and song transitions.