Search Results for author: Peter Stone

Found 78 papers, 20 papers with code

Causal Dynamics Learning for Task-Independent State Abstraction

no code implementations 27 Jun 2022 Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, Peter Stone

Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states.

Model-based Reinforcement Learning reinforcement-learning

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

no code implementations 24 Jun 2022 James Macglashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone

These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems.

High-Speed Accurate Robot Control using Learned Forward Kinodynamics and Non-linear Least Squares Optimization

no code implementations 16 Jun 2022 Pranav Atreya, Haresh Karnan, Kavan Singh Sikand, Xuesu Xiao, Garrett Warnell, Sadegh Rabiee, Peter Stone, Joydeep Biswas

However, a learned inverse kinodynamic (IKD) model can only be applied to a limited class of control problems, and different control problems require the learning of a new IKD model.

Models of human preference for learning reward functions

no code implementations 5 Jun 2022 W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments.

Decision Making

DM$^2$: Distributed Multi-Agent Reinforcement Learning for Distribution Matching

no code implementations 1 Jun 2022 Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone

The theoretical analysis shows that under some conditions, if each agent optimizes their individual distribution matching objective, the agents increase a lower bound on the objective of matching the joint expert policy, allowing convergence to the joint expert policy.

Multi-agent Reinforcement Learning reinforcement-learning +1

COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

no code implementations CVPR 2022 Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu

To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios.

Autonomous Driving

Effective Mutation Rate Adaptation through Group Elite Selection

no code implementations 11 Apr 2022 Akarsh Kumar, Bo Liu, Risto Miikkulainen, Peter Stone

GESMR co-evolves a population of solutions and a population of MRs, such that each MR is assigned to a group of solutions.

Image Classification

VI-IKD: High-Speed Accurate Off-Road Navigation using Learned Visual-Inertial Inverse Kinodynamics

no code implementations 30 Mar 2022 Haresh Karnan, Kavan Singh Sikand, Pranav Atreya, Sadegh Rabiee, Xuesu Xiao, Garrett Warnell, Peter Stone, Joydeep Biswas

In this paper, we hypothesize that to enable accurate high-speed off-road navigation using a learned IKD model, in addition to inertial information from the past, one must also anticipate the kinodynamic interactions of the vehicle with the terrain in the future.

Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

no code implementations 28 Mar 2022 Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone

Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.

Imitation Learning Robot Navigation

Continual Learning and Private Unlearning

no code implementations 24 Mar 2022 Bo Liu, Qiang Liu, Peter Stone

As intelligent agents become autonomous over longer periods of time, they may eventually become lifelong counterparts to specific people.

Continual Learning

Learning a Shield from Catastrophic Action Effects: Never Repeat the Same Mistake

no code implementations 19 Feb 2022 Shahaf S. Shperberg, Bo Liu, Peter Stone

When humans make catastrophic mistakes, they are expected to learn never to repeat them, such as a toddler who touches a hot stove and immediately learns never to do so again.

Continual Learning Safe Reinforcement Learning

A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems

no code implementations 16 Feb 2022 Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht

Ad hoc teamwork is the well-established research problem of designing agents that can collaborate with new teammates without prior coordination.

Adversarial Imitation Learning from Video using a State Observer

no code implementations 1 Feb 2022 Haresh Karnan, Garrett Warnell, Faraz Torabi, Peter Stone

The imitation learning research community has recently made significant progress towards the goal of enabling artificial agents to imitate behaviors from video demonstrations alone.

Continuous Control Imitation Learning

Real-world challenges for multi-agent reinforcement learning in grid-interactive buildings

no code implementations 25 Nov 2021 Kingsley Nweye, Bo Liu, Peter Stone, Zoltan Nagy

Building upon prior research that highlighted the need for standardizing environments for building control research, and inspired by recently introduced challenges for real life reinforcement learning control, here we propose a non-exhaustive set of nine real world challenges for reinforcement learning control in grid-interactive buildings.

Multi-agent Reinforcement Learning reinforcement-learning

Conflict-Averse Gradient Descent for Multi-task Learning

2 code implementations NeurIPS 2021 Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu

The goal of multi-task learning is to enable more efficient learning than single task learning by sharing model structures for a diverse set of tasks.

Multi-Task Learning

Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation

no code implementations 28 Sep 2021 Yifeng Zhu, Peter Stone, Yuke Zhu

From the task structures of multi-task demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning.

Imitation Learning

Recent Advances in Leveraging Human Guidance for Sequential Decision-Making Tasks

no code implementations 13 Jul 2021 Ruohan Zhang, Faraz Torabi, Garrett Warnell, Peter Stone

A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making.

Decision Making

Prevention and Resolution of Conflicts in Social Navigation -- a Survey

no code implementations 23 Jun 2021 Reuth Mirsky, Xuesu Xiao, Justin Hart, Peter Stone

It starts by defining a conflict in social navigation, and offers a detailed taxonomy of its components.

Dynamic Sparse Training for Deep Reinforcement Learning

1 code implementation 8 Jun 2021 Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone

In this paper, we introduce for the first time a dynamic sparse training approach for deep reinforcement learning to accelerate the training process.

Continuous Control Decision Making +2

Adversarial Intrinsic Motivation for Reinforcement Learning

1 code implementation NeurIPS 2021 Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.

Multi-Goal Reinforcement Learning reinforcement-learning

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition

1 code implementation 18 May 2021 Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar

Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.

Multi-agent Reinforcement Learning reinforcement-learning +1

RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning

no code implementations 8 May 2021 Eddy Hudson, Garrett Warnell, Peter Stone

While Adversarial Imitation Learning (AIL) algorithms have recently led to state-of-the-art results on various imitation learning benchmarks, it is unclear what impact various design decisions have on performance.

Imitation Learning OpenAI Gym +1

Skeletal Feature Compensation for Imitation Learning with Embodiment Mismatch

no code implementations 15 Apr 2021 Eddy Hudson, Garrett Warnell, Faraz Torabi, Peter Stone

Learning from demonstrations in the wild (e.g., YouTube videos) is a tantalizing goal in imitation learning.

Imitation Learning

Sequential Online Chore Division for Autonomous Vehicle Convoy Formation

no code implementations 9 Apr 2021 Harel Yedidsion, Shani Alkoby, Peter Stone

Chore division is a class of fair division problems in which some undesirable "resource" must be shared among a set of participants, with each participant wanting to get as little as possible.

DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation

no code implementations 31 Mar 2021 Faraz Torabi, Garrett Warnell, Peter Stone

In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.

Imitation Learning Model-based Reinforcement Learning +1

A Scavenger Hunt for Service Robots

1 code implementation 9 Mar 2021 Harel Yedidsion, Jennifer Suriadinata, Zifan Xu, Stefan Debruyn, Peter Stone

In this problem, the goal is to find a set of objects as quickly as possible, given probability distributions of where they may be found.

Expected Value of Communication for Planning in Ad Hoc Teamwork

no code implementations 1 Mar 2021 William Macke, Reuth Mirsky, Peter Stone

We then present a novel planning algorithm for ad hoc teamwork, determining which query to ask and planning accordingly.

Scalable Multiagent Driving Policies For Reducing Traffic Congestion

1 code implementation 26 Feb 2021 Jiaxun Cui, William Macke, Harel Yedidsion, Aastha Goyal, Daniel Urielli, Peter Stone

Next, we propose a modular transfer reinforcement learning approach, and use it to scale up a multiagent driving policy to outperform human-like traffic and existing approaches in a simulated realistic scenario, which is an order of magnitude larger than past scenarios (hundreds instead of tens of vehicles).

Transfer Reinforcement Learning

Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks

1 code implementation NeurIPS 2020 Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu

We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures.

Continual Learning Image Classification +1

A Coach-Player Framework for Dynamic Team Composition

no code implementations 1 Jan 2021 Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Anima Anandkumar

The performance of our method is comparable to or even better than that of the setting where all players have a full view of the environment, but no coach.

Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

1 code implementation 29 Sep 2020 Yunshu Du, Garrett Warnell, Assefaw Gebremedhin, Peter Stone, Matthew E. Taylor

In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy.

Atari Games

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

1 code implementation 28 Sep 2020 Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

Human-Computer Interaction Robotics

Reducing Sampling Error in Batch Temporal Difference Learning

no code implementations ICML 2020 Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone

In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch -- not the true probability of the action under the given policy.

An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

no code implementations NeurIPS 2020 Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone

We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning.

Transfer Learning

Temporal-Logic-Based Reward Shaping for Continuing Learning Tasks

no code implementations 3 Jul 2020 Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone

Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy.


Artificial Musical Intelligence: A Survey

no code implementations 17 Jun 2020 Elad Liebman, Peter Stone

Computers have been used to analyze and create music since they were first introduced in the 1950s and 1960s.

Generalizing Curricula for Reinforcement Learning

no code implementations ICML Workshop LifelongML 2020 Sanmit Narvekar, Peter Stone

However, there is structure that can be exploited between tasks and agents, such that knowledge gained developing a curriculum for one task can be reused to speed up creating a curriculum for a new task.


Deep R-Learning for Continual Area Sweeping

no code implementations 31 May 2020 Rishi Shah, Yuqian Jiang, Justin Hart, Peter Stone

Coverage path planning is a well-studied problem in robotics in which a robot must plan a path that passes through every point in a given area repeatedly, usually with a uniform frequency.

iCORPP: Interleaved Commonsense Reasoning and Probabilistic Planning on Robots

no code implementations 18 Apr 2020 Shiqi Zhang, Peter Stone

Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks.

Decision Making

APPLD: Adaptive Planner Parameter Learning from Demonstration

no code implementations 31 Mar 2020 Xuesu Xiao, Bo Liu, Garrett Warnell, Jonathan Fink, Peter Stone

Existing autonomous robot navigation systems allow robots to move from one point to another in a collision-free manner.

Robot Navigation

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

no code implementations 10 Mar 2020 Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, Peter Stone

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback.

reinforcement-learning Transfer Learning

Leveraging Human Guidance for Deep Reinforcement Learning Tasks

no code implementations 21 Sep 2019 Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone

Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment.

Imitation Learning reinforcement-learning

Solving Service Robot Tasks: UT Austin Villa@Home 2019 Team Report

no code implementations 14 Sep 2019 Rishi Shah, Yuqian Jiang, Haresh Karnan, Gilberto Briscoe-Martinez, Dominick Mulder, Ryan Gupta, Rachel Schlossman, Marika Murphy, Justin W. Hart, Luis Sentis, Peter Stone

RoboCup@Home is an international robotics competition based on domestic tasks requiring autonomous capabilities pertaining to a large variety of AI technologies.

Sample-efficient Adversarial Imitation Learning from Observation

no code implementations 18 Jun 2019 Faraz Torabi, Sean Geiger, Garrett Warnell, Peter Stone

We test our algorithm on an imitation task using a physical robot arm and its simulated version in Gazebo, and show the improvement in learning rate and efficiency.

Imitation Learning reinforcement-learning

RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration

no code implementations 18 Jun 2019 Brahma S. Pavse, Faraz Torabi, Josiah P. Hanna, Garrett Warnell, Peter Stone

Augmenting reinforcement learning with imitation learning is often hailed as a method by which to improve upon learning from scratch.

Imitation Learning reinforcement-learning

Recent Advances in Imitation Learning from Observation

no code implementations 30 May 2019 Faraz Torabi, Garrett Warnell, Peter Stone

Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task.

Imitation Learning

Imitation Learning from Video by Leveraging Proprioception

no code implementations 22 May 2019 Faraz Torabi, Garrett Warnell, Peter Stone

Classically, imitation learning algorithms have been developed for idealized situations, e.g., the demonstrations are often required to be collected in the exact same environment and usually include the demonstrator's actions.

Imitation Learning

HR-TD: A Regularized TD Method to Avoid Over-Generalization

no code implementations ICLR 2019 Ishan Durugkar, Bo Liu, Peter Stone

Temporal Difference learning with function approximation has been widely used recently and has led to several successful results.

Escape Room: A Configurable Testbed for Hierarchical Reinforcement Learning

no code implementations 22 Dec 2018 Jacob Menashe, Peter Stone

We show that the ERD presents a suite of challenges with scalable difficulty to provide a smooth learning gradient from Taxi to the Arcade Learning Environment.

Hierarchical Reinforcement Learning Montezuma's Revenge +1

Learning Curriculum Policies for Reinforcement Learning

1 code implementation 1 Dec 2018 Sanmit Narvekar, Peter Stone

Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task, by first training on a series of simpler tasks and transferring the knowledge acquired to the target task.

reinforcement-learning Transfer Learning

Integrating Task-Motion Planning with Reinforcement Learning for Robust Decision Making in Mobile Robots

no code implementations 21 Nov 2018 Yuqian Jiang, Fangkai Yang, Shiqi Zhang, Peter Stone

In the outer loop, the plan is executed, and the robot learns from the execution experience via model-free RL, to further improve its task-motion plans.

Decision Making Motion Planning +1

Robot Representation and Reasoning with Knowledge from Reinforcement Learning

no code implementations 28 Sep 2018 Keting Lu, Shiqi Zhang, Peter Stone, Xiaoping Chen

In this work, we integrate logical-probabilistic KRR with model-based RL, enabling agents to simultaneously reason with declarative knowledge and learn from interaction experiences.


Deterministic Implementations for Reproducibility in Deep Reinforcement Learning

1 code implementation 15 Sep 2018 Prabhat Nagarajan, Garrett Warnell, Peter Stone

One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance.

Q-Learning reinforcement-learning

Learning a Policy for Opportunistic Active Learning

no code implementations EMNLP 2018 Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model.

Active Learning reinforcement-learning

A Century Long Commitment to Assessing Artificial Intelligence and its Impact on Society

no code implementations 23 Aug 2018 Barbara J. Grosz, Peter Stone

In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.

Generative Adversarial Imitation from Observation

1 code implementation 17 Jul 2018 Faraz Torabi, Garrett Warnell, Peter Stone

Imitation from observation (IfO) is the problem of learning directly from state-only demonstrations without having access to the demonstrator's actions.

Imitation Learning

Importance Sampling Policy Evaluation with an Estimated Behavior Policy

1 code implementation 4 Jun 2018 Josiah P. Hanna, Scott Niekum, Peter Stone

We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.

Behavioral Cloning from Observation

5 code implementations 4 May 2018 Faraz Torabi, Garrett Warnell, Peter Stone

In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects.

Imitation Learning

Task Planning in Robotics: an Empirical Comparison of PDDL-based and ASP-based Systems

no code implementations 23 Apr 2018 Yuqian Jiang, Shiqi Zhang, Piyush Khandelwal, Peter Stone

PDDL is designed for task planning, and PDDL-based planners are widely used for a variety of planning problems.

Robot Task Planning

TD Learning with Constrained Gradients

no code implementations ICLR 2018 Ishan Durugkar, Peter Stone

In this work we propose a constraint on the TD update that minimizes change to the target values.


Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

2 code implementations 28 Sep 2017 Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, Peter Stone

While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot of training data.


Traffic Optimization For a Mixture of Self-interested and Compliant Agents

no code implementations 27 Sep 2017 Guni Sharon, Michael Albert, Tarun Rambha, Stephen Boyles, Peter Stone

This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing.

Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems

no code implementations 23 Sep 2017 Stefano V. Albrecht, Peter Stone

Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents.

Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science

2 code implementations 15 Jul 2017 Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu, Antonio Liotta

Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods.

Data-Efficient Policy Evaluation Through Behavior Policy Search

no code implementations ICML 2017 Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.

Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

no code implementations 18 Oct 2016 Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta

Conceived in the early 1990s, Experience Replay (ER) has been shown to be a successful mechanism to allow online learning algorithms to reuse past experiences.

online learning reinforcement-learning

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

no code implementations 20 Jun 2016 Josiah P. Hanna, Peter Stone, Scott Niekum

In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.

Deep Reinforcement Learning in Parameterized Action Space

6 code implementations 13 Nov 2015 Matthew Hausknecht, Peter Stone

Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces.


Deep Recurrent Q-Learning for Partially Observable MDPs

5 code implementations 23 Jul 2015 Matthew Hausknecht, Peter Stone

Deep Reinforcement Learning has yielded proficient controllers for complex tasks.

Atari Games OpenAI Gym +1

Representative Selection in Non Metric Datasets

no code implementations 26 Feb 2015 Elad Liebman, Benny Chor, Peter Stone

This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements.

DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation

no code implementations 9 Jan 2014 Elad Liebman, Maytal Saar-Tsechansky, Peter Stone

In this work we present DJ-MC, a novel reinforcement-learning framework for music recommendation that does not recommend songs individually but rather song sequences, or playlists, based on a model of preferences for both songs and song transitions.

Recommendation Systems reinforcement-learning
