Search Results for author: Peter Stone

Found 100 papers, 32 papers with code

ICRA Roboethics Challenge 2023: Intelligent Disobedience in an Elderly Care Home

no code implementations • 15 Nov 2023 • Sveta Paster, Kantwon Rogers, Gordon Briggs, Peter Stone, Reuth Mirsky

With the projected surge in the elderly population, service robots offer a promising avenue to enhance their well-being in elderly care homes.

Learning Generalizable Manipulation Policies with Object-Centric 3D Representations

no code implementations • 22 Oct 2023 • Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu

We introduce GROOT, an imitation learning method for learning robust policies with object-centric and 3D priors.

Imitation Learning

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

no code implementations • 10 Oct 2023 • Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang

We further introduce an entropy-regularized policy optimization objective, that we call $state$-MaxEnt RL (or $s$-MaxEnt RL) as a special case of our objective.

Efficient Exploration Policy Gradient Methods +1

Dobby: A Conversational Service Robot Driven by GPT-4

no code implementations • 10 Oct 2023 • Carson Stark, Bohkyung Chun, Casey Charleston, Varsha Ravi, Luis Pabon, Surya Sunkari, Tarun Mohan, Peter Stone, Justin Hart

This work introduces a robotics platform which embeds a conversational AI agent in an embodied system for natural language understanding and intelligent decision-making for service tasks; integrating task planning and human-like conversation.

Decision Making General Knowledge +3

Learning Optimal Advantage from Preferences and Mistaking it for Reward

1 code implementation • 3 Oct 2023 • W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return.

STERLING: Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience

no code implementations • 26 Sep 2023 • Haresh Karnan, Elvin Yang, Daniel Farkash, Garrett Warnell, Joydeep Biswas, Peter Stone

Terrain awareness, i.e., the ability to identify and distinguish different types of terrain, is a critical ability that robots must have to succeed at autonomous off-road navigation.

Representation Learning Visual Navigation

Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference Aligned Path Planning

no code implementations • 18 Sep 2023 • Haresh Karnan, Elvin Yang, Garrett Warnell, Joydeep Biswas, Peter Stone

In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain references within the inertial, proprioceptive, and tactile domain.

Navigate Robot Navigation +1

Utilizing Mood-Inducing Background Music in Human-Robot Interaction

no code implementations • 28 Aug 2023 • Elad Liebman, Peter Stone

This research fills this gap by reporting the results of an experiment in which human participants were required to complete a task in the presence of an autonomous agent while listening to background music.

Decision Making

Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

no code implementations • 18 Aug 2023 • Arrasy Rahman, Jiaxun Cui, Peter Stone

In this work, we first propose that maximizing an AHT agent's robustness requires it to emulate policies in the minimum coverage set (MCS), the set of best-response policies to any partner policies in the environment.

Composing Efficient, Robust Tests for Policy Selection

no code implementations • 12 Jun 2023 • Dustin Morrill, Thomas J. Walsh, Daniel Hernandez, Peter R. Wurman, Peter Stone

Empirical results demonstrate that RPOSST finds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.


Causal Policy Gradient for Whole-Body Mobile Manipulation

no code implementations • 4 May 2023 • Jiaheng Hu, Peter Stone, Roberto Martín-Martín

Current approaches often segregate tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g., learning base actions for locomotion objectives and learning arm actions for manipulation).

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

1 code implementation • 22 Apr 2023 • Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone

LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language.

Safe Evaluation For Offline Learning: Are We Ready To Deploy?

no code implementations • 16 Dec 2022 • Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor

In our setting, we assume a source of data, which we split into a train-set, to learn an offline policy, and a test-set, to estimate a lower-bound on the offline policy using off-policy evaluation with bootstrapping.

Off-policy evaluation

ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning

no code implementations • 8 Nov 2022 • Eddy Hudson, Ishan Durugkar, Garrett Warnell, Peter Stone

Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data.

Imitation Learning

Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence

no code implementations • 31 Oct 2022 • Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, Astro Teller

In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.

D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning

no code implementations • 26 Oct 2022 • Caroline Wang, Garrett Warnell, Peter Stone

While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward.

Imitation Learning reinforcement-learning +1

Task Phasing: Automated Curriculum Learning from Demonstrations

1 code implementation • 20 Oct 2022 • Vaibhav Bajaj, Guni Sharon, Peter Stone

Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals.

Reinforcement Learning (RL)

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

no code implementations • 19 Sep 2022 • Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning.

Bilevel Optimization Continual Learning +3

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

2 code implementations • 17 Aug 2022 • Bo Liu, Yihao Feng, Qiang Liu, Peter Stone

Furthermore, we introduce the metric residual network (MRN) that deliberately decomposes the action-value function Q(s, a, g) into the negated summation of a metric plus a residual asymmetric component.

reinforcement-learning Reinforcement Learning (RL)

Causal Dynamics Learning for Task-Independent State Abstraction

1 code implementation • 27 Jun 2022 • Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, Peter Stone

Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states.

Model-based Reinforcement Learning

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

no code implementations • 24 Jun 2022 • James Macglashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone

These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems.

Decision Making reinforcement-learning +1

Models of human preference for learning reward functions

1 code implementation • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting.

Decision Making reinforcement-learning

DM$^2$: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching

1 code implementation • 1 Jun 2022 • Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone

The theoretical analysis shows that under certain conditions, each agent minimizing its individual distribution mismatch allows the convergence to the joint policy that generated the target distribution.

Multi-agent Reinforcement Learning reinforcement-learning +2

COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles

1 code implementation • CVPR 2022 • Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu

To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios.

Autonomous Driving

Effective Mutation Rate Adaptation through Group Elite Selection

no code implementations • 11 Apr 2022 • Akarsh Kumar, Bo Liu, Risto Miikkulainen, Peter Stone

GESMR co-evolves a population of solutions and a population of MRs, such that each MR is assigned to a group of solutions.

Evolutionary Algorithms Image Classification

VI-IKD: High-Speed Accurate Off-Road Navigation using Learned Visual-Inertial Inverse Kinodynamics

no code implementations • 30 Mar 2022 • Haresh Karnan, Kavan Singh Sikand, Pranav Atreya, Sadegh Rabiee, Xuesu Xiao, Garrett Warnell, Peter Stone, Joydeep Biswas

In this paper, we hypothesize that to enable accurate high-speed off-road navigation using a learned IKD model, in addition to inertial information from the past, one must also anticipate the kinodynamic interactions of the vehicle with the terrain in the future.

Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

no code implementations • 28 Mar 2022 • Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone

Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.

Imitation Learning Navigate +1

Continual Learning and Private Unlearning

1 code implementation • 24 Mar 2022 • Bo Liu, Qiang Liu, Peter Stone

As intelligent agents become autonomous over longer periods of time, they may eventually become lifelong counterparts to specific people.

Continual Learning

Learning a Shield from Catastrophic Action Effects: Never Repeat the Same Mistake

no code implementations • 19 Feb 2022 • Shahaf S. Shperberg, Bo Liu, Peter Stone

When humans make catastrophic mistakes, they are expected to learn never to repeat them, such as a toddler who touches a hot stove and immediately learns never to do so again.

Continual Learning Safe Reinforcement Learning

A Survey of Ad Hoc Teamwork Research

no code implementations • 16 Feb 2022 • Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht

Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination.

Adversarial Imitation Learning from Video using a State Observer

no code implementations • 1 Feb 2022 • Haresh Karnan, Garrett Warnell, Faraz Torabi, Peter Stone

The imitation learning research community has recently made significant progress towards the goal of enabling artificial agents to imitate behaviors from video demonstrations alone.

Continuous Control Imitation Learning

Learning a Robust Multiagent Driving Policy for Traffic Congestion Reduction

1 code implementation • 3 Dec 2021 • Yulin Zhang, William Macke, Jiaxun Cui, Daniel Urieli, Peter Stone

This article establishes for the first time that a multiagent driving policy can be trained in such a way that it generalizes to different traffic flows, AV penetration, and road geometries, including on multi-lane roads.

Autonomous Vehicles

Real-world challenges for multi-agent reinforcement learning in grid-interactive buildings

no code implementations • 25 Nov 2021 • Kingsley Nweye, Bo Liu, Peter Stone, Zoltan Nagy

Building upon prior research that highlighted the need for standardizing environments for building control research, and inspired by recently introduced challenges for real life reinforcement learning control, here we propose a non-exhaustive set of nine real world challenges for reinforcement learning control in grid-interactive buildings.

Multi-agent Reinforcement Learning reinforcement-learning +1

Conflict-Averse Gradient Descent for Multi-task Learning

3 code implementations • NeurIPS 2021 • Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu

The goal of multi-task learning is to enable more efficient learning than single task learning by sharing model structures for a diverse set of tasks.

Multi-Task Learning

Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation

no code implementations • 28 Sep 2021 • Yifeng Zhu, Peter Stone, Yuke Zhu

From the task structures of multi-task demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning.

Imitation Learning Robot Manipulation

Recent Advances in Leveraging Human Guidance for Sequential Decision-Making Tasks

no code implementations • 13 Jul 2021 • Ruohan Zhang, Faraz Torabi, Garrett Warnell, Peter Stone

A longstanding goal of artificial intelligence is to create artificial agents capable of learning to perform tasks that require sequential decision making.

Decision Making

Conflict Avoidance in Social Navigation -- a Survey

no code implementations • 23 Jun 2021 • Reuth Mirsky, Xuesu Xiao, Justin Hart, Peter Stone

This survey aims to bridge this gap by introducing such a common language, using it to survey existing work, and highlighting open problems.

Social Navigation

Dynamic Sparse Training for Deep Reinforcement Learning

1 code implementation • 8 Jun 2021 • Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone

In this paper, we introduce for the first time a dynamic sparse training approach for deep reinforcement learning to accelerate the training process.

Continuous Control Decision Making +3

Adversarial Intrinsic Motivation for Reinforcement Learning

1 code implementation • NeurIPS 2021 • Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.

Multi-Goal Reinforcement Learning reinforcement-learning +1

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition

1 code implementation • 18 May 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Animashree Anandkumar

Specifically, we 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.

Multi-agent Reinforcement Learning reinforcement-learning +2

RAIL: A modular framework for Reinforcement-learning-based Adversarial Imitation Learning

no code implementations • 8 May 2021 • Eddy Hudson, Garrett Warnell, Peter Stone

While Adversarial Imitation Learning (AIL) algorithms have recently led to state-of-the-art results on various imitation learning benchmarks, it is unclear as to what impact various design decisions have on performance.

Imitation Learning OpenAI Gym +2

Skeletal Feature Compensation for Imitation Learning with Embodiment Mismatch

no code implementations • 15 Apr 2021 • Eddy Hudson, Garrett Warnell, Faraz Torabi, Peter Stone

Learning from demonstrations in the wild (e.g., YouTube videos) is a tantalizing goal in imitation learning.

Imitation Learning

Sequential Online Chore Division for Autonomous Vehicle Convoy Formation

no code implementations • 9 Apr 2021 • Harel Yedidsion, Shani Alkoby, Peter Stone

Chore division is a class of fair division problems in which some undesirable "resource" must be shared among a set of participants, with each participant wanting to get as little as possible.

DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation

no code implementations • 31 Mar 2021 • Faraz Torabi, Garrett Warnell, Peter Stone

In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.

Imitation Learning Model-based Reinforcement Learning +2

A Scavenger Hunt for Service Robots

1 code implementation • 9 Mar 2021 • Harel Yedidsion, Jennifer Suriadinata, Zifan Xu, Stefan Debruyn, Peter Stone

In this problem, the goal is to find a set of objects as quickly as possible, given probability distributions of where they may be found.

Reinforcement Learning (RL)

Expected Value of Communication for Planning in Ad Hoc Teamwork

no code implementations • 1 Mar 2021 • William Macke, Reuth Mirsky, Peter Stone

We then present a novel planning algorithm for ad hoc teamwork, determining which query to ask and planning accordingly.

Scalable Multiagent Driving Policies For Reducing Traffic Congestion

1 code implementation • 26 Feb 2021 • Jiaxun Cui, William Macke, Harel Yedidsion, Daniel Urieli, Peter Stone

Next, we propose a modular transfer reinforcement learning approach, and use it to scale up a multiagent driving policy to outperform human-like traffic and existing approaches in a simulated realistic scenario, which is an order of magnitude larger than past scenarios (hundreds instead of tens of vehicles).

Transfer Reinforcement Learning

Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks

1 code implementation • NeurIPS 2020 • Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu

We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures.

Continual Learning Image Classification +1

A Coach-Player Framework for Dynamic Team Composition

no code implementations • 1 Jan 2021 • Bo Liu, Qiang Liu, Peter Stone, Animesh Garg, Yuke Zhu, Anima Anandkumar

The performance of our method is comparable to or even better than the setting where all players have a full view of the environment, but no coach.

Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

1 code implementation • 29 Sep 2020 • Yunshu Du, Garrett Warnell, Assefaw Gebremedhin, Peter Stone, Matthew E. Taylor

In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy.

Atari Games Reinforcement Learning (RL)

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

1 code implementation • 28 Sep 2020 • Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

Human-Computer Interaction Robotics

Reducing Sampling Error in Batch Temporal Difference Learning

no code implementations • ICML 2020 • Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone

In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch -- not the true probability of the action under the given policy.

An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

no code implementations • NeurIPS 2020 • Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone

We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning.

Transfer Learning

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

no code implementations • 3 Jul 2020 • Yuqian Jiang, Sudarshanan Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone

Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy.

reinforcement-learning Reinforcement Learning (RL)

Artificial Musical Intelligence: A Survey

no code implementations • 17 Jun 2020 • Elad Liebman, Peter Stone

Computers have been used to analyze and create music since they were first introduced in the 1950s and 1960s.

BIG-bench Machine Learning Music Recommendation +1

Generalizing Curricula for Reinforcement Learning

no code implementations • ICML Workshop LifelongML 2020 • Sanmit Narvekar, Peter Stone

However, there is structure that can be exploited between tasks and agents, such that knowledge gained developing a curriculum for one task can be reused to speed up creating a curriculum for a new task.

reinforcement-learning Reinforcement Learning (RL)

Deep R-Learning for Continual Area Sweeping

no code implementations • 31 May 2020 • Rishi Shah, Yuqian Jiang, Justin Hart, Peter Stone

Coverage path planning is a well-studied problem in robotics in which a robot must plan a path that passes through every point in a given area repeatedly, usually with a uniform frequency.

iCORPP: Interleaved Commonsense Reasoning and Probabilistic Planning on Robots

no code implementations • 18 Apr 2020 • Shiqi Zhang, Piyush Khandelwal, Peter Stone

Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks.

Decision Making Management

APPLD: Adaptive Planner Parameter Learning from Demonstration

no code implementations • 31 Mar 2020 • Xuesu Xiao, Bo Liu, Garrett Warnell, Jonathan Fink, Peter Stone

Existing autonomous robot navigation systems allow robots to move from one point to another in a collision-free manner.

Robot Navigation

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

no code implementations • 10 Mar 2020 • Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, Peter Stone

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback.

reinforcement-learning Reinforcement Learning (RL) +1

Leveraging Human Guidance for Deep Reinforcement Learning Tasks

no code implementations • 21 Sep 2019 • Ruohan Zhang, Faraz Torabi, Lin Guan, Dana H. Ballard, Peter Stone

Reinforcement learning agents can learn to solve sequential decision tasks by interacting with the environment.

Imitation Learning reinforcement-learning +1

Solving Service Robot Tasks: UT Austin Villa@Home 2019 Team Report

no code implementations • 14 Sep 2019 • Rishi Shah, Yuqian Jiang, Haresh Karnan, Gilberto Briscoe-Martinez, Dominick Mulder, Ryan Gupta, Rachel Schlossman, Marika Murphy, Justin W. Hart, Luis Sentis, Peter Stone

RoboCup@Home is an international robotics competition based on domestic tasks requiring autonomous capabilities pertaining to a large variety of AI technologies.

Sample-efficient Adversarial Imitation Learning from Observation

no code implementations • 18 Jun 2019 • Faraz Torabi, Sean Geiger, Garrett Warnell, Peter Stone

We test our algorithm on an imitation task using a physical robot arm and its simulated version in Gazebo, and show the improvement in learning rate and efficiency.

Imitation Learning Reinforcement Learning (RL) +1

Recent Advances in Imitation Learning from Observation

no code implementations • 30 May 2019 • Faraz Torabi, Garrett Warnell, Peter Stone

Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task.

Imitation Learning

Imitation Learning from Video by Leveraging Proprioception

no code implementations • 22 May 2019 • Faraz Torabi, Garrett Warnell, Peter Stone

Classically, imitation learning algorithms have been developed for idealized situations, e.g., the demonstrations are often required to be collected in the exact same environment and usually include the demonstrator's actions.

Imitation Learning Test

HR-TD: A Regularized TD Method to Avoid Over-Generalization

no code implementations • ICLR 2019 • Ishan Durugkar, Bo Liu, Peter Stone

Temporal Difference learning with function approximation has been widely used recently and has led to several successful results.

Escape Room: A Configurable Testbed for Hierarchical Reinforcement Learning

no code implementations • 22 Dec 2018 • Jacob Menashe, Peter Stone

We show that the ERD presents a suite of challenges with scalable difficulty to provide a smooth learning gradient from Taxi to the Arcade Learning Environment.

Hierarchical Reinforcement Learning Montezuma's Revenge +3

Learning Curriculum Policies for Reinforcement Learning

1 code implementation • 1 Dec 2018 • Sanmit Narvekar, Peter Stone

Curriculum learning in reinforcement learning is a training methodology that seeks to speed up learning of a difficult target task, by first training on a series of simpler tasks and transferring the knowledge acquired to the target task.

reinforcement-learning Reinforcement Learning (RL) +1

Integrating Task-Motion Planning with Reinforcement Learning for Robust Decision Making in Mobile Robots

no code implementations • 21 Nov 2018 • Yuqian Jiang, Fangkai Yang, Shiqi Zhang, Peter Stone

In the outer loop, the plan is executed, and the robot learns from the execution experience via model-free RL, to further improve its task-motion plans.

Decision Making Motion Planning +2

Robot Representation and Reasoning with Knowledge from Reinforcement Learning

no code implementations • 28 Sep 2018 • Keting Lu, Shiqi Zhang, Peter Stone, Xiaoping Chen

In this work, we integrate logical-probabilistic KRR with model-based RL, enabling agents to simultaneously reason with declarative knowledge and learn from interaction experiences.

reinforcement-learning Reinforcement Learning (RL)

Deterministic Implementations for Reproducibility in Deep Reinforcement Learning

1 code implementation • 15 Sep 2018 • Prabhat Nagarajan, Garrett Warnell, Peter Stone

One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance.

Q-Learning reinforcement-learning +1

Learning a Policy for Opportunistic Active Learning

no code implementations • EMNLP 2018 • Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model.

Active Learning reinforcement-learning +2

A Century Long Commitment to Assessing Artificial Intelligence and its Impact on Society

no code implementations • 23 Aug 2018 • Barbara J. Grosz, Peter Stone

In September 2016, Stanford's "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the first report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society.

Generative Adversarial Imitation from Observation

1 code implementation • 17 Jul 2018 • Faraz Torabi, Garrett Warnell, Peter Stone

Imitation from observation (IfO) is the problem of learning directly from state-only demonstrations without having access to the demonstrator's actions.

Imitation Learning

Importance Sampling Policy Evaluation with an Estimated Behavior Policy

1 code implementation • 4 Jun 2018 • Josiah P. Hanna, Scott Niekum, Peter Stone

We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.

Off-policy evaluation

Behavioral Cloning from Observation

5 code implementations • 4 May 2018 • Faraz Torabi, Garrett Warnell, Peter Stone

In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO), that aims to provide improved performance with respect to both of these aspects.

Imitation Learning

Task Planning in Robotics: an Empirical Comparison of PDDL-based and ASP-based Systems

no code implementations • 23 Apr 2018 • Yuqian Jiang, Shiqi Zhang, Piyush Khandelwal, Peter Stone

PDDL is designed for task planning, and PDDL-based planners are widely used for a variety of planning problems.

Robot Task Planning

TD Learning with Constrained Gradients

no code implementations • ICLR 2018 • Ishan Durugkar, Peter Stone

In this work we propose a constraint on the TD update that minimizes change to the target values.


Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

2 code implementations • 28 Sep 2017 • Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, Peter Stone

While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot of training data.

reinforcement-learning Reinforcement Learning (RL) +1

Traffic Optimization For a Mixture of Self-interested and Compliant Agents

no code implementations • 27 Sep 2017 • Guni Sharon, Michael Albert, Tarun Rambha, Stephen Boyles, Peter Stone

This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing.

Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems

no code implementations • 23 Sep 2017 • Stefano V. Albrecht, Peter Stone

Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents.

Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science

2 code implementations • 15 Jul 2017 • Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, Madeleine Gibescu, Antonio Liotta

Through the success of deep learning in various domains, artificial neural networks are currently among the most used artificial intelligence methods.

Data-Efficient Policy Evaluation Through Behavior Policy Search

1 code implementation • ICML 2017 • Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.

Online Contrastive Divergence with Generative Replay: Experience Replay without Storing Data

no code implementations • 18 Oct 2016 • Decebal Constantin Mocanu, Maria Torres Vega, Eric Eaton, Peter Stone, Antonio Liotta

Conceived in the early 1990s, Experience Replay (ER) has been shown to be a successful mechanism to allow online learning algorithms to reuse past experiences.

reinforcement-learning Reinforcement Learning (RL)

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

no code implementations • 20 Jun 2016 • Josiah P. Hanna, Peter Stone, Scott Niekum

In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.

Off-policy evaluation

Deep Reinforcement Learning in Parameterized Action Space

7 code implementations • 13 Nov 2015 • Matthew Hausknecht, Peter Stone

Recent work has shown that deep neural networks are capable of approximating both value functions and policies in reinforcement learning domains featuring continuous state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Deep Recurrent Q-Learning for Partially Observable MDPs

5 code implementations • 23 Jul 2015 • Matthew Hausknecht, Peter Stone

Deep Reinforcement Learning has yielded proficient controllers for complex tasks.

Atari Games OpenAI Gym +1

Representative Selection in Non Metric Datasets

no code implementations • 26 Feb 2015 • Elad Liebman, Benny Chor, Peter Stone

This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements.


DJ-MC: A Reinforcement-Learning Agent for Music Playlist Recommendation

no code implementations • 9 Jan 2014 • Elad Liebman, Maytal Saar-Tsechansky, Peter Stone

In this work we present DJ-MC, a novel reinforcement-learning framework for music recommendation that does not recommend songs individually but rather song sequences, or playlists, based on a model of preferences for both songs and song transitions.

Music Recommendation Recommendation Systems +2
