Search Results for author: Scott Niekum

Found 49 papers, 26 papers with code

Automated Discovery of Functional Actual Causes in Complex Environments

no code implementations • 16 Apr 2024 • Caleb Chuck, Sankaran Vaidyanathan, Stephen Giguere, Amy Zhang, David Jensen, Scott Niekum

This paper introduces functional actual cause (FAC), a framework that uses context-specific independencies in the environment to restrict the set of actual causes.

Learning Action-based Representations Using Invariance

no code implementations • 25 Mar 2024 • Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang

Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors.

SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

no code implementations • 3 Nov 2023 • Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.

Contrastive Learning • reinforcement-learning +1

Contrastive Preference Learning: Learning from Human Feedback without RL

1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Thus, learning a reward function from feedback is not only based on a flawed assumption about human preferences, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.

reinforcement-learning • Reinforcement Learning (RL)
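
The key idea is to score each preference segment directly by the policy's log-probabilities, treating a scaled log pi as a stand-in for the optimal advantage, so no reward model or RL phase is needed. A minimal PyTorch sketch of such a contrastive preference loss; the `alpha` scale and the symmetric pairwise form are assumptions, and the paper's full loss includes a conservative regularizer not shown here:

```python
import torch
import torch.nn.functional as F

def cpl_loss(logp_preferred, logp_rejected, alpha=0.1):
    # logp_*: (batch, T) log pi(a_t | s_t) for each segment.
    # Score each segment by the scaled sum of policy log-probabilities,
    # treating alpha * log pi as a proxy for the optimal advantage.
    score_pos = alpha * logp_preferred.sum(dim=-1)  # (batch,)
    score_neg = alpha * logp_rejected.sum(dim=-1)
    # Bradley-Terry cross-entropy: -log P(preferred > rejected)
    return -F.logsigmoid(score_pos - score_neg).mean()
```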

Learning Optimal Advantage from Preferences and Mistaking it for Reward

1 code implementation • 3 Oct 2023 • W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return.
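
For reference, a minimal sketch of the partial-return preference model being critiqued: a Bradley-Terry (Boltzmann) distribution over the summed rewards of two segments. The paper argues preferences are better modeled by regret; swapping the summed rewards for summed optimal advantages would give that alternative model.

```python
import numpy as np

def partial_return_preference(r1, r2):
    # r1, r2: arrays of per-step rewards for each segment.
    # P(segment 1 preferred) is proportional to exp of its partial return.
    z1, z2 = np.sum(r1), np.sum(r2)
    return float(np.exp(z1 - np.logaddexp(z1, z2)))  # numerically stable softmax
```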

Granger-Causal Hierarchical Skill Discovery

no code implementations • 15 Jun 2023 • Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum

Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability.

reinforcement-learning • Reinforcement Learning (RL)

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

1 code implementation • 16 Feb 2023 • Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum

For offline RL, our analysis frames the recent method XQL in the dual framework, and we further propose f-DVL, a new method that provides alternatives to the Gumbel regression loss and fixes XQL's known training instability.

Imitation Learning • Offline RL +2
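
For context, a hedged sketch of the Gumbel (extreme-value) regression loss that XQL uses for its value objective, whose unbounded exponential term is the usual source of the instability that f-DVL's alternative losses are designed to avoid; the `beta` temperature is an assumed parameter name:

```python
import torch

def gumbel_regression_loss(q, v, beta=1.0):
    # z = (Q - V) / beta; L = E[exp(z) - z - 1], minimized over V.
    # exp(z) can blow up when Q - V is large, hence the instability.
    z = (q - v) / beta
    return (torch.exp(z) - z - 1.0).mean()
```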

Language-guided Task Adaptation for Imitation Learning

no code implementations • 24 Jan 2023 • Prasoon Goyal, Raymond J. Mooney, Scott Niekum

We introduce a novel setting wherein an agent must learn a task from a demonstration of a related task, with the difference between the tasks communicated in natural language.

Imitation Learning

Models of human preference for learning reward functions

no code implementations • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting.

Decision Making • reinforcement-learning

Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

no code implementations • 1 Jun 2022 • Wonjoon Goo, Scott Niekum

In this work, we argue that it is not only viable but beneficial to explicitly model the behavior policy for offline RL because the constraint can be realized in a stable way with the trained model.

D4RL • Offline RL +1
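
A generic sketch of how an explicitly trained behavior model can realize the constraint at action-selection time: sample candidate actions from the behavior model and take the best one under the learned Q-function. This is a standard support-constraint pattern rather than the paper's exact algorithm; `behavior_model.sample` and the `q_net` call signature are assumed interfaces.

```python
import torch

def constrained_act(state, behavior_model, q_net, num_samples=10):
    # state: (1, obs_dim). Candidates come from the trained behavior
    # model, so the policy stays on the support of the offline data.
    candidates = behavior_model.sample(state, num_samples)  # (N, act_dim) -- assumed API
    states = state.expand(num_samples, -1)                  # (N, obs_dim)
    q_values = q_net(states, candidates).squeeze(-1)        # (N,) -- assumed call signature
    return candidates[torch.argmax(q_values)]
```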

A Ranking Game for Imitation Learning

no code implementations • 7 Feb 2022 • Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum

We propose a new framework for imitation learning -- treating imitation as a two-player ranking-based game between a policy and a reward.

Imitation Learning

SOPE: Spectrum of Off-Policy Estimators

1 code implementation • NeurIPS 2021 • Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

In this paper, we present a new perspective on this bias-variance trade-off and show the existence of a spectrum of estimators whose endpoints are SIS and IS.

Decision Making • Off-policy evaluation
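
For reference, the ordinary importance sampling (IS) endpoint of that spectrum, sketched for policies that expose action probabilities; SIS, the other endpoint, replaces the per-step action ratios with stationary state-distribution corrections, and the intermediate estimators mix the two.

```python
import numpy as np

def ordinary_is(trajectories, pi_e, pi_b, gamma=0.99):
    # trajectories: list of [(s, a, r), ...]; pi_e, pi_b: (s, a) -> probability.
    estimates = []
    for traj in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_e(s, a) / pi_b(s, a)  # full-trajectory importance weight
            ret += gamma**t * r
        estimates.append(ratio * ret)
    return float(np.mean(estimates))
```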

You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL

no code implementations • 5 Oct 2021 • Wonjoon Goo, Scott Niekum

The goal of offline reinforcement learning (RL) is to find an optimal policy given prerecorded trajectories.

D4RL • Offline RL +1

Fairness Guarantees under Demographic Shift

no code implementations • ICLR 2022 • Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Scott Niekum, Bruno Castro da Silva

Recent studies have demonstrated that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes.

Fairness

Distributional Depth-Based Estimation of Object Articulation Models

1 code implementation • 12 Aug 2021 • Ajinkya Jain, Stephen Giguere, Rudolf Lioutikov, Scott Niekum

Our core contributions include a novel representation for distributions over rigid body transformations and articulation model parameters based on screw theory, von Mises-Fisher distributions, and Stiefel manifolds.

Benchmarking • Object

On the Benefits of Inducing Local Lipschitzness for Robust Generative Adversarial Imitation Learning

no code implementations • 30 Jun 2021 • Farzan Memarian, Abolfazl Hashemi, Scott Niekum, Ufuk Topcu

We explore methodologies to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise.

Imitation Learning

Zero-shot Task Adaptation using Natural Language

no code implementations • 5 Jun 2021 • Prasoon Goyal, Raymond J. Mooney, Scott Niekum

Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent.

Imitation Learning • Instruction Following

Adversarial Intrinsic Motivation for Reinforcement Learning

1 code implementation • NeurIPS 2021 • Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.

Multi-Goal Reinforcement Learning • reinforcement-learning +1
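
The objective rests on the Kantorovich-Rubinstein dual form of the Wasserstein-1 distance, a standard identity (the notation below, with rho_pi the policy's state-visitation distribution and rho_g the target distribution, is ours):

```latex
W_1(\rho_\pi, \rho_g) \;=\; \sup_{\|f\|_{L} \le 1}\;
  \mathbb{E}_{s \sim \rho_\pi}\!\left[f(s)\right]
  \;-\; \mathbb{E}_{s \sim \rho_g}\!\left[f(s)\right]
```

The supremum is over 1-Lipschitz potential functions f, which in practice is approximated by a constrained neural network.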

Universal Off-Policy Evaluation

1 code implementation • NeurIPS 2021 • Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

counterfactual • Decision Making +1

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

1 code implementation • 8 Mar 2021 • Farzan Memarian, Wonjoon Goo, Rudolf Lioutikov, Scott Niekum, Ufuk Topcu

We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards.
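
A minimal sketch of the densification step, under the assumption (consistent with the abstract) that trajectories are ranked by their observed sparse returns and a dense reward network is trained to respect those rankings with a pairwise ranking loss:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_net, traj_hi, traj_lo):
    # traj_hi, traj_lo: (T, state_dim) tensors; traj_hi achieved the
    # higher sparse return. Train the dense reward so its summed
    # prediction respects the sparse-return-induced ranking.
    ret_hi = reward_net(traj_hi).sum()
    ret_lo = reward_net(traj_lo).sum()
    return -F.logsigmoid(ret_hi - ret_lo)  # Bradley-Terry ranking loss
```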

Value Alignment Verification

1 code implementation • 2 Dec 2020 • Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values.

Autonomous Driving

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

1 code implementation • 28 Sep 2020 • Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

Human-Computer Interaction • Robotics

ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

1 code implementation • 24 Aug 2020 • Ajinkya Jain, Rudolf Lioutikov, Caleb Chuck, Scott Niekum

Robots in human environments will need to interact with a wide variety of articulated objects such as cabinets, drawers, and dishwashers while assisting humans in performing day-to-day tasks.

Benchmarking

PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

1 code implementation • ICML Workshop LaReL 2020 • Prasoon Goyal, Scott Niekum, Raymond J. Mooney

Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems.

reinforcement-learning • Reinforcement Learning (RL) +1

Bayesian Robust Optimization for Imitation Learning

1 code implementation • NeurIPS 2020 • Daniel S. Brown, Scott Niekum, Marek Petrik

Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function.

Imitation Learning reinforcement-learning +1

Efficiently Guiding Imitation Learning Agents with Human Gaze

no code implementations • 28 Feb 2020 • Akanksha Saran, Ruohan Zhang, Elaine Schaertl Short, Scott Niekum

Based on similarities between the attention of reinforcement learning agents and human gaze, we propose a novel, computationally efficient approach that uses gaze data as part of an auxiliary loss function, guiding the network toward higher activations in image regions where the human's gaze fixated.

Atari Games • Imitation Learning
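
A hedged sketch of such an auxiliary gaze loss: penalize the divergence between the human gaze heatmap and the network's spatially normalized activation map, added on top of the imitation objective. The KL form and tensor shapes are assumptions; the paper's exact loss may differ.

```python
import torch

def gaze_auxiliary_loss(activation_map, gaze_map, eps=1e-8):
    # activation_map, gaze_map: (batch, H, W), nonnegative.
    act = activation_map.flatten(1)
    act = act / (act.sum(dim=1, keepdim=True) + eps)    # normalize to a distribution
    gaze = gaze_map.flatten(1)
    gaze = gaze / (gaze.sum(dim=1, keepdim=True) + eps)
    # KL(gaze || activations): low when activations concentrate where gaze fixated.
    return (gaze * (torch.log(gaze + eps) - torch.log(act + eps))).sum(dim=1).mean()
```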

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

1 code implementation • ICML 2020 • Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum

Bayesian REX can learn to play Atari games from demonstrations without access to the game score, and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop.

Atari Games • Bayesian Inference +1
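
The speed comes from making the reward linear in pretrained trajectory features, so each preference likelihood inside MCMC is just a dot product with precomputed feature sums. A hedged NumPy sketch of that sampling loop, with Metropolis proposals on the unit sphere; `beta` and the step size are assumed parameters:

```python
import numpy as np

def mcmc_reward_posterior(pref_pairs, dim, n_steps=10000, step=0.05, beta=1.0):
    # pref_pairs: list of (phi_better, phi_worse), each a precomputed
    # feature sum (np.ndarray of length dim) for a ranked trajectory pair.
    def log_lik(w):
        # log P(better > worse) under Bradley-Terry, summed over pairs
        return sum(beta * w @ fb - np.logaddexp(beta * w @ fb, beta * w @ fw)
                   for fb, fw in pref_pairs)

    w = np.random.randn(dim)
    w /= np.linalg.norm(w)                  # rewards constrained to the unit sphere
    samples, ll = [], log_lik(w)
    for _ in range(n_steps):
        w_new = w + step * np.random.randn(dim)
        w_new /= np.linalg.norm(w_new)
        ll_new = log_lik(w_new)
        if np.log(np.random.rand()) < ll_new - ll:  # Metropolis accept/reject
            w, ll = w_new, ll_new
        samples.append(w.copy())
    return np.array(samples)
```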

Local Nonparametric Meta-Learning

1 code implementation • 9 Feb 2020 • Wonjoon Goo, Scott Niekum

A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks, by learning the appropriate inductive bias for that set.

Inductive Bias • Meta-Learning

Deep Bayesian Reward Learning from Preferences

no code implementations • 10 Dec 2019 • Daniel S. Brown, Scott Niekum

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy.

Atari Games • Imitation Learning

A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms

no code implementations • 6 Jul 2019 • Oliver Kroemer, Scott Niekum, George Konidaris

A key challenge in intelligent robotics is creating robots that are capable of directly interacting with the world around them to achieve their goals.

Robotics

Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning

no code implementations • 27 May 2019 • Caleb Chuck, Supawit Chockchowwat, Scott Niekum

Deep reinforcement learning (DRL) is capable of learning high-performing policies on a variety of complex high-dimensional tasks, ranging from video games to robotic manipulation.

reinforcement-learning • Reinforcement Learning (RL)

Uncertainty-Aware Data Aggregation for Deep Imitation Learning

no code implementations • 7 May 2019 • Yuchen Cui, David Isele, Scott Niekum, Kikuo Fujimura

Our analysis shows that UAIL outperforms existing data aggregation algorithms on a series of benchmark tasks.

Autonomous Driving • Imitation Learning

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

3 code implementations • 12 Apr 2019 • Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.

Imitation Learning • reinforcement-learning +1
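
The fix is to learn a reward from ranked (possibly suboptimal) demonstrations and then optimize it: a pairwise cross-entropy loss pushes the predicted return of the higher-ranked trajectory above the lower-ranked one, and extrapolating that reward is what allows exceeding the demonstrator. A minimal PyTorch sketch of the ranking loss:

```python
import torch
import torch.nn.functional as F

def trex_loss(reward_net, traj_better, traj_worse):
    # Predicted return: sum of per-state rewards along each trajectory.
    ret_better = reward_net(traj_better).sum()
    ret_worse = reward_net(traj_worse).sum()
    # Cross-entropy over the pair, with the higher-ranked trajectory as the label.
    logits = torch.stack([ret_better, ret_worse]).unsqueeze(0)  # (1, 2)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```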

Using Natural Language for Reward Shaping in Reinforcement Learning

1 code implementation • 5 Mar 2019 • Prasoon Goyal, Scott Niekum, Raymond J. Mooney

A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent with intermediate rewards for progress towards the goal.

Montezuma's Revenge • reinforcement-learning +1
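
For reference, the classic potential-based form of reward shaping that this line of work builds on, where adding F(s, s') = gamma * phi(s') - phi(s) supplies intermediate signal without changing the optimal policy; the paper's contribution is deriving intermediate rewards from natural language rather than a hand-designed potential.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s),
    # which provably leaves the optimal policy unchanged.
    return r + gamma * phi(s_next) - phi(s)
```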

Risk-Aware Active Inverse Reinforcement Learning

2 code implementations • 8 Jan 2019 • Daniel S. Brown, Yuchen Cui, Scott Niekum

Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning.

Active Learning • reinforcement-learning +1

One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video

1 code implementation • 29 Jun 2018 • Wonjoon Goo, Scott Niekum

Due to burdensome data requirements, learning from demonstration often falls short of its promise to allow users to quickly and naturally program robots.

One-Shot Learning • Task 2

Importance Sampling Policy Evaluation with an Estimated Behavior Policy

1 code implementation • 4 Jun 2018 • Josiah P. Hanna, Scott Niekum, Peter Stone

We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.

Off-policy evaluation
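
The estimated behavior policy can be as simple as a count-based maximum-likelihood model fit to the same data being evaluated; plugging the estimate into the usual importance-sampling weights in place of the true behavior policy gives the estimator studied. A sketch for discrete states and actions:

```python
from collections import defaultdict

def estimate_behavior_policy(trajectories):
    # Count-based MLE for discrete states/actions:
    # pi_hat(a | s) = count(s, a) / count(s).
    sa_counts, s_counts = defaultdict(float), defaultdict(float)
    for traj in trajectories:  # traj: list of (s, a, r)
        for s, a, _ in traj:
            sa_counts[(s, a)] += 1.0
            s_counts[s] += 1.0
    return lambda s, a: sa_counts[(s, a)] / s_counts[s]
```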

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

1 code implementation • 20 May 2018 • Daniel S. Brown, Scott Niekum

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization.

Decision Making • reinforcement-learning +1

Efficient Hierarchical Robot Motion Planning Under Uncertainty and Hybrid Dynamics

1 code implementation • 12 Feb 2018 • Ajinkya Jain, Scott Niekum

This hierarchical planning approach results in a decomposition of the POMDP planning problem into smaller sub-parts that can be solved with significantly lower computational costs.

Motion Planning

Safe Reinforcement Learning via Shielding

1 code implementation • 29 Aug 2017 • Mohammed Alshiekh, Roderick Bloem, Ruediger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu

In the first of the two proposed shielding modes, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions.

reinforcement-learning • Reinforcement Learning (RL) +1
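
A minimal sketch of that first (preemptive) mode: before the agent commits, the shield supplies the safe-action set and the agent's choice is restricted to it. `shield.safe_actions` and `agent.rank_actions` are assumed interfaces, not the paper's API.

```python
def shielded_action(agent, shield, state):
    safe = shield.safe_actions(state)   # actions satisfying the safety spec (assumed API)
    ranked = agent.rank_actions(state)  # agent's actions in preference order (assumed API)
    for a in ranked:
        if a in safe:
            return a                    # most-preferred action that is also safe
    return safe[0]                      # otherwise fall back to any safe action
```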

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

3 code implementations • 3 Jul 2017 • Daniel S. Brown, Scott Niekum

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance.

reinforcement-learning • Reinforcement Learning (RL)

Data-Efficient Policy Evaluation Through Behavior Policy Search

1 code implementation • ICML 2017 • Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

no code implementations • 20 Jun 2016 • Josiah P. Hanna, Peter Stone, Scott Niekum

In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.

Off-policy evaluation
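
A hedged sketch of the model-based bootstrap pattern: resample trajectories with replacement, fit an MDP transition model to each resample, evaluate the policy in each fitted model, and report a low quantile of the estimates as the lower confidence bound. `fit_model` and `evaluate_policy` are assumed callables.

```python
import numpy as np

def bootstrap_lower_bound(trajectories, fit_model, evaluate_policy,
                          n_boot=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                   # resample with replacement
        model = fit_model([trajectories[i] for i in idx])  # learned MDP transition model
        estimates.append(evaluate_policy(model))           # policy value in that model
    return float(np.quantile(estimates, alpha))            # lower confidence bound
```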

Policy Evaluation Using the Ω-Return

no code implementations • NeurIPS 2015 • Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris

The benefit of the Ω-return is that it accounts for the correlation of different-length returns.

Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

no code implementations • NeurIPS 2011 • Scott Niekum, Andrew G. Barto

Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals.

Clustering • Reinforcement Learning (RL)

TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning

no code implementations • NeurIPS 2011 • George Konidaris, Scott Niekum, Philip S. Thomas

We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the γ-return estimator, an alternative target based on a more accurate model of variance, which defines the TD_γ family of complex-backup temporal difference learning algorithms.
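
For reference, the standard λ-return target the result concerns, written as the usual exponentially weighted mixture of n-step returns; the γ-return replaces the (1-λ)λ^(n-1) weights with weights derived from the more accurate variance model.

```latex
G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1}\, G_t^{(n)},
\qquad
G_t^{(n)} = \sum_{k=1}^{n} \gamma^{\,k-1} r_{t+k} \;+\; \gamma^{\,n}\, V(s_{t+n})
```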
