Search Results for author: Scott Niekum

Found 49 papers, 26 papers with code

Automated Discovery of Functional Actual Causes in Complex Environments

no code implementations • 16 Apr 2024 • Caleb Chuck, Sankaran Vaidyanathan, Stephen Giguere, Amy Zhang, David Jensen, Scott Niekum

This paper introduces functional actual cause (FAC), a framework that uses context-specific independencies in the environment to restrict the set of actual causes.

Learning Action-based Representations Using Invariance

no code implementations • 25 Mar 2024 • Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang

Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors.

SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

no code implementations • 3 Nov 2023 • Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions.

Contrastive Learning • reinforcement-learning +1

Contrastive Preference Learning: Learning from Human Feedback without RL

1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Thus, learning a reward function from feedback is not only based on a flawed assumption about human preferences, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.

reinforcement-learning • Reinforcement Learning (RL)
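
The key idea is to score each preference segment directly by the policy's log-probabilities, treating a scaled log pi as a stand-in for the optimal advantage, so no reward model or RL phase is needed. A minimal PyTorch sketch of such a contrastive preference loss; the `alpha` scale and the symmetric pairwise form are assumptions, and the paper's full loss includes a conservative regularizer not shown here:

```python
import torch
import torch.nn.functional as F

def cpl_loss(logp_preferred, logp_rejected, alpha=0.1):
    # logp_*: (batch, T) log pi(a_t | s_t) for each segment.
    # Score each segment by the scaled sum of policy log-probabilities,
    # treating alpha * log pi as a proxy for the optimal advantage.
    score_pos = alpha * logp_preferred.sum(dim=-1)  # (batch,)
    score_neg = alpha * logp_rejected.sum(dim=-1)
    # Bradley-Terry cross-entropy: -log P(preferred > rejected)
    return -F.logsigmoid(score_pos - score_neg).mean()
```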

Learning Optimal Advantage from Preferences and Mistaking it for Reward

1 code implementation • 3 Oct 2023 • W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return.
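
For reference, a minimal sketch of the partial-return preference model being critiqued: a Bradley-Terry (Boltzmann) distribution over the summed rewards of two segments. The paper argues preferences are better modeled by regret; swapping the summed rewards for summed optimal advantages would give that alternative model.

```python
import numpy as np

def partial_return_preference(r1, r2):
    # r1, r2: arrays of per-step rewards for each segment.
    # P(segment 1 preferred) is proportional to exp of its partial return.
    z1, z2 = np.sum(r1), np.sum(r2)
    return float(np.exp(z1 - np.logaddexp(z1, z2)))  # numerically stable softmax
```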

Granger-Causal Hierarchical Skill Discovery

no code implementations • 15 Jun 2023 • Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum

Reinforcement Learning (RL) has demonstrated promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transferability.

reinforcement-learning • Reinforcement Learning (RL)

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

1 code implementation • 16 Feb 2023 • Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum

For offline RL, our analysis frames the recent method XQL in the dual framework, and we further propose f-DVL, a new method that provides alternatives to the Gumbel regression loss and fixes XQL's known training instability.

Imitation Learning • Offline RL +2
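
For context, a hedged sketch of the Gumbel (extreme-value) regression loss that XQL uses for its value objective, whose unbounded exponential term is the usual source of the instability that f-DVL's alternative losses are designed to avoid; the `beta` temperature is an assumed parameter name:

```python
import torch

def gumbel_regression_loss(q, v, beta=1.0):
    # z = (Q - V) / beta; L = E[exp(z) - z - 1], minimized over V.
    # exp(z) can blow up when Q - V is large, hence the instability.
    z = (q - v) / beta
    return (torch.exp(z) - z - 1.0).mean()
```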

Language-guided Task Adaptation for Imitation Learning

no code implementations • 24 Jan 2023 • Prasoon Goyal, Raymond J. Mooney, Scott Niekum

We introduce a novel setting wherein an agent must learn a task from a demonstration of a related task, with the difference between the tasks communicated in natural language.

Imitation Learning

Models of human preference for learning reward functions

no code implementations • 5 Jun 2022 • W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting.

Decision Making • reinforcement-learning

Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL

no code implementations • 1 Jun 2022 • Wonjoon Goo, Scott Niekum

In this work, we argue that it is not only viable but beneficial to explicitly model the behavior policy for offline RL because the constraint can be realized in a stable way with the trained model.

D4RL • Offline RL +1
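
A generic sketch of how an explicitly trained behavior model can realize the constraint at action-selection time: sample candidate actions from the behavior model and take the best one under the learned Q-function. This is a standard support-constraint pattern rather than the paper's exact algorithm; `behavior_model.sample` and the `q_net` call signature are assumed interfaces.

```python
import torch

def constrained_act(state, behavior_model, q_net, num_samples=10):
    # state: (1, obs_dim). Candidates come from the trained behavior
    # model, so the policy stays on the support of the offline data.
    candidates = behavior_model.sample(state, num_samples)  # (N, act_dim) -- assumed API
    states = state.expand(num_samples, -1)                  # (N, obs_dim)
    q_values = q_net(states, candidates).squeeze(-1)        # (N,) -- assumed call signature
    return candidates[torch.argmax(q_values)]
```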

A Ranking Game for Imitation Learning

no code implementations • 7 Feb 2022 • Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum

We propose a new framework for imitation learning -- treating imitation as a two-player ranking-based game between a policy and a reward.

Imitation Learning

SOPE: Spectrum of Off-Policy Estimators

1 code implementation • NeurIPS 2021 • Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

In this paper, we present a new perspective on this bias-variance trade-off and show the existence of a spectrum of estimators whose endpoints are SIS and IS.

Decision Making • Off-policy evaluation
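
For reference, the ordinary importance sampling (IS) endpoint of that spectrum, sketched for policies that expose action probabilities; SIS, the other endpoint, replaces the per-step action ratios with stationary state-distribution corrections, and the intermediate estimators mix the two.

```python
import numpy as np

def ordinary_is(trajectories, pi_e, pi_b, gamma=0.99):
    # trajectories: list of [(s, a, r), ...]; pi_e, pi_b: (s, a) -> probability.
    estimates = []
    for traj in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_e(s, a) / pi_b(s, a)  # full-trajectory importance weight
            ret += gamma**t * r
        estimates.append(ratio * ret)
    return float(np.mean(estimates))
```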

You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL

no code implementations • 5 Oct 2021 • Wonjoon Goo, Scott Niekum

The goal of offline reinforcement learning (RL) is to find an optimal policy given prerecorded trajectories.

D4RL • Offline RL +1

Fairness Guarantees under Demographic Shift

no code implementations • ICLR 2022 • Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Scott Niekum, Bruno Castro da Silva

Recent studies have demonstrated that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes.

Fairness

Distributional Depth-Based Estimation of Object Articulation Models

1 code implementation • 12 Aug 2021 • Ajinkya Jain, Stephen Giguere, Rudolf Lioutikov, Scott Niekum

Our core contributions include a novel representation for distributions over rigid body transformations and articulation model parameters based on screw theory, von Mises-Fisher distributions, and Stiefel manifolds.

Benchmarking • Object

On the Benefits of Inducing Local Lipschitzness for Robust Generative Adversarial Imitation Learning

no code implementations • 30 Jun 2021 • Farzan Memarian, Abolfazl Hashemi, Scott Niekum, Ufuk Topcu

We explore methodologies to improve the robustness of generative adversarial imitation learning (GAIL) algorithms to observation noise.

Imitation Learning

Zero-shot Task Adaptation using Natural Language

no code implementations • 5 Jun 2021 • Prasoon Goyal, Raymond J. Mooney, Scott Niekum

Imitation learning and instruction-following are two common approaches to communicate a user's intent to a learning agent.

Imitation Learning • Instruction Following

Adversarial Intrinsic Motivation for Reinforcement Learning

1 code implementation • NeurIPS 2021 • Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.

Multi-Goal Reinforcement Learning • reinforcement-learning +1
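
The objective rests on the Kantorovich-Rubinstein dual form of the Wasserstein-1 distance, a standard identity (the notation below, with rho_pi the policy's state-visitation distribution and rho_g the target distribution, is ours):

```latex
W_1(\rho_\pi, \rho_g) \;=\; \sup_{\|f\|_{L} \le 1}\;
  \mathbb{E}_{s \sim \rho_\pi}\!\left[f(s)\right]
  \;-\; \mathbb{E}_{s \sim \rho_g}\!\left[f(s)\right]
```

The supremum is over 1-Lipschitz potential functions f, which in practice is approximated by a constrained neural network.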

Universal Off-Policy Evaluation

1 code implementation • NeurIPS 2021 • Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

counterfactual • Decision Making +1

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

1 code implementation • 8 Mar 2021 • Farzan Memarian, Wonjoon Goo, Rudolf Lioutikov, Scott Niekum, Ufuk Topcu

We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards.
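
A minimal sketch of the densification step, under the assumption (consistent with the abstract) that trajectories are ranked by their observed sparse returns and a dense reward network is trained to respect those rankings with a pairwise ranking loss:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_net, traj_hi, traj_lo):
    # traj_hi, traj_lo: (T, state_dim) tensors; traj_hi achieved the
    # higher sparse return. Train the dense reward so its summed
    # prediction respects the sparse-return-induced ranking.
    ret_hi = reward_net(traj_hi).sum()
    ret_lo = reward_net(traj_lo).sum()
    return -F.logsigmoid(ret_hi - ret_lo)  # Bradley-Terry ranking loss
```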

Value Alignment Verification

1 code implementation • 2 Dec 2020 • Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values.

Autonomous Driving

The EMPATHIC Framework for Task Learning from Implicit Human Feedback

1 code implementation • 28 Sep 2020 • Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

We train a deep neural network on this data and demonstrate its ability to (1) infer relative reward ranking of events in the training task from prerecorded human facial reactions; (2) improve the policy of an agent in the training task using live human facial reactions; and (3) transfer to a novel domain in which it evaluates robot manipulation trajectories.

Human-Computer Interaction • Robotics

ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

1 code implementation • 24 Aug 2020 • Ajinkya Jain, Rudolf Lioutikov, Caleb Chuck, Scott Niekum

Robots in human environments will need to interact with a wide variety of articulated objects such as cabinets, drawers, and dishwashers while assisting humans in performing day-to-day tasks.

Benchmarking

PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

1 code implementation • ICML Workshop LaReL 2020 • Prasoon Goyal, Scott Niekum, Raymond J. Mooney

Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems.

reinforcement-learning • Reinforcement Learning (RL) +1

Bayesian Robust Optimization for Imitation Learning

1 code implementation • NeurIPS 2020 • Daniel S. Brown, Scott Niekum, Marek Petrik

Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function.

Imitation Learning reinforcement-learning +1

Efficiently Guiding Imitation Learning Agents with Human Gaze

no code implementations • 28 Feb 2020 • Akanksha Saran, Ruohan Zhang, Elaine Schaertl Short, Scott Niekum

Based on similarities between the attention of reinforcement learning agents and human gaze, we propose a novel, computationally efficient approach that uses gaze data as part of an auxiliary loss function, guiding the network toward higher activations in image regions where the human's gaze fixated.

Atari Games • Imitation Learning
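
A hedged sketch of such an auxiliary gaze loss: penalize the divergence between the human gaze heatmap and the network's spatially normalized activation map, added on top of the imitation objective. The KL form and tensor shapes are assumptions; the paper's exact loss may differ.

```python
import torch

def gaze_auxiliary_loss(activation_map, gaze_map, eps=1e-8):
    # activation_map, gaze_map: (batch, H, W), nonnegative.
    act = activation_map.flatten(1)
    act = act / (act.sum(dim=1, keepdim=True) + eps)    # normalize to a distribution
    gaze = gaze_map.flatten(1)
    gaze = gaze / (gaze.sum(dim=1, keepdim=True) + eps)
    # KL(gaze || activations): low when activations concentrate where gaze fixated.
    return (gaze * (torch.log(gaze + eps) - torch.log(act + eps))).sum(dim=1).mean()
```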

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

1 code implementation • ICML 2020 • Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum

Bayesian REX can learn to play Atari games from demonstrations without access to the game score, and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop.

Atari Games • Bayesian Inference +1
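
The speed comes from making the reward linear in pretrained trajectory features, so each preference likelihood inside MCMC is just a dot product with precomputed feature sums. A hedged NumPy sketch of that sampling loop, with Metropolis proposals on the unit sphere; `beta` and the step size are assumed parameters:

```python
import numpy as np

def mcmc_reward_posterior(pref_pairs, dim, n_steps=10000, step=0.05, beta=1.0):
    # pref_pairs: list of (phi_better, phi_worse), each a precomputed
    # feature sum (np.ndarray of length dim) for a ranked trajectory pair.
    def log_lik(w):
        # log P(better > worse) under Bradley-Terry, summed over pairs
        return sum(beta * w @ fb - np.logaddexp(beta * w @ fb, beta * w @ fw)
                   for fb, fw in pref_pairs)

    w = np.random.randn(dim)
    w /= np.linalg.norm(w)                  # rewards constrained to the unit sphere
    samples, ll = [], log_lik(w)
    for _ in range(n_steps):
        w_new = w + step * np.random.randn(dim)
        w_new /= np.linalg.norm(w_new)
        ll_new = log_lik(w_new)
        if np.log(np.random.rand()) < ll_new - ll:  # Metropolis accept/reject
            w, ll = w_new, ll_new
        samples.append(w.copy())
    return np.array(samples)
```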

Local Nonparametric Meta-Learning

1 code implementation • 9 Feb 2020 • Wonjoon Goo, Scott Niekum

A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks, by learning the appropriate inductive bias for that set.

Inductive Bias • Meta-Learning

Deep Bayesian Reward Learning from Preferences

no code implementations • 10 Dec 2019 • Daniel S. Brown, Scott Niekum

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy.

Atari Games • Imitation Learning

A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms

no code implementations • 6 Jul 2019 • Oliver Kroemer, Scott Niekum, George Konidaris

A key challenge in intelligent robotics is creating robots that are capable of directly interacting with the world around them to achieve their goals.

Robotics

Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning

no code implementations • 27 May 2019 • Caleb Chuck, Supawit Chockchowwat, Scott Niekum

Deep reinforcement learning (DRL) is capable of learning high-performing policies on a variety of complex high-dimensional tasks, ranging from video games to robotic manipulation.

reinforcement-learning • Reinforcement Learning (RL)

Uncertainty-Aware Data Aggregation for Deep Imitation Learning

no code implementations • 7 May 2019 • Yuchen Cui, David Isele, Scott Niekum, Kikuo Fujimura

Our analysis shows that UAIL outperforms existing data aggregation algorithms on a series of benchmark tasks.

Autonomous Driving • Imitation Learning

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

3 code implementations • 12 Apr 2019 • Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.

Imitation Learning • reinforcement-learning +1
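
The fix is to learn a reward from ranked (possibly suboptimal) demonstrations and then optimize it: a pairwise cross-entropy loss pushes the predicted return of the higher-ranked trajectory above the lower-ranked one, and extrapolating that reward is what allows exceeding the demonstrator. A minimal PyTorch sketch of the ranking loss:

```python
import torch
import torch.nn.functional as F

def trex_loss(reward_net, traj_better, traj_worse):
    # Predicted return: sum of per-state rewards along each trajectory.
    ret_better = reward_net(traj_better).sum()
    ret_worse = reward_net(traj_worse).sum()
    # Cross-entropy over the pair, with the higher-ranked trajectory as the label.
    logits = torch.stack([ret_better, ret_worse]).unsqueeze(0)  # (1, 2)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```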

Using Natural Language for Reward Shaping in Reinforcement Learning

1 code implementation • 5 Mar 2019 • Prasoon Goyal, Scott Niekum, Raymond J. Mooney

A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent with intermediate rewards for progress towards the goal.

Montezuma's Revenge • reinforcement-learning +1
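
For reference, the classic potential-based form of reward shaping that this line of work builds on, where adding F(s, s') = gamma * phi(s') - phi(s) supplies intermediate signal without changing the optimal policy; the paper's contribution is deriving intermediate rewards from natural language rather than a hand-designed potential.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s),
    # which provably leaves the optimal policy unchanged.
    return r + gamma * phi(s_next) - phi(s)
```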

Risk-Aware Active Inverse Reinforcement Learning

2 code implementations • 8 Jan 2019 • Daniel S. Brown, Yuchen Cui, Scott Niekum

Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning.

Active Learning • reinforcement-learning +1

One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video

1 code implementation • 29 Jun 2018 • Wonjoon Goo, Scott Niekum

Due to burdensome data requirements, learning from demonstration often falls short of its promise to allow users to quickly and naturally program robots.

One-Shot Learning • Task 2

Importance Sampling Policy Evaluation with an Estimated Behavior Policy

1 code implementation • 4 Jun 2018 • Josiah P. Hanna, Scott Niekum, Peter Stone

We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set.

Off-policy evaluation
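
The estimated behavior policy can be as simple as a count-based maximum-likelihood model fit to the same data being evaluated; plugging the estimate into the usual importance-sampling weights in place of the true behavior policy gives the estimator studied. A sketch for discrete states and actions:

```python
from collections import defaultdict

def estimate_behavior_policy(trajectories):
    # Count-based MLE for discrete states/actions:
    # pi_hat(a | s) = count(s, a) / count(s).
    sa_counts, s_counts = defaultdict(float), defaultdict(float)
    for traj in trajectories:  # traj: list of (s, a, r)
        for s, a, _ in traj:
            sa_counts[(s, a)] += 1.0
            s_counts[s] += 1.0
    return lambda s, a: sa_counts[(s, a)] / s_counts[s]
```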

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

1 code implementation • 20 May 2018 • Daniel S. Brown, Scott Niekum

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization.

Decision Making • reinforcement-learning +1

Efficient Hierarchical Robot Motion Planning Under Uncertainty and Hybrid Dynamics

1 code implementation • 12 Feb 2018 • Ajinkya Jain, Scott Niekum

This hierarchical planning approach results in a decomposition of the POMDP planning problem into smaller sub-parts that can be solved with significantly lower computational costs.

Motion Planning

Safe Reinforcement Learning via Shielding

1 code implementation • 29 Aug 2017 • Mohammed Alshiekh, Roderick Bloem, Ruediger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu

In the first of the two proposed shielding modes, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions.

reinforcement-learning • Reinforcement Learning (RL) +1
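
A minimal sketch of that first (preemptive) mode: before the agent commits, the shield supplies the safe-action set and the agent's choice is restricted to it. `shield.safe_actions` and `agent.rank_actions` are assumed interfaces, not the paper's API.

```python
def shielded_action(agent, shield, state):
    safe = shield.safe_actions(state)   # actions satisfying the safety spec (assumed API)
    ranked = agent.rank_actions(state)  # agent's actions in preference order (assumed API)
    for a in ranked:
        if a in safe:
            return a                    # most-preferred action that is also safe
    return safe[0]                      # otherwise fall back to any safe action
```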

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

3 code implementations • 3 Jul 2017 • Daniel S. Brown, Scott Niekum

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance.

reinforcement-learning • Reinforcement Learning (RL)

Data-Efficient Policy Evaluation Through Behavior Policy Search

1 code implementation • ICML 2017 • Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

no code implementations • 20 Jun 2016 • Josiah P. Hanna, Peter Stone, Scott Niekum

In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces.

Off-policy evaluation
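
A hedged sketch of the model-based bootstrap pattern: resample trajectories with replacement, fit an MDP transition model to each resample, evaluate the policy in each fitted model, and report a low quantile of the estimates as the lower confidence bound. `fit_model` and `evaluate_policy` are assumed callables.

```python
import numpy as np

def bootstrap_lower_bound(trajectories, fit_model, evaluate_policy,
                          n_boot=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                   # resample with replacement
        model = fit_model([trajectories[i] for i in idx])  # learned MDP transition model
        estimates.append(evaluate_policy(model))           # policy value in that model
    return float(np.quantile(estimates, alpha))            # lower confidence bound
```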

Policy Evaluation Using the Ω-Return

no code implementations • NeurIPS 2015 • Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris

The benefit of the Ω-return is that it accounts for the correlation of different-length returns.

Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

no code implementations • NeurIPS 2011 • Scott Niekum, Andrew G. Barto

Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals.

Clustering • Reinforcement Learning (RL)

TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning

no code implementations • NeurIPS 2011 • George Konidaris, Scott Niekum, Philip S. Thomas

We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the γ-return estimator, an alternative target based on a more accurate model of variance, which defines the TD_γ family of complex-backup temporal difference learning algorithms.
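
For reference, the standard λ-return target the result concerns, written as the usual exponentially weighted mixture of n-step returns; the γ-return replaces the (1-λ)λ^(n-1) weights with weights derived from the more accurate variance model.

```latex
G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1}\, G_t^{(n)},
\qquad
G_t^{(n)} = \sum_{k=1}^{n} \gamma^{\,k-1} r_{t+k} \;+\; \gamma^{\,n}\, V(s_{t+n})
```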
