Search Results for author: Philip S. Thomas

Found 47 papers, 17 papers with code

From Past to Future: Rethinking Eligibility Traces

no code implementations 20 Dec 2023 Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Learning Fair Representations with High-Confidence Guarantees

1 code implementation 23 Oct 2023 Yuhong Luo, Austin Hoag, Philip S. Thomas

Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks.

Fairness, Representation Learning

Coagent Networks: Generalized and Scaled

no code implementations 16 May 2023 James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Optimization using Parallel Gradient Evaluations on Multiple Parameters

no code implementations 6 Feb 2023 Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya Mazumdar

We propose a first-order method for convex optimization, where instead of being restricted to the gradient from a single parameter, gradients from multiple parameters can be used during each step of gradient descent.
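
As a rough illustration of the general idea only (not the update rule proposed in the paper), a descent step could combine gradients evaluated at several nearby parameter vectors; the random perturbation scheme and plain averaging below are assumptions made for this sketch.

    import numpy as np

    def step_with_parallel_gradients(theta, grad_fn, lr=0.1, n_points=4, spread=0.01):
        """One descent step combining gradients from several parameter vectors.

        grad_fn(theta) returns the gradient of a convex objective at theta.
        The random perturbations and simple averaging are illustrative
        assumptions, not the method proposed in the paper.
        """
        points = [theta + spread * np.random.randn(*theta.shape) for _ in range(n_points)]
        grads = [grad_fn(p) for p in points]   # these evaluations can run in parallel
        return theta - lr * np.mean(grads, axis=0)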

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

1 code implementation 24 Jan 2023 Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary.

Counterfactual Reasoning +2

Low Variance Off-policy Evaluation with State-based Importance Sampling

1 code implementation 7 Dec 2022 David M. Bossens, Philip S. Thomas

In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return.
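
For context, a minimal sketch of ordinary per-trajectory importance sampling for off-policy evaluation is given below; the paper's state-based estimator changes how the weights are formed, which is not shown, and the data layout here is an assumption.

    import numpy as np

    def trajectory_is_estimate(trajectories, pi_target, pi_behaviour, gamma=0.99):
        """Classical per-trajectory importance-sampling estimate of the target
        policy's expected return from behaviour-policy data.

        Each trajectory is a list of (state, action, reward) tuples, and
        pi_target(a, s) / pi_behaviour(a, s) return action probabilities.
        This is the standard baseline, not the paper's state-based estimator.
        """
        estimates = []
        for traj in trajectories:
            weight, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(traj):
                weight *= pi_target(a, s) / pi_behaviour(a, s)
                ret += (gamma ** t) * r
            estimates.append(weight * ret)
        return float(np.mean(estimates))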

Density Ratio Estimation, Off-Policy Evaluation

Enforcing Delayed-Impact Fairness Guarantees

no code implementations 24 Aug 2022 Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term.

Fairness

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

no code implementations 6 Jun 2022 Abhinav Bhatia, Philip S. Thomas, Shlomo Zilberstein

Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment compared to model-free reinforcement learning by learning an intermediate model of the environment in order to predict future interactions.
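
A minimal sketch of the model-rollout loop whose length this paper adapts is shown below; the model and policy interfaces are assumptions, and the adaptive choice of the rollout length k (the paper's contribution) is not shown.

    def model_rollout(model, policy, start_state, k):
        """Generate a k-step synthetic rollout from a learned dynamics model.

        Assumed interfaces: model(s, a) -> (next_state, reward), policy(s) -> a.
        Choosing k adaptively is the subject of the paper and is not sketched here.
        """
        rollout, s = [], start_state
        for _ in range(k):
            a = policy(s)
            s_next, r = model(s, a)
            rollout.append((s, a, r, s_next))
            s = s_next
        return rollout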

Decision Making, Model-based Reinforcement Learning +2

Edge-Compatible Reinforcement Learning for Recommendations

no code implementations 10 Dec 2021 James E. Kostas, Philip S. Thomas, Georgios Theocharous

In this work, we build on asynchronous coagent policy gradient algorithms (Kostas et al., 2020) to propose a principled solution to this problem.

Edge-computing, Recommendation Systems +2

SOPE: Spectrum of Off-Policy Estimators

1 code implementation NeurIPS 2021 Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

In this paper, we present a new perspective on this bias-variance trade-off and show the existence of a spectrum of estimators whose endpoints are SIS and IS.

Decision Making, Off-Policy Evaluation

Fairness Guarantees under Demographic Shift

no code implementations ICLR 2022 Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Scott Niekum, Bruno Castro da Silva

Recent studies have demonstrated that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes.

Fairness

Universal Off-Policy Evaluation

1 code implementation NeurIPS 2021 Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

Counterfactual, Decision Making +1

High-Confidence Off-Policy (or Counterfactual) Variance Estimation

no code implementations 25 Jan 2021 Yash Chandak, Shiv Shankar, Philip S. Thomas

Many sequential decision-making systems leverage data collected using prior policies to propose a new policy.

Counterfactual, Decision Making +1

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

no code implementations NeurIPS 2020 Pinar Ozisik, Philip S. Thomas

We analyze the extent to which existing methods rely on accurate training data for a specific class of reinforcement learning (RL) algorithms, known as Safe and Seldonian RL.

Reinforcement Learning (RL)

Reinforcement Learning for Strategic Recommendations

no code implementations 15 Sep 2020 Georgios Theocharous, Yash Chandak, Philip S. Thomas, Frits de Nijs

Strategic recommendations (SR) refer to the problem where an intelligent agent observes the sequential behaviors and activities of users and decides when and how to interact with them to optimize some long-term objectives, both for the user and the business.

Reinforcement Learning (RL)

Optimizing for the Future in Non-Stationary MDPs

1 code implementation ICML 2020 Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.

Reinforcement learning with a network of spiking agents

1 code implementation NeurIPS Workshop Neuro_AI 2019 Sneha Aenugu, Abhishek Sharma, Sasikiran Yelamarthi, Hananel Hazan, Philip S. Thomas, Robert Kozma

Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain influencing the synaptic plasticity of the neurons in those regions.

Reinforcement Learning (RL)

Is the Policy Gradient a Gradient?

no code implementations 17 Jun 2019 Chris Nota, Philip S. Thomas

The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters.
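
For reference, with the discounted objective $J(\theta) = \mathbb{E}[\sum_{t} \gamma^{t} r_{t}]$, the policy gradient theorem is commonly written as

    \nabla_\theta J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, Q^{\pi_\theta}(s_t, a_t) \right].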

Open-Ended Question Answering, Policy Gradient Methods

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

no code implementations 6 Jun 2019 Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

Reinforcement Learning When All Actions are Not Always Available

1 code implementation 5 Jun 2019 Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic.
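
As a small illustration of the setting only (not the paper's algorithm), each step can expose a randomly drawn subset of available actions from which the policy must choose; the independent-availability sampling below is an assumption.

    import random

    def step_with_stochastic_action_set(env_step, policy, state, all_actions, p_available=0.8):
        """One interaction step with a stochastic action set.

        Each action is independently available with probability p_available
        (an illustrative assumption), and policy(state, available) must
        return an action from the available subset.
        """
        available = [a for a in all_actions if random.random() < p_available]
        if not available:  # guarantee that at least one action can be taken
            available = [random.choice(all_actions)]
        return env_step(state, policy(state, available))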

Decision Making, Reinforcement Learning +1

Lifelong Learning with a Changing Action Set

1 code implementation 5 Jun 2019 Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas

While other forms of change have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed.

Decision Making

A New Confidence Interval for the Mean of a Bounded Random Variable

no code implementations 15 May 2019 Erik Learned-Miller, Philip S. Thomas

We present a new method for constructing a confidence interval for the mean of a bounded random variable from samples of the random variable.
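
For comparison, the classical Hoeffding interval, which methods of this kind typically aim to tighten, states that for i.i.d. samples $X_1, \dots, X_n \in [a, b]$, with probability at least $1 - \delta$ the mean lies within

    \bar{X} \;\pm\; (b - a)\sqrt{\frac{\ln(2/\delta)}{2n}}.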

Asynchronous Coagent Networks

no code implementations ICML 2020 James E. Kostas, Chris Nota, Philip S. Thomas

Coagent policy gradient algorithms (CPGAs) are reinforcement learning algorithms for training a class of stochastic neural networks called coagent networks.

Hierarchical Reinforcement Learning, Reinforcement Learning +1

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

1 code implementation NeurIPS 2019 Francisco M. Garcia, Philip S. Thomas

In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems.

Reinforcement Learning (RL)

Privacy Preserving Off-Policy Evaluation

no code implementations 1 Feb 2019 Tengyang Xie, Philip S. Thomas, Gerome Miklau

Many reinforcement learning applications involve the use of data that is sensitive, such as medical records of patients or financial information.

Off-Policy Evaluation, Privacy Preserving +1

Learning Action Representations for Reinforcement Learning

no code implementations 1 Feb 2019 Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip S. Thomas

Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori.

Reinforcement Learning (RL)

Natural Option Critic

no code implementations 4 Dec 2018 Saket Tiwari, Philip S. Thomas

In this paper we show how the option-critic architecture can be extended to estimate the natural gradient of the expected discounted return.
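
For reference, the natural gradient rescales the ordinary gradient by the inverse Fisher information matrix of the policy:

    \tilde{\nabla}_\theta J(\theta) \;=\; F(\theta)^{-1} \nabla_\theta J(\theta),
    \qquad
    F(\theta) \;=\; \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right].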

Hierarchical Reinforcement Learning

A Compression-Inspired Framework for Macro Discovery

no code implementations 24 Nov 2017 Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas

In this paper we consider the problem of how a reinforcement learning agent tasked with solving a set of related Markov decision processes can use knowledge acquired early in its lifetime to improve its ability to more rapidly solve novel, but related, tasks.

Efficient Exploration

On Ensuring that Intelligent Machines Are Well-Behaved

no code implementations 17 Aug 2017 Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill

We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.

BIG-bench Machine Learning

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

no code implementations NeurIPS 1999 Philip S. Thomas, Emma Brunskill

We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).

Policy Gradient Methods, Reinforcement Learning +1

Data-Efficient Policy Evaluation Through Behavior Policy Search

1 code implementation ICML 2017 Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.

Decoupling Learning Rules from Representations

no code implementations 9 Jun 2017 Philip S. Thomas, Christoph Dann, Emma Brunskill

When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

no code implementations NeurIPS 2017 Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.

Importance Sampling with Unequal Support

no code implementations 10 Nov 2016 Philip S. Thomas, Emma Brunskill

Importance sampling is often used in machine learning when training and testing data come from different distributions.
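
The identity behind importance sampling: for distributions p and q with q(x) > 0 wherever p(x) > 0,

    \mathbb{E}_{x \sim p}[f(x)] \;=\; \mathbb{E}_{x \sim q}\!\left[ \frac{p(x)}{q(x)} f(x) \right];

the paper concerns settings where this support condition does not hold.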

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

3 code implementations 4 Apr 2016 Philip S. Thomas, Emma Brunskill

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy.

Reinforcement Learning (RL)

A Notation for Markov Decision Processes

1 code implementation 30 Dec 2015 Philip S. Thomas, Billy Okal

This paper specifies a notation for Markov decision processes.
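
One common convention, shown here only as context and not necessarily the specific notation the paper fixes, writes an MDP as the tuple

    (\mathcal{S}, \mathcal{A}, P, R, d_0, \gamma),

with state set $\mathcal{S}$, action set $\mathcal{A}$, transition function $P(s' \mid s, a)$, reward function $R(s, a)$, initial-state distribution $d_0$, and discount factor $\gamma \in [0, 1]$.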

Increasing the Action Gap: New Operators for Reinforcement Learning

2 code implementations 15 Dec 2015 Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.
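
For context, the standard Bellman optimality operator and the action gap that these operators enlarge are commonly written as

    (\mathcal{T} Q)(s, a) \;=\; R(s, a) + \gamma \, \mathbb{E}_{s'}\!\left[ \max_{a'} Q(s', a') \right],
    \qquad
    \mathrm{gap}(s) \;=\; V^{*}(s) - \max_{a \neq a^{*}(s)} Q^{*}(s, a),

where $a^{*}(s)$ is the optimal action in state s; the operators studied in the paper modify $\mathcal{T}$ so that this gap grows while optimality is preserved.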

Atari Games, Q-Learning +2

Policy Evaluation Using the Ω-Return

no code implementations NeurIPS 2015 Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris

The benefit of the Ω-return is that it accounts for the correlation of different length returns.

Projected Natural Actor-Critic

no code implementations NeurIPS 2013 Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes.

Reinforcement Learning (RL)

TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning

no code implementations NeurIPS 2011 George Konidaris, Scott Niekum, Philip S. Thomas

We show that the lambda-return target used in the TD(lambda) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the gamma-return estimator, an alternative target based on a more accurate model of variance, which defines the TD_gamma family of complex-backup temporal difference learning algorithms.
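
For reference, with $G^{(n)}_t$ denoting the n-step return, the lambda-return target discussed above is the exponentially weighted mixture

    G^{\lambda}_{t} \;=\; (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} G^{(n)}_{t},

and the gamma-return re-weights the same n-step returns according to the more accurate variance model described in the abstract.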

Policy Gradient Coagent Networks

no code implementations NeurIPS 2011 Philip S. Thomas

We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules.

Reinforcement Learning (RL)
