Search Results for author: Philip S. Thomas

Found 47 papers, 17 papers with code

From Past to Future: Rethinking Eligibility Traces

no code implementations 20 Dec 2023 Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Learning Fair Representations with High-Confidence Guarantees

1 code implementation 23 Oct 2023 Yuhong Luo, Austin Hoag, Philip S. Thomas

Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks.

Fairness, Representation Learning

Coagent Networks: Generalized and Scaled

no code implementations 16 May 2023 James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Optimization using Parallel Gradient Evaluations on Multiple Parameters

no code implementations 6 Feb 2023 Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya Mazumdar

We propose a first-order method for convex optimization, where instead of being restricted to the gradient from a single parameter, gradients from multiple parameters can be used during each step of gradient descent.
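
As a rough illustration of the general idea only (not the update rule proposed in the paper), a descent step could combine gradients evaluated at several nearby parameter vectors; the random perturbation scheme and plain averaging below are assumptions made for this sketch.

    import numpy as np

    def step_with_parallel_gradients(theta, grad_fn, lr=0.1, n_points=4, spread=0.01):
        """One descent step combining gradients from several parameter vectors.

        grad_fn(theta) returns the gradient of a convex objective at theta.
        The random perturbations and simple averaging are illustrative
        assumptions, not the method proposed in the paper.
        """
        points = [theta + spread * np.random.randn(*theta.shape) for _ in range(n_points)]
        grads = [grad_fn(p) for p in points]   # these evaluations can run in parallel
        return theta - lr * np.mean(grads, axis=0)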

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

1 code implementation 24 Jan 2023 Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary.

Counterfactual Reasoning +2

Low Variance Off-policy Evaluation with State-based Importance Sampling

1 code implementation 7 Dec 2022 David M. Bossens, Philip S. Thomas

In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return.
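
For context, a minimal sketch of ordinary per-trajectory importance sampling for off-policy evaluation is given below; the paper's state-based estimator changes how the weights are formed, which is not shown, and the data layout here is an assumption.

    import numpy as np

    def trajectory_is_estimate(trajectories, pi_target, pi_behaviour, gamma=0.99):
        """Classical per-trajectory importance-sampling estimate of the target
        policy's expected return from behaviour-policy data.

        Each trajectory is a list of (state, action, reward) tuples, and
        pi_target(a, s) / pi_behaviour(a, s) return action probabilities.
        This is the standard baseline, not the paper's state-based estimator.
        """
        estimates = []
        for traj in trajectories:
            weight, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(traj):
                weight *= pi_target(a, s) / pi_behaviour(a, s)
                ret += (gamma ** t) * r
            estimates.append(weight * ret)
        return float(np.mean(estimates))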

Density Ratio Estimation, Off-Policy Evaluation

Enforcing Delayed-Impact Fairness Guarantees

no code implementations 24 Aug 2022 Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term.

Fairness

Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

no code implementations 6 Jun 2022 Abhinav Bhatia, Philip S. Thomas, Shlomo Zilberstein

Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment compared to model-free reinforcement learning by learning an intermediate model of the environment in order to predict future interactions.
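
A minimal sketch of the model-rollout loop whose length this paper adapts is shown below; the model and policy interfaces are assumptions, and the adaptive choice of the rollout length k (the paper's contribution) is not shown.

    def model_rollout(model, policy, start_state, k):
        """Generate a k-step synthetic rollout from a learned dynamics model.

        Assumed interfaces: model(s, a) -> (next_state, reward), policy(s) -> a.
        Choosing k adaptively is the subject of the paper and is not sketched here.
        """
        rollout, s = [], start_state
        for _ in range(k):
            a = policy(s)
            s_next, r = model(s, a)
            rollout.append((s, a, r, s_next))
            s = s_next
        return rollout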

Decision Making, Model-based Reinforcement Learning +2

Edge-Compatible Reinforcement Learning for Recommendations

no code implementations 10 Dec 2021 James E. Kostas, Philip S. Thomas, Georgios Theocharous

In this work, we build on asynchronous coagent policy gradient algorithms (Kostas et al., 2020) to propose a principled solution to this problem.

Edge-computing, Recommendation Systems +2

SOPE: Spectrum of Off-Policy Estimators

1 code implementation NeurIPS 2021 Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

In this paper, we present a new perspective on this bias-variance trade-off and show the existence of a spectrum of estimators whose endpoints are SIS and IS.

Decision Making, Off-Policy Evaluation

Fairness Guarantees under Demographic Shift

no code implementations ICLR 2022 Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Scott Niekum, Bruno Castro da Silva

Recent studies have demonstrated that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes.

Fairness

Universal Off-Policy Evaluation

1 code implementation NeurIPS 2021 Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

Counterfactual, Decision Making +1

High-Confidence Off-Policy (or Counterfactual) Variance Estimation

no code implementations 25 Jan 2021 Yash Chandak, Shiv Shankar, Philip S. Thomas

Many sequential decision-making systems leverage data collected using prior policies to propose a new policy.

Counterfactual, Decision Making +1

Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

no code implementations NeurIPS 2020 Pinar Ozisik, Philip S. Thomas

We analyze the extent to which existing methods rely on accurate training data for a specific class of reinforcement learning (RL) algorithms, known as Safe and Seldonian RL.

Reinforcement Learning (RL)

Reinforcement Learning for Strategic Recommendations

no code implementations 15 Sep 2020 Georgios Theocharous, Yash Chandak, Philip S. Thomas, Frits de Nijs

Strategic recommendations (SR) refer to the problem where an intelligent agent observes the sequential behaviors and activities of users and decides when and how to interact with them to optimize some long-term objectives, both for the user and the business.

Reinforcement Learning (RL)

Optimizing for the Future in Non-Stationary MDPs

1 code implementation ICML 2020 Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.

Reinforcement learning with a network of spiking agents

1 code implementation NeurIPS Workshop Neuro_AI 2019 Sneha Aenugu, Abhishek Sharma, Sasikiran Yelamarthi, Hananel Hazan, Philip S. Thomas, Robert Kozma

Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain influencing the synaptic plasticity of the neurons in those regions.

Reinforcement Learning (RL)

Is the Policy Gradient a Gradient?

no code implementations 17 Jun 2019 Chris Nota, Philip S. Thomas

The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters.
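
For reference, with the discounted objective $J(\theta) = \mathbb{E}[\sum_{t} \gamma^{t} r_{t}]$, the policy gradient theorem is commonly written as

    \nabla_\theta J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, Q^{\pi_\theta}(s_t, a_t) \right].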

Open-Ended Question Answering, Policy Gradient Methods

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

no code implementations 6 Jun 2019 Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

Reinforcement Learning When All Actions are Not Always Available

1 code implementation 5 Jun 2019 Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic.
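
As a small illustration of the setting only (not the paper's algorithm), each step can expose a randomly drawn subset of available actions from which the policy must choose; the independent-availability sampling below is an assumption.

    import random

    def step_with_stochastic_action_set(env_step, policy, state, all_actions, p_available=0.8):
        """One interaction step with a stochastic action set.

        Each action is independently available with probability p_available
        (an illustrative assumption), and policy(state, available) must
        return an action from the available subset.
        """
        available = [a for a in all_actions if random.random() < p_available]
        if not available:  # guarantee that at least one action can be taken
            available = [random.choice(all_actions)]
        return env_step(state, policy(state, available))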

Decision Making, Reinforcement Learning +1

Lifelong Learning with a Changing Action Set

1 code implementation 5 Jun 2019 Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas

While other forms of change have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed.

Decision Making

A New Confidence Interval for the Mean of a Bounded Random Variable

no code implementations 15 May 2019 Erik Learned-Miller, Philip S. Thomas

We present a new method for constructing a confidence interval for the mean of a bounded random variable from samples of the random variable.
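
For comparison, the classical Hoeffding interval, which methods of this kind typically aim to tighten, states that for i.i.d. samples $X_1, \dots, X_n \in [a, b]$, with probability at least $1 - \delta$ the mean lies within

    \bar{X} \;\pm\; (b - a)\sqrt{\frac{\ln(2/\delta)}{2n}}.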

Asynchronous Coagent Networks

no code implementations ICML 2020 James E. Kostas, Chris Nota, Philip S. Thomas

Coagent policy gradient algorithms (CPGAs) are reinforcement learning algorithms for training a class of stochastic neural networks called coagent networks.

Hierarchical Reinforcement Learning, Reinforcement Learning +1

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

1 code implementation NeurIPS 2019 Francisco M. Garcia, Philip S. Thomas

In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems.

Reinforcement Learning (RL)

Privacy Preserving Off-Policy Evaluation

no code implementations 1 Feb 2019 Tengyang Xie, Philip S. Thomas, Gerome Miklau

Many reinforcement learning applications involve the use of data that is sensitive, such as medical records of patients or financial information.

Off-Policy Evaluation, Privacy Preserving +1

Learning Action Representations for Reinforcement Learning

no code implementations 1 Feb 2019 Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip S. Thomas

Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori.

Reinforcement Learning (RL)

Natural Option Critic

no code implementations 4 Dec 2018 Saket Tiwari, Philip S. Thomas

In this paper we show how the option-critic architecture can be extended to estimate the natural gradient of the expected discounted return.
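
For reference, the natural gradient rescales the ordinary gradient by the inverse Fisher information matrix of the policy:

    \tilde{\nabla}_\theta J(\theta) \;=\; F(\theta)^{-1} \nabla_\theta J(\theta),
    \qquad
    F(\theta) \;=\; \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right].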

Hierarchical Reinforcement Learning

A Compression-Inspired Framework for Macro Discovery

no code implementations 24 Nov 2017 Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas

In this paper we consider the problem of how a reinforcement learning agent tasked with solving a set of related Markov decision processes can use knowledge acquired early in its lifetime to improve its ability to more rapidly solve novel, but related, tasks.

Efficient Exploration

On Ensuring that Intelligent Machines Are Well-Behaved

no code implementations 17 Aug 2017 Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill

We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.

BIG-bench Machine Learning

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

no code implementations NeurIPS 1999 Philip S. Thomas, Emma Brunskill

We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).

Policy Gradient Methods, Reinforcement Learning +1

Data-Efficient Policy Evaluation Through Behavior Policy Search

1 code implementation ICML 2017 Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance.

Decoupling Learning Rules from Representations

no code implementations 9 Jun 2017 Philip S. Thomas, Christoph Dann, Emma Brunskill

When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

no code implementations NeurIPS 2017 Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.

Importance Sampling with Unequal Support

no code implementations 10 Nov 2016 Philip S. Thomas, Emma Brunskill

Importance sampling is often used in machine learning when training and testing data come from different distributions.
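
The identity behind importance sampling: for distributions p and q with q(x) > 0 wherever p(x) > 0,

    \mathbb{E}_{x \sim p}[f(x)] \;=\; \mathbb{E}_{x \sim q}\!\left[ \frac{p(x)}{q(x)} f(x) \right];

the paper concerns settings where this support condition does not hold.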

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

3 code implementations 4 Apr 2016 Philip S. Thomas, Emma Brunskill

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy.

Reinforcement Learning (RL)

A Notation for Markov Decision Processes

1 code implementation 30 Dec 2015 Philip S. Thomas, Billy Okal

This paper specifies a notation for Markov decision processes.
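
One common convention, shown here only as context and not necessarily the specific notation the paper fixes, writes an MDP as the tuple

    (\mathcal{S}, \mathcal{A}, P, R, d_0, \gamma),

with state set $\mathcal{S}$, action set $\mathcal{A}$, transition function $P(s' \mid s, a)$, reward function $R(s, a)$, initial-state distribution $d_0$, and discount factor $\gamma \in [0, 1]$.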

Increasing the Action Gap: New Operators for Reinforcement Learning

2 code implementations 15 Dec 2015 Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.
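
For context, the standard Bellman optimality operator and the action gap that these operators enlarge are commonly written as

    (\mathcal{T} Q)(s, a) \;=\; R(s, a) + \gamma \, \mathbb{E}_{s'}\!\left[ \max_{a'} Q(s', a') \right],
    \qquad
    \mathrm{gap}(s) \;=\; V^{*}(s) - \max_{a \neq a^{*}(s)} Q^{*}(s, a),

where $a^{*}(s)$ is the optimal action in state s; the operators studied in the paper modify $\mathcal{T}$ so that this gap grows while optimality is preserved.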

Atari Games, Q-Learning +2

Policy Evaluation Using the Ω-Return

no code implementations NeurIPS 2015 Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Konidaris

The benefit of the Ω-return is that it accounts for the correlation of different length returns.

Projected Natural Actor-Critic

no code implementations NeurIPS 2013 Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes.

Reinforcement Learning (RL)

TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning

no code implementations NeurIPS 2011 George Konidaris, Scott Niekum, Philip S. Thomas

We show that the lambda-return target used in the TD(lambda) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the gamma-return estimator, an alternative target based on a more accurate model of variance, which defines the TD_gamma family of complex-backup temporal difference learning algorithms.
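
For reference, with $G^{(n)}_t$ denoting the n-step return, the lambda-return target discussed above is the exponentially weighted mixture

    G^{\lambda}_{t} \;=\; (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} G^{(n)}_{t},

and the gamma-return re-weights the same n-step returns according to the more accurate variance model described in the abstract.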

Policy Gradient Coagent Networks

no code implementations NeurIPS 2011 Philip S. Thomas

We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules.

Reinforcement Learning (RL)
