Search Results for author: Emma Brunskill

Found 68 papers, 18 papers with code

Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

1 code implementation 30 Dec 2021 Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill

Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance.

reinforcement-learning

Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes

no code implementations NeurIPS 2021 HyunJi Nam, Scott Fleming, Emma Brunskill

Many real-world problems that require making optimal sequences of decisions under uncertainty involve costs when the agent wishes to obtain information about its environment.

reinforcement-learning

Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

no code implementations 28 Nov 2021 Ramtin Keramati, Omer Gottesman, Leo Anthony Celi, Finale Doshi-Velez, Emma Brunskill

Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy.

Decision Making

Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

no code implementations 15 Nov 2021 Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, Stefan Wager

On the other hand, in a large marketing trial, we find robust evidence of heterogeneity in the treatment effects of some digital advertising campaigns and demonstrate how RATEs can be used to compare targeting rules that prioritize estimated risk vs. those that prioritize estimated treatment benefit.

Play to Grade: Testing Coding Games as Classifying Markov Decision Process

1 code implementation NeurIPS 2021 Allen Nie, Emma Brunskill, Chris Piech

Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse based games.

Avoiding Overfitting to the Importance Weights in Offline Policy Optimization

no code implementations 29 Sep 2021 Yao Liu, Emma Brunskill

Offline policy optimization has a critical impact on many real-world decision-making problems, as online learning is costly and concerning in many applications.

Decision Making online learning

Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making

1 code implementation 18 Sep 2021 Alex Chohlas-Wood, Madison Coots, Henry Zhu, Emma Brunskill, Sharad Goel

In the dominant paradigm for designing equitable machine learning systems, one works to ensure that model predictions satisfy various fairness criteria, such as parity in error rates across race, gender, and other legally protected traits.

Decision Making Fairness

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

no code implementations NeurIPS 2021 Andrea Zanette, Martin J. Wainwright, Emma Brunskill

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically.

reinforcement-learning

On the Opportunities and Risks of Foundation Models

no code implementations 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Design of Experiments for Stochastic Contextual Linear Bandits

no code implementations NeurIPS 2021 Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.

Universal Off-Policy Evaluation

1 code implementation NeurIPS 2021 Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

Decision Making

Play to Grade: Grading Interactive Coding Games as Classifying Markov Decision Process

no code implementations 1 Jan 2021 Allen Nie, Emma Brunskill, Chris Piech

Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse-based games.

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

no code implementations NeurIPS 2020 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning

Online Model Selection for Reinforcement Learning with Function Approximation

no code implementations 19 Nov 2020 Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees.

Model Selection reinforcement-learning

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

no code implementations NeurIPS 2020 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks.

Provably Good Batch Reinforcement Learning Without Great Exploration

1 code implementation 16 Jul 2020 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

1 code implementation 12 Jul 2020 Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.

Model-based Reinforcement Learning Montezuma's Revenge

Power Constrained Bandits

1 code implementation 13 Apr 2020 Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

However, when bandits are deployed in the context of a scientific study -- e.g., a clinical trial to test if a mobile health intervention is effective -- the aim is not only to personalize for an individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective.

Decision Making Multi-Armed Bandits

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

no code implementations 2 Apr 2020 Ramtin Keramati, Emma Brunskill

In such systems there is typically an external human system designer that is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes.

reinforcement-learning

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

1 code implementation NeurIPS 2020 Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill

We assess robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy.

Decision Making

Learning Near Optimal Policies with Low Inherent Bellman Error

no code implementations ICML 2020 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

This has two important consequences: 1) it shows that exploration is possible using only batch assumptions with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting.

reinforcement-learning

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

no code implementations ICML 2020 Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity.

reinforcement-learning

Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning

2 code implementations 31 Jan 2020 Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau

Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research.

reinforcement-learning

Sublinear Optimal Policy Value Estimation in Contextual Bandits

no code implementations 12 Dec 2019 Weihao Kong, Gregory Valiant, Emma Brunskill

We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.

Multi-Armed Bandits

Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model

no code implementations NeurIPS 2019 Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill

This paper focuses on the problem of computing an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP) provided that we can access the reward and transition function through a generative model.

Limiting Extrapolation in Linear Approximate Value Iteration

no code implementations NeurIPS 2019 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points.

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

no code implementations 5 Nov 2019 Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications.

reinforcement-learning

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

no code implementations ICML 2020 Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction from per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.

Directed Exploration for Reinforcement Learning

no code implementations 18 Jun 2019 Zhaohan Daniel Guo, Emma Brunskill

Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.

Efficient Exploration reinforcement-learning

Learning When-to-Treat Policies

1 code implementation 23 May 2019 Xinkun Nie, Emma Brunskill, Stefan Wager

Many applied decision-making problems have a dynamic component: The policymaker needs not only to choose whom to treat, but also when to start which treatment.

Decision Making

Learning Abstract Models for Long-Horizon Exploration

no code implementations ICLR 2019 Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP).

Atari Games

Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure

1 code implementation ICLR 2019 Karan Goel, Emma Brunskill

Given a dataset of time-series, the goal is to identify the latent sequence of steps common to them and label each time-series with the temporal extent of these procedural steps.

Time Series

PLOTS: Procedure Learning from Observations using Subtask Structure

no code implementations 17 Apr 2019 Tong Mu, Karan Goel, Emma Brunskill

In many cases an intelligent agent may want to learn how to mimic a single observed demonstrated trajectory.

Off-Policy Policy Gradient with State Distribution Correction

no code implementations 17 Apr 2019 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.

Separating value functions across time-scales

1 code implementation 5 Feb 2019 Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance leading to difficulties in learning.

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

no code implementations 1 Jan 2019 Andrea Zanette, Emma Brunskill

Strong worst-case performance bounds for episodic reinforcement learning exist but fortunately in practice RL algorithms perform much better than such bounds would predict.

Learning Theory reinforcement-learning

Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research

no code implementations 3 Dec 2018 Peter Henderson, Emma Brunskill

The current flood of information in all areas of machine learning research, from computer vision to reinforcement learning, has made it difficult to make aggregate scientific inferences.

Epidemiology

Policy Certificates: Towards Accountable Reinforcement Learning

no code implementations 7 Nov 2018 Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.

reinforcement-learning

Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

no code implementations 3 Jul 2018 Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown.

Decoupling Gradient-Like Learning Rules from Representations

no code implementations ICML 2018 Philip Thomas, Christoph Dann, Emma Brunskill

When creating a machine learning system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

Regret Minimization in MDPs with Options without Prior Knowledge

no code implementations NeurIPS 2017 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options).

On Ensuring that Intelligent Machines Are Well-Behaved

no code implementations 17 Aug 2017 Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill

We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

no code implementations 20 Jun 2017 Philip S. Thomas, Emma Brunskill

We show how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by Sutton et al. (2000).

Policy Gradient Methods reinforcement-learning

Decoupling Learning Rules from Representations

no code implementations 9 Jun 2017 Philip S. Thomas, Christoph Dann, Emma Brunskill

When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

1 code implementation NeurIPS 2017 Christoph Dann, Tor Lattimore, Emma Brunskill

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.

reinforcement-learning

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

no code implementations NeurIPS 2017 Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.

Sample Efficient Feature Selection for Factored MDPs

no code implementations 9 Mar 2017 Zhaohan Daniel Guo, Emma Brunskill

This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.

feature selection reinforcement-learning

Sample Efficient Policy Search for Optimal Stopping Domains

no code implementations 21 Feb 2017 Karan Goel, Christoph Dann, Emma Brunskill

Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return.

Importance Sampling with Unequal Support

no code implementations 10 Nov 2016 Philip S. Thomas, Emma Brunskill

Importance sampling is often used in machine learning when training and testing data come from different distributions.
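As a rough illustration of the idea in this teaser, the sketch below shows ordinary importance sampling on a toy discrete problem: samples are drawn from one distribution and reweighted to estimate an expectation under another. This is the textbook estimator, not the method proposed in the paper, and every name in it (`importance_sampling_estimate`, `sampling_prob`, `target_prob`) is hypothetical. It assumes the sampling distribution puts positive probability on every outcome the target distribution can produce.

```python
import random


def importance_sampling_estimate(samples, f, target_prob, sampling_prob):
    """Estimate E_target[f(X)] from samples drawn under sampling_prob.

    Each sample is reweighted by target_prob[x] / sampling_prob[x].
    Assumes sampling_prob[x] > 0 for every x that appears in samples.
    """
    weighted = [f(x) * target_prob[x] / sampling_prob[x] for x in samples]
    return sum(weighted) / len(weighted)


# Hypothetical toy distributions over the outcomes {0, 1, 2}.
sampling_prob = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # distribution the data came from
target_prob = {0: 0.5, 1: 0.3, 2: 0.2}          # distribution we care about

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rng.choices(
    list(sampling_prob), weights=list(sampling_prob.values()), k=10_000
)

estimate = importance_sampling_estimate(
    samples, lambda x: x, target_prob, sampling_prob
)
# The exact expectation under target_prob is 0*0.5 + 1*0.3 + 2*0.2 = 0.7;
# the estimate should land near that value.
print(estimate)
```
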

A PAC RL Algorithm for Episodic POMDPs

no code implementations 25 May 2016 Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill

Many interesting real world domains involve reinforcement learning (RL) in partially observable environments.

reinforcement-learning

Latent Contextual Bandits and their Application to Personalized Recommendations for New Users

no code implementations 22 Apr 2016 Li Zhou, Emma Brunskill

We consider both the benefit of leveraging a set of learned latent user classes for new users, and how we can learn such latent classes from prior users.

Multi-Armed Bandits

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

2 code implementations 4 Apr 2016 Philip S. Thomas, Emma Brunskill

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy.

reinforcement-learning

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

no code implementations NeurIPS 2015 Christoph Dann, Emma Brunskill

In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$.

reinforcement-learning
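To make the "additional linear dependency" in the teaser above concrete, the ratio of the two stated bounds can be written out. This is only a worked restatement of the expressions quoted in the abstract, ignoring the logarithmic factors hidden by the tilde notation:

```latex
\[
\frac{\dfrac{|\mathcal S|^2\,|\mathcal A|\,H^2}{\epsilon^2}}
     {\dfrac{|\mathcal S|\,|\mathcal A|\,H^2}{\epsilon^2}}
\;=\; |\mathcal S|
\]
```

The upper and lower bounds agree in their dependence on $|\mathcal A|$, $H$, and $\epsilon$, and differ by a factor of $|\mathcal S|$.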

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning

no code implementations 10 Jun 2015 Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL).

reinforcement-learning

Online Stochastic Optimization under Correlated Bandit Feedback

no code implementations 4 Feb 2014 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.

Stochastic Optimization

Efficient Planning under Uncertainty with Macro-actions

no code implementations 16 Jan 2014 Ruijie He, Emma Brunskill, Nicholas Roy

We also demonstrate our algorithm being used to control a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multi-step lookahead is required to achieve good performance.

Sample Complexity of Multi-task Reinforcement Learning

no code implementations 26 Sep 2013 Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications.

reinforcement-learning

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

no code implementations NeurIPS 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents.

online learning reinforcement-learning

Regret Bounds for Reinforcement Learning with Policy Advice

no code implementations 5 May 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.

reinforcement-learning
