Search Results for author: Emma Brunskill

Found 84 papers, 26 papers with code

Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models

1 code implementation 22 Jul 2024 Joy He-Yueya, Wanjing Anya Ma, Kanishk Gandhi, Benjamin W. Domingue, Emma Brunskill, Noah D. Goodman

We demonstrate that our metric can capture important variations in populations that traditional metrics, like differences in accuracy, fail to capture.

Short-Long Policy Evaluation with Novel Actions

no code implementations 4 Jul 2024 Hyunji Alex Nam, Yash Chandak, Emma Brunskill

From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients and consumers.

Decision Making Sequential Decision Making

Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

no code implementations 1 Jul 2024 Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang

To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay.

Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt

no code implementations 7 May 2024 Sanath Kumar Krishnamurthy, Susan Athey, Emma Brunskill

However, for a class of estimate-estimand-error tuples, nontrivial high probability upper bounds on the maximum error often require class complexity as input -- limiting the practicality of such methods and often resulting in loose bounds.

The GPT Surprise: Offering Large Language Model Chat in a Massive Coding Class Reduced Engagement but Increased Adopters Exam Performances

no code implementations 25 Apr 2024 Allen Nie, Yash Chandak, Miroslav Suzara, Malika Ali, Juliette Woodrow, Matt Peng, Mehran Sahami, Emma Brunskill, Chris Piech

To help understand the impact of generic LLM use on coding education, we conducted a large-scale randomized control trial with 5,831 students from 146 countries in an online coding class in which we provided some students with access to a chat interface with GPT-4.

Language Modelling Large Language Model

Evaluating and Optimizing Educational Content with Large Language Model Judgments

1 code implementation 5 Mar 2024 Joy He-Yueya, Noah D. Goodman, Emma Brunskill

We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.

Language Modelling Large Language Model +1

Experiment Planning with Function Approximation

no code implementations NeurIPS 2023 Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

We study the problem of experiment planning with function approximation in contextual bandit problems.

Model Selection

Adaptive Instrument Design for Indirect Experiments

no code implementations 5 Dec 2023 Yash Chandak, Shiv Shankar, Vasilis Syrgkanis, Emma Brunskill

Indirect experiments provide a valuable framework for estimating treatment effects in situations where conducting randomized control trials (RCTs) is impractical or unethical.

Adaptive Interventions with User-Defined Goals for Health Behavior Change

1 code implementation 16 Nov 2023 Aishwarya Mandyam, Matthew Jörke, William Denton, Barbara E. Engelhardt, Emma Brunskill

Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions.

Thompson Sampling

Estimating Optimal Policy Value in General Linear Contextual Bandits

no code implementations 19 Feb 2023 Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill

Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection.

Model Selection Multi-Armed Bandits

Model-based Offline Reinforcement Learning with Local Misspecification

no code implementations 26 Jan 2023 Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy selection.

D4RL reinforcement-learning +2

Giving Feedback on Interactive Student Programs with Meta-Exploration

1 code implementation 16 Nov 2022 Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn

However, teaching and giving feedback on such software is time-consuming -- standard approaches require instructors to manually grade student-implemented interactive programs.

Oracle Inequalities for Model Selection in Offline Reinforcement Learning

no code implementations 3 Nov 2022 Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill

We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.

Model Selection Offline RL +3

Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

no code implementations 16 Oct 2022 Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen, Emma Brunskill

Inspired by statistical model selection methods for supervised learning, we introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy when the provided dataset is limited in size.

Model Selection Offline RL +3

Offline Policy Optimization with Eligible Actions

1 code implementation 1 Jul 2022 Yao Liu, Yannis Flet-Berliac, Emma Brunskill

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications.

continuous-control Continuous Control +1

Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

1 code implementation 30 Dec 2021 Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill

Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance.

reinforcement-learning Reinforcement Learning +1

Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes

1 code implementation NeurIPS 2021 HyunJi Nam, Scott Fleming, Emma Brunskill

Many real-world problems that require making optimal sequences of decisions under uncertainty involve costs when the agent wishes to obtain information about its environment.

Reinforcement Learning (RL)

Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

no code implementations 28 Nov 2021 Ramtin Keramati, Omer Gottesman, Leo Anthony Celi, Finale Doshi-Velez, Emma Brunskill

Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy.

Decision Making Sequential Decision Making

Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

1 code implementation 15 Nov 2021 Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, Stefan Wager

We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules.

Marketing

Play to Grade: Testing Coding Games as Classifying Markov Decision Process

1 code implementation NeurIPS 2021 Allen Nie, Emma Brunskill, Chris Piech

Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse-based games.

Avoiding Overfitting to the Importance Weights in Offline Policy Optimization

no code implementations 29 Sep 2021 Yao Liu, Emma Brunskill

Offline policy optimization has a critical impact on many real-world decision-making problems, as online learning is costly and concerning in many applications.

Decision Making

Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making

1 code implementation 18 Sep 2021 Alex Chohlas-Wood, Madison Coots, Henry Zhu, Emma Brunskill, Sharad Goel

In our approach, one first elicits stakeholder preferences over the space of possible decisions and the resulting outcomes--such as preferences for balancing spending parity against court appearance rates.

Decision Making Fairness

On the Opportunities and Risks of Foundation Models

2 code implementations 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Design of Experiments for Stochastic Contextual Linear Bandits

no code implementations NeurIPS 2021 Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.

Universal Off-Policy Evaluation

1 code implementation NeurIPS 2021 Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.

counterfactual Decision Making +2

Play to Grade: Grading Interactive Coding Games as Classifying Markov Decision Process

no code implementations 1 Jan 2021 Allen Nie, Emma Brunskill, Chris Piech

Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse-based games.

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

no code implementations NeurIPS 2020 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning Reinforcement Learning +1

Online Model Selection for Reinforcement Learning with Function Approximation

no code implementations 19 Nov 2020 Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees.

Deep Reinforcement Learning Model Selection +2

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

no code implementations NeurIPS 2020 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks.

Provably Good Batch Reinforcement Learning Without Great Exploration

1 code implementation 16 Jul 2020 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning Reinforcement Learning +1

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

1 code implementation 12 Jul 2020 Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.

Model-based Reinforcement Learning Montezuma's Revenge +2

Power Constrained Bandits

1 code implementation 13 Apr 2020 Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

However, when bandits are deployed in the context of a scientific study -- e.g., a clinical trial to test if a mobile health intervention is effective -- the aim is not only to personalize for an individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective.

Decision Making Multi-Armed Bandits

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

no code implementations 2 Apr 2020 Ramtin Keramati, Emma Brunskill

In such systems there is typically an external human system designer that is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes.

reinforcement-learning Reinforcement Learning +1

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

1 code implementation NeurIPS 2020 Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill

We assess robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy.

Decision Making Management +1

Learning Near Optimal Policies with Low Inherent Bellman Error

no code implementations ICML 2020 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

This has two important consequences: 1) it shows that exploration is possible using only batch assumptions with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting.

reinforcement-learning Reinforcement Learning +1

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

no code implementations ICML 2020 Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity.

Off-policy evaluation reinforcement-learning +1

Sublinear Optimal Policy Value Estimation in Contextual Bandits

no code implementations 12 Dec 2019 Weihao Kong, Gregory Valiant, Emma Brunskill

We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.

Multi-Armed Bandits

Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model

no code implementations NeurIPS 2019 Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill

This paper focuses on the problem of computing an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP) provided that we can access the reward and transition function through a generative model.

Limiting Extrapolation in Linear Approximate Value Iteration

no code implementations NeurIPS 2019 Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points.

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

no code implementations 5 Nov 2019 Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications.

Reinforcement Learning

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

no code implementations ICML 2020 Yao Liu, Pierre-Luc Bacon, Emma Brunskill

Surprisingly, we find that in finite-horizon MDPs there is no strict variance reduction of per-decision importance sampling or stationary importance sampling, compared with vanilla importance sampling.

Off-policy evaluation Reinforcement Learning

Directed Exploration for Reinforcement Learning

no code implementations 18 Jun 2019 Zhaohan Daniel Guo, Emma Brunskill

Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.

Efficient Exploration reinforcement-learning +2

Learning When-to-Treat Policies

1 code implementation 23 May 2019 Xinkun Nie, Emma Brunskill, Stefan Wager

Many applied decision-making problems have a dynamic component: The policymaker needs not only to choose whom to treat, but also when to start which treatment.

Decision Making

Learning Abstract Models for Long-Horizon Exploration

no code implementations ICLR 2019 Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP).

Atari Games Reinforcement Learning

Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure

1 code implementation ICLR 2019 Karan Goel, Emma Brunskill

Given a dataset of time-series, the goal is to identify the latent sequence of steps common to them and label each time-series with the temporal extent of these procedural steps.

Clustering Time Series +1

PLOTS: Procedure Learning from Observations using Subtask Structure

no code implementations 17 Apr 2019 Tong Mu, Karan Goel, Emma Brunskill

In many cases an intelligent agent may want to learn how to mimic a single observed demonstrated trajectory.

Procedure Learning

Off-Policy Policy Gradient with State Distribution Correction

no code implementations 17 Apr 2019 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.

Separating value functions across time-scales

1 code implementation 5 Feb 2019 Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

In settings where this bias is unacceptable - where the system must optimize for longer horizons at higher discounts - the target of the value function approximator may increase in variance leading to difficulties in learning.

Reinforcement Learning Reinforcement Learning (RL)

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

no code implementations 1 Jan 2019 Andrea Zanette, Emma Brunskill

Strong worst-case performance bounds for episodic reinforcement learning exist, but fortunately, in practice RL algorithms perform much better than such bounds would predict.

Learning Theory Reinforcement Learning +1

Distilling Information from a Flood: A Possibility for the Use of Meta-Analysis and Systematic Review in Machine Learning Research

no code implementations 3 Dec 2018 Peter Henderson, Emma Brunskill

The current flood of information in all areas of machine learning research, from computer vision to reinforcement learning, has made it difficult to make aggregate scientific inferences.

BIG-bench Machine Learning Epidemiology +1

Policy Certificates: Towards Accountable Reinforcement Learning

no code implementations 7 Nov 2018 Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.

reinforcement-learning Reinforcement Learning +1

Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

no code implementations 3 Jul 2018 Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown.

Decoupling Gradient-Like Learning Rules from Representations

no code implementations ICML 2018 Philip Thomas, Christoph Dann, Emma Brunskill

When creating a machine learning system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

BIG-bench Machine Learning Reinforcement Learning

Regret Minimization in MDPs with Options without Prior Knowledge

no code implementations NeurIPS 2017 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options).

Reinforcement Learning

On Ensuring that Intelligent Machines Are Well-Behaved

no code implementations 17 Aug 2017 Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill

We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.

BIG-bench Machine Learning

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

no code implementations 20 Jun 2017 Philip S. Thomas, Emma Brunskill

We show how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by Sutton et al. (2000).

Policy Gradient Methods reinforcement-learning +2

Decoupling Learning Rules from Representations

no code implementations 9 Jun 2017 Philip S. Thomas, Christoph Dann, Emma Brunskill

When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.

Reinforcement Learning

Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation

no code implementations NeurIPS 2017 Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill

In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.

Sample Efficient Feature Selection for Factored MDPs

no code implementations 9 Mar 2017 Zhaohan Daniel Guo, Emma Brunskill

This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.

feature selection reinforcement-learning +2

Sample Efficient Policy Search for Optimal Stopping Domains

no code implementations 21 Feb 2017 Karan Goel, Christoph Dann, Emma Brunskill

Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return.

Importance Sampling with Unequal Support

no code implementations 10 Nov 2016 Philip S. Thomas, Emma Brunskill

Importance sampling is often used in machine learning when training and testing data come from different distributions.

A PAC RL Algorithm for Episodic POMDPs

no code implementations 25 May 2016 Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill

Many interesting real world domains involve reinforcement learning (RL) in partially observable environments.

reinforcement-learning Reinforcement Learning +1

Latent Contextual Bandits and their Application to Personalized Recommendations for New Users

no code implementations 22 Apr 2016 Li Zhou, Emma Brunskill

We consider both the benefit of leveraging a set of learned latent user classes for new users, and how we can learn such latent classes from prior users.

Multi-Armed Bandits

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

3 code implementations 4 Apr 2016 Philip S. Thomas, Emma Brunskill

In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy.

reinforcement-learning Reinforcement Learning +1

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

no code implementations NeurIPS 2015 Christoph Dann, Emma Brunskill

In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$.

reinforcement-learning Reinforcement Learning +1

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning

no code implementations 10 Jun 2015 Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL).

reinforcement-learning Reinforcement Learning +1

Online Stochastic Optimization under Correlated Bandit Feedback

no code implementations 4 Feb 2014 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.

Reinforcement Learning Stochastic Optimization

Efficient Planning under Uncertainty with Macro-actions

no code implementations 16 Jan 2014 Ruijie He, Emma Brunskill, Nicholas Roy

We also demonstrate our algorithm being used to control a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multi-step lookahead is required to achieve good performance.

Sample Complexity of Multi-task Reinforcement Learning

no code implementations 26 Sep 2013 Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications.

reinforcement-learning Reinforcement Learning +1

Regret Bounds for Reinforcement Learning with Policy Advice

no code implementations 5 May 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.

reinforcement-learning Reinforcement Learning +1
