1 code implementation • 22 Jul 2024 • Joy He-Yueya, Wanjing Anya Ma, Kanishk Gandhi, Benjamin W. Domingue, Emma Brunskill, Noah D. Goodman
We demonstrate that our metric can capture important variations in populations that traditional metrics, like differences in accuracy, fail to capture.
no code implementations • 4 Jul 2024 • Hyunji Alex Nam, Yash Chandak, Emma Brunskill
From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients and consumers.
no code implementations • 1 Jul 2024 • Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang
To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay.
no code implementations • 7 May 2024 • Sanath Kumar Krishnamurthy, Susan Athey, Emma Brunskill
However, for a class of estimate-estimand-error tuples, nontrivial high probability upper bounds on the maximum error often require class complexity as input -- limiting the practicality of such methods and often resulting in loose bounds.
no code implementations • 25 Apr 2024 • Allen Nie, Yash Chandak, Miroslav Suzara, Malika Ali, Juliette Woodrow, Matt Peng, Mehran Sahami, Emma Brunskill, Chris Piech
To help understand the impact of generic LLM use on coding education, we conducted a large-scale randomized controlled trial with 5,831 students from 146 countries in an online coding class in which we provided some students with access to a chat interface with GPT-4.
1 code implementation • 5 Mar 2024 • Joy He-Yueya, Noah D. Goodman, Emma Brunskill
We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
no code implementations • NeurIPS 2023 • Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill
We study the problem of experiment planning with function approximation in contextual bandit problems.
no code implementations • 5 Dec 2023 • Yash Chandak, Shiv Shankar, Vasilis Syrgkanis, Emma Brunskill
Indirect experiments provide a valuable framework for estimating treatment effects in situations where conducting randomized control trials (RCTs) is impractical or unethical.
1 code implementation • 16 Nov 2023 • Aishwarya Mandyam, Matthew Jörke, William Denton, Barbara E. Engelhardt, Emma Brunskill
Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions.
no code implementations • 11 Apr 2023 • Sherry Ruan, Allen Nie, William Steenbergen, Jiayu He, JQ Zhang, Meng Guo, Yao Liu, Kyle Dang Nguyen, Catherine Y Wang, Rui Ying, James A Landay, Emma Brunskill
Resource limitations make it hard to provide all students with one of the most effective educational interventions: personalized instruction.
no code implementations • 19 Feb 2023 • Jonathan N. Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill
Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection.
no code implementations • 26 Jan 2023 • Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill
We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy selection.
1 code implementation • 16 Nov 2022 • Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn
However, teaching and giving feedback on such software is time-consuming -- standard approaches require instructors to manually grade student-implemented interactive programs.
no code implementations • 3 Nov 2022 • Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
no code implementations • 16 Oct 2022 • Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen, Emma Brunskill
Inspired by statistical model selection methods for supervised learning, we introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy when the provided dataset is limited in size.
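A rough illustration of what such a pipeline can look like (an illustrative sketch, not the authors' implementation): train candidate policies on one split of the logged data and rank them by an off-policy value estimate on a held-out split. The train_policy and ope_estimate callables below are hypothetical placeholders.

```python
import numpy as np

def select_best_policy(logged_data, candidate_configs, train_policy, ope_estimate, seed=0):
    """Minimal train / compare / select loop for offline RL policy selection.

    logged_data:       list of trajectories collected by a behavior policy
    candidate_configs: list of (algorithm_name, hyperparameter_dict) pairs
    train_policy, ope_estimate: hypothetical user-supplied callables
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(logged_data))
    split = int(0.7 * len(idx))
    train_set = [logged_data[i] for i in idx[:split]]
    valid_set = [logged_data[i] for i in idx[split:]]

    scored = []
    for name, hparams in candidate_configs:
        policy = train_policy(name, hparams, train_set)    # fit a candidate offline
        value = ope_estimate(policy, valid_set)            # off-policy value on held-out data
        scored.append((value, name, policy))

    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[0]                                       # deploy the highest-scoring policy
```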
1 code implementation • 1 Jul 2022 • Yao Liu, Yannis Flet-Berliac, Emma Brunskill
Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications.
1 code implementation • 30 Dec 2021 • Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill
Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance.
1 code implementation • NeurIPS 2021 • HyunJi Nam, Scott Fleming, Emma Brunskill
Many real-world problems that require making optimal sequences of decisions under uncertainty involve costs when the agent wishes to obtain information about its environment.
no code implementations • 28 Nov 2021 • Ramtin Keramati, Omer Gottesman, Leo Anthony Celi, Finale Doshi-Velez, Emma Brunskill
Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy.
1 code implementation • 15 Nov 2021 • Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, Stefan Wager
We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules.
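A back-of-the-envelope version of this idea for randomized-trial data (an illustrative sketch, not the paper's exact RATE estimator): rank units by the rule's priority scores, compare the estimated effect among the top-q fraction to the overall effect across a grid of q, and average.

```python
import numpy as np

def rate_sketch(scores, treated, outcomes, qs=np.linspace(0.1, 1.0, 10)):
    """Rough rank-weighted treatment-effect score for randomized-trial data.

    scores:   prioritization scores (higher = treat first)
    treated:  0/1 randomized treatment assignments
    outcomes: observed outcomes
    """
    scores, treated, outcomes = map(np.asarray, (scores, treated, outcomes))
    order = np.argsort(-scores)                        # highest-priority units first
    treated, outcomes = treated[order], outcomes[order]
    overall_ate = outcomes[treated == 1].mean() - outcomes[treated == 0].mean()

    gains = []
    for q in qs:
        top = slice(0, max(2, int(q * len(scores))))
        t, y = treated[top], outcomes[top]
        if t.sum() == 0 or t.sum() == len(t):
            continue                                   # need both arms in the top slice
        gains.append((y[t == 1].mean() - y[t == 0].mean()) - overall_ate)
    return float(np.mean(gains))                       # positive => rule targets responders
```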
1 code implementation • NeurIPS 2021 • Allen Nie, Emma Brunskill, Chris Piech
Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse-based games.
no code implementations • 29 Sep 2021 • Yao Liu, Emma Brunskill
Offline policy optimization has a critical impact on many real-world decision-making problems, as online learning is costly and concerning in many applications.
1 code implementation • 18 Sep 2021 • Alex Chohlas-Wood, Madison Coots, Henry Zhu, Emma Brunskill, Sharad Goel
In our approach, one first elicits stakeholder preferences over the space of possible decisions and the resulting outcomes -- such as preferences for balancing spending parity against court appearance rates.
no code implementations • NeurIPS 2021 • Andrea Zanette, Martin J. Wainwright, Emma Brunskill
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
no code implementations • NeurIPS 2021 • Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill
In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.
1 code implementation • NeurIPS 2021 • Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas
When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy.
no code implementations • 1 Jan 2021 • Allen Nie, Emma Brunskill, Chris Piech
Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse-based games.
no code implementations • NeurIPS 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.
no code implementations • 19 Nov 2020 • Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill
Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees.
no code implementations • NeurIPS 2020 • Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill
There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks.
1 code implementation • 16 Jul 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.
1 code implementation • 12 Jul 2020 • Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang
Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions.
1 code implementation • 13 Apr 2020 • Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, Finale Doshi-Velez
However, when bandits are deployed in the context of a scientific study -- e.g., a clinical trial to test if a mobile health intervention is effective -- the aim is not only to personalize for an individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective.
no code implementations • 2 Apr 2020 • Ramtin Keramati, Emma Brunskill
In such systems there is typically an external human system designer that is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes.
1 code implementation • NeurIPS 2020 • Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill
We assess robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy.
no code implementations • ICML 2020 • Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill
This has two important consequences: 1) it shows that exploration is possible using only batch assumptions, with an algorithm that achieves the optimal statistical rate for the setting we consider, which is more general than prior work on low-rank MDPs; 2) the lack of closedness (measured by the inherent Bellman error) is only amplified by $\sqrt{d_t}$ despite working in the online setting.
no code implementations • ICML 2020 • Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity.
2 code implementations • 31 Jan 2020 • Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau
Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research.
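The arithmetic behind such an estimate is straightforward; the sketch below uses illustrative default numbers (a commonly cited average data-center PUE and a rough grid carbon-intensity factor), not values from the paper's tool.

```python
def estimate_emissions(avg_power_watts, runtime_hours, pue=1.58, kg_co2_per_kwh=0.43):
    """Rough CO2-equivalent estimate for a compute job.

    avg_power_watts: average measured hardware draw (e.g., from GPU/CPU power counters)
    pue:             data-center power usage effectiveness (overhead multiplier), illustrative
    kg_co2_per_kwh:  grid carbon intensity for the region where the job ran, illustrative
    """
    energy_kwh = avg_power_watts * runtime_hours / 1000.0 * pue
    return energy_kwh * kg_co2_per_kwh

# e.g., a 300 W GPU job running for 48 hours:
print(round(estimate_emissions(300, 48), 1), "kg CO2e")
```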
no code implementations • 12 Dec 2019 • Weihao Kong, Gregory Valiant, Emma Brunskill
We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.
no code implementations • NeurIPS 2019 • Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill
This paper focuses on the problem of computing an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP) provided that we can access the reward and transition function through a generative model.
no code implementations • NeurIPS 2019 • Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill
We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially) and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points.
1 code implementation • NeurIPS 2019 • Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas
We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints.
1 code implementation • 16 Nov 2019 • Scott L. Fleming, Kuhan Jeyapragasan, Tony Duan, Daisy Ding, Saurabh Gombar, Nigam Shah, Emma Brunskill
There is an emerging trend in the reinforcement learning for healthcare literature.
no code implementations • 5 Nov 2019 • Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill
While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications.
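For intuition, conditional value at risk at level alpha is simply the mean of the worst alpha-fraction of returns; a minimal empirical sketch (illustrative, not the paper's algorithm):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of episode returns (lower tail)."""
    returns = np.sort(np.asarray(returns, dtype=float))   # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# A risk-sensitive agent optimizes cvar(returns, alpha) instead of returns.mean().
```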
no code implementations • ICML 2018 • Andrea Zanette, Emma Brunskill
In order to make good decisions under uncertainty an agent must learn from observations.
2 code implementations • 1 Nov 2019 • Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL).
no code implementations • ICML 2020 • Yao Liu, Pierre-Luc Bacon, Emma Brunskill
Surprisingly, we find that in finite horizon MDPs there is no strict variance reduction of per-decision importance sampling or stationary importance sampling compared with vanilla importance sampling.
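To make the comparison concrete, here is a small sketch of the two estimators on logged trajectories (illustrative code, not from the paper), where each step of a trajectory carries the evaluation-policy probability, the behavior-policy probability, and the reward.

```python
import numpy as np

def vanilla_is(trajectories, gamma=1.0):
    """Weight the entire return by the product of ratios over the whole trajectory."""
    vals = []
    for traj in trajectories:                 # traj: list of (pi_e_prob, pi_b_prob, reward)
        rho = np.prod([pe / pb for pe, pb, _ in traj])
        ret = sum(gamma**t * r for t, (_, _, r) in enumerate(traj))
        vals.append(rho * ret)
    return float(np.mean(vals))

def per_decision_is(trajectories, gamma=1.0):
    """Weight each reward only by the ratios of the actions taken up to that step."""
    vals = []
    for traj in trajectories:
        rho, val = 1.0, 0.0
        for t, (pe, pb, r) in enumerate(traj):
            rho *= pe / pb
            val += gamma**t * rho * r
        vals.append(val)
    return float(np.mean(vals))
```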
no code implementations • 18 Jun 2019 • Zhaohan Daniel Guo, Emma Brunskill
Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.
1 code implementation • 23 May 2019 • Xinkun Nie, Emma Brunskill, Stefan Wager
Many applied decision-making problems have a dynamic component: The policymaker needs not only to choose whom to treat, but also when to start which treatment.
no code implementations • 14 May 2019 • Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez
We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning.
no code implementations • ICLR 2019 • Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang
In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP).
1 code implementation • ICLR 2019 • Karan Goel, Emma Brunskill
Given a dataset of time-series, the goal is to identify the latent sequence of steps common to them and label each time-series with the temporal extent of these procedural steps.
no code implementations • 17 Apr 2019 • Tong Mu, Karan Goel, Emma Brunskill
In many cases an intelligent agent may want to learn how to mimic a single observed demonstrated trajectory.
no code implementations • 17 Apr 2019 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.
1 code implementation • 5 Feb 2019 • Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier
In settings where this bias is unacceptable -- where the system must optimize for longer horizons at higher discounts -- the target of the value function approximator may increase in variance, leading to difficulties in learning.
no code implementations • 1 Jan 2019 • Andrea Zanette, Emma Brunskill
Strong worst-case performance bounds for episodic reinforcement learning exist, but fortunately, in practice RL algorithms perform much better than such bounds would predict.
no code implementations • 3 Dec 2018 • Peter Henderson, Emma Brunskill
The current flood of information in all areas of machine learning research, from computer vision to reinforcement learning, has made it difficult to make aggregate scientific inferences.
no code implementations • 7 Nov 2018 • Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.
no code implementations • 3 Jul 2018 • Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill
In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown.
no code implementations • ICML 2018 • Philip Thomas, Christoph Dann, Emma Brunskill
When creating a machine learning system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.
no code implementations • 1 Jun 2018 • Ramtin Keramati, Jay Whang, Patrick Cho, Emma Brunskill
People seem to build simple models that are easy to learn, in order to support planning and strategic exploration.
no code implementations • 23 May 2018 • Yao Liu, Emma Brunskill
Efficient exploration is one of the key challenges for reinforcement learning (RL) algorithms.
1 code implementation • NeurIPS 2018 • Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill
We study the problem of off-policy policy evaluation (OPPE) in RL.
no code implementations • NeurIPS 2017 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill
The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options).
no code implementations • 29 Nov 2017 • Thomas Kollar, Stefanie Tellex, Matthew Walter, Albert Huang, Abraham Bachrach, Sachi Hemachandra, Emma Brunskill, Ashis Banerjee, Deb Roy, Seth Teller, Nicholas Roy
Symbolic models capture linguistic structure but have not scaled successfully to handle the diverse language produced by untrained users.
no code implementations • 17 Aug 2017 • Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill
We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors.
no code implementations • NeurIPS 1999 • Philip S. Thomas, Emma Brunskill
We show how an action-dependent baseline can be incorporated into the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).
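The underlying identity can be written out as follows (a standard derivation consistent with this setting, not necessarily the paper's exact statement): subtracting an action-dependent baseline changes the expected gradient, so a correction term has to be added back.

```latex
% Policy gradient with an action-dependent baseline b(s, a):
\nabla_\theta J(\theta)
  = \mathbb{E}_{s,\,a \sim \pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a \mid s)\,\bigl(Q^{\pi_\theta}(s, a) - b(s, a)\bigr)\right]
  + \mathbb{E}_{s}\!\left[\sum_a \nabla_\theta \pi_\theta(a \mid s)\, b(s, a)\right]
% since E_{a \sim \pi_\theta}[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s, a)]
%     = \sum_a \nabla_\theta \pi_\theta(a \mid s)\, b(s, a),
% and this correction vanishes only when b does not depend on a.
```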
no code implementations • 9 Jun 2017 • Philip S. Thomas, Christoph Dann, Emma Brunskill
When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions.
1 code implementation • NeurIPS 2017 • Christoph Dann, Tor Lattimore, Emma Brunskill
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare.
no code implementations • NeurIPS 2017 • Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill
In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.
no code implementations • 9 Mar 2017 • Zhaohan Daniel Guo, Emma Brunskill
This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.
no code implementations • 21 Feb 2017 • Karan Goel, Christoph Dann, Emma Brunskill
Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return.
no code implementations • 10 Nov 2016 • Philip S. Thomas, Emma Brunskill
Importance sampling is often used in machine learning when training and testing data come from different distributions.
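The basic mechanism is re-weighting: an expectation under the test distribution is estimated from training-distribution samples weighted by the density ratio. A minimal sketch with illustrative Gaussian densities (not the paper's estimator):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# "Training" distribution p = N(0, 1); "testing" distribution q = N(1, 1).
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

f = lambda z: z ** 2                                      # any function of interest
weights = norm.pdf(x, loc=1.0) / norm.pdf(x, loc=0.0)     # density ratio q(x) / p(x)

naive = f(x).mean()                                       # estimates E_p[f] = 1 (wrong target)
is_est = (weights * f(x)).mean()                          # estimates E_q[f] = 1^2 + 1 = 2
print(round(naive, 2), round(is_est, 2))
```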
no code implementations • 25 May 2016 • Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill
Many interesting real world domains involve reinforcement learning (RL) in partially observable environments.
no code implementations • 22 Apr 2016 • Li Zhou, Emma Brunskill
We consider both the benefit of leveraging a set of learned latent user classes for new users, and how we can learn such latent classes from prior users.
3 code implementations • 4 Apr 2016 • Philip S. Thomas, Emma Brunskill
In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy.
no code implementations • NeurIPS 2015 • Christoph Dann, Emma Brunskill
In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$.
no code implementations • 10 Jun 2015 • Emma Brunskill, Lihong Li
Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL).
no code implementations • 4 Feb 2014 • Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.
no code implementations • 16 Jan 2014 • Ruijie He, Emma Brunskill, Nicholas Roy
We also demonstrate our algorithm being used to control a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multi-step lookahead is required to achieve good performance.
no code implementations • 26 Sep 2013 • Emma Brunskill, Lihong Li
Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications.
no code implementations • NeurIPS 2013 • Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents.
no code implementations • 5 May 2013 • Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.