Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful framework that focuses on learning the MDP dynamics most relevant for obtaining high rewards.
Central to our work is the idea that the transition dynamics induce a low-dimensional manifold of reachable states embedded in the high-dimensional nominal state space.
In the reinforcement learning literature, many algorithms have been developed for either Contextual Bandit (CB) or Markov Decision Process (MDP) environments.
In this paper, we endow two popular deep reinforcement learning algorithms, DQN and Rainbow, with updates that incentivize the online network to remain in the proximity of the target network.
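As a hedged illustration, here is a minimal sketch of how such a proximity term might be combined with a standard DQN loss; the penalty form, the coefficient `prox_coef`, and all names are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def proximal_dqn_loss(online_net, target_net, batch, gamma=0.99, prox_coef=0.1):
    """Standard DQN TD loss plus a hypothetical proximity penalty that
    discourages the online network from drifting far from the target network."""
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer

    # Standard TD target computed with the (frozen) target network.
    with torch.no_grad():
        target_q = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    online_q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(online_q, target_q)

    # Proximity penalty: squared L2 distance between online and target parameters.
    prox = sum((p - q).pow(2).sum()
               for p, q in zip(online_net.parameters(), target_net.parameters()))
    return td_loss + prox_coef * prox
```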
Off-policy policy evaluation methods for sequential decision making can be used to help identify whether a proposed decision policy is better than a current baseline policy.
We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows us to compute significantly tighter bounds on Q-functions, leading to improved learning.
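For context, the classical notion being generalized here is Lipschitz continuity of the Q-function, which bounds its variation by a single global constant; a sketch in LaTeX (the coarse-grained definition itself is left to the paper):

```latex
% Standard Lipschitz continuity of the Q-function: one global constant L
% bounds its variation over the whole state space. The coarse-grained
% definition relaxes this requirement.
\[
  \lvert Q(s, a) - Q(s', a) \rvert \le L \, d(s, s')
  \quad \text{for all } s, s' \in \mathcal{S},\ a \in \mathcal{A}.
\]
```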
Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions.
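A minimal sketch of the basic trajectory-wise importance sampling estimator, assuming the behaviour-policy probabilities are available; the function and variable names are illustrative.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Trajectory-wise importance sampling estimate of the value of pi_e.

    trajectories: list of [(s, a, r), ...] episodes collected under pi_b.
    pi_e, pi_b: callables mapping (s, a) -> action probability.
    """
    values = []
    for traj in trajectories:
        rho, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(s, a) / pi_b(s, a)  # cumulative importance weight
            ret += (gamma ** t) * r         # discounted return
        values.append(rho * ret)
    return np.mean(values)
```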
A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov.
Finding an effective medical treatment often requires a search by trial and error.
Off-policy evaluation in reinforcement learning offers the chance to use observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high-stakes settings requires ways of assessing its validity.
Tensor decomposition methods allow us to learn the parameters of latent variable models through decomposition of low-order moments of data.
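As a concrete instance (a standard construction from the tensor-methods literature, not necessarily the exact model used here), the third-order moment of a simple mixture model decomposes into rank-one terms whose components are the latent parameters:

```latex
% Third-order moment of a mixture with weights w_k and means \mu_k
% (exact, e.g., for the exchangeable single-topic model): recovering
% the rank-one components recovers the latent parameters.
\[
  M_3 \;=\; \mathbb{E}\left[ x \otimes x \otimes x \right]
        \;=\; \sum_{k=1}^{K} w_k \, \mu_k \otimes \mu_k \otimes \mu_k .
\]
```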
We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning.
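A hedged sketch of the generic model-based recipe (fit dynamics and reward models from the batch, then estimate the evaluation policy's value by Monte Carlo rollouts in the learned model); all names here are illustrative, not the paper's implementation.

```python
import numpy as np

def model_based_ope(model, reward_fn, pi_e, start_states,
                    horizon=50, gamma=0.99, n_rollouts=100):
    """Estimate the value of pi_e by rolling it out in a learned model.

    model: callable (s, a) -> next state, fit from the batch data.
    reward_fn: callable (s, a) -> reward, fit from the batch data.
    pi_e: callable s -> action (the evaluation policy).
    """
    returns = []
    for _ in range(n_rollouts):
        s = start_states[np.random.randint(len(start_states))]
        ret = 0.0
        for t in range(horizon):
            a = pi_e(s)
            ret += (gamma ** t) * reward_fn(s, a)
            s = model(s, a)  # simulated transition
        returns.append(ret)
    return np.mean(returns)
```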
On a large retrospective cohort, this mixture-based approach outperforms physician, kernel-only, and DRL-only experts.
In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown.
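One common baseline for this setting is to fit the behaviour policy by supervised learning on the logged (state, action) pairs; a minimal sketch with scikit-learn (an illustrative baseline, not the method proposed in this work):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_behaviour_policy(states, actions):
    """Fit a parametric estimate of pi_b(a | s) from logged data.

    states: array of shape (n, d); actions: integer array of shape (n,).
    Returns a callable (s, a) -> estimated probability, usable as the
    behaviour-policy term in an importance sampling OPE estimator.
    """
    clf = LogisticRegression(max_iter=1000).fit(states, actions)

    def pi_b_hat(s, a):
        probs = clf.predict_proba(np.asarray(s).reshape(1, -1))[0]
        return probs[list(clf.classes_).index(a)]

    return pi_b_hat
```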
Much attention has recently been devoted to developing machine learning algorithms aimed at improving treatment policies in healthcare.
Tensor decomposition methods are popular tools for learning latent variables given only lower-order moments of the data.