no code implementations • ICLR 2021 • Brandon Cui, Yin-Lam Chow, Mohammad Ghavamzadeh
We first formulate an LCE model to learn representations that are suitable to be used by a policy iteration style algorithm in the latent space.
no code implementations • NeurIPS 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
no code implementations • 15 Jun 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed
This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.
no code implementations • 9 Jun 2020 • Yin-Lam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with data collected from interaction with the real system, alleviating the data-efficiency problem in RL.
1 code implementation • ICML 2020 • Rui Shu, Tung Nguyen, Yin-Lam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui
High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks.
no code implementations • 8 Feb 2020 • Sungryull Sohn, Yin-Lam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier
In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state.
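The state-independent closeness constraint described above can be sketched in a minimal way: mix the learned action distribution with the behavior policy until their KL divergence falls within a fixed budget. This is an illustrative sketch only; the function names (`project_toward_behavior`) and the mixing scheme are hypothetical and not taken from the paper.

```python
import numpy as np

def kl(p, q):
    # KL divergence between two discrete action distributions.
    return float(np.sum(p * np.log(p / q)))

def project_toward_behavior(pi_learned, pi_behavior, eps):
    # Hypothetical helper: shrink the learned action distribution toward
    # the behavior policy until KL(mixed || behavior) <= eps, where eps
    # is the same budget at every state (the state-independent case the
    # abstract refers to).
    for alpha in np.linspace(1.0, 0.0, 101):
        mixed = alpha * pi_learned + (1.0 - alpha) * pi_behavior
        if kl(mixed, pi_behavior) <= eps:
            return mixed
    return pi_behavior
```

With a large budget the learned policy passes through unchanged; with a zero budget the projection collapses onto the behavior policy.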
no code implementations • 4 Dec 2019 • Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans
In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.
no code implementations • ICLR 2020 • Moonkyung Ryu, Yin-Lam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier
Value-based reinforcement learning (RL) methods like Q-learning have shown success in a variety of domains.
1 code implementation • ICLR 2020 • Nir Levine, Yin-Lam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui
A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space.
2 code implementations • NeurIPS 2019 • Ofir Nachum, Yin-Lam Chow, Bo Dai, Lihong Li
In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.
1 code implementation • 28 Jan 2019 • Yin-Lam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them.
no code implementations • NeurIPS 2018 • Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yin-Lam Chow, Daoming Lyu, Daesub Yoon
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare.
no code implementations • 13 Aug 2018 • Jonathan Lacotte, Mohammad Ghavamzadeh, Yin-Lam Chow, Marco Pavone
We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t.
1 code implementation • NeurIPS 2018 • Yin-Lam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh
In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints.
no code implementations • ICML 2018 • Mehrdad Farajtabar, Yin-Lam Chow, Mohammad Ghavamzadeh
In particular, we focus on the doubly robust (DR) estimators that consist of an importance sampling (IS) component and a performance model, and utilize the low (or zero) bias of IS and low variance of the model at the same time.
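As a concrete illustration of the estimator family named above, here is a minimal one-step (bandit-style) doubly robust estimate; the paper's setting is sequential and more involved, so treat this as a sketch of the general IS-plus-model idea, with all names my own.

```python
import numpy as np

def doubly_robust_value(rewards, behavior_probs, target_probs, q_hat, v_hat):
    # One-step doubly robust estimate:
    #   V_DR = mean( v_hat + w * (r - q_hat) ),  w = pi_target / pi_behavior
    # The importance-sampling term corrects the model's bias, while the
    # model terms (q_hat, v_hat) damp the variance of pure IS: the
    # estimate is unbiased if either component is correct.
    w = target_probs / behavior_probs
    return float(np.mean(v_hat + w * (rewards - q_hat)))
```

When the model predicts the observed rewards exactly, the correction term vanishes and the estimate reduces to the model's value prediction.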
no code implementations • ICML 2018 • Ofir Nachum, Yin-Lam Chow, Mohammad Ghavamzadeh
In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data.
no code implementations • ICLR 2018 • Aviv Tamar, Khashayar Rohanimanesh, Yin-Lam Chow, Chris Vigorito, Ben Goodrich, Michael Kahane, Derik Pridmore
In this paper we present an LfD approach for learning multiple modes of behavior from visual data.
no code implementations • NeurIPS 2016 • Marek Petrik, Yin-Lam Chow, Mohammad Ghavamzadeh
We show that our formulation is NP-hard and propose an approximate algorithm.
no code implementations • 5 Dec 2015 • Yin-Lam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone
In many sequential decision-making problems, one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences.
no code implementations • 29 Sep 2015 • Yin-Lam Chow, Jia Yuan Yu, Marco Pavone
We consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another.
no code implementations • NeurIPS 2015 • Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone
Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.
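The CVaR objective referred to above has a simple empirical form: the expected cost over the worst alpha-fraction of outcomes. A minimal sketch (the function name and the use of the empirical quantile are my own; the paper works with the full MDP formulation):

```python
import numpy as np

def cvar(costs, alpha):
    # Empirical CVaR_alpha: average cost over the worst alpha-fraction
    # of outcomes, i.e., costs at or above the (1 - alpha)-quantile.
    costs = np.sort(np.asarray(costs, dtype=float))
    var = np.quantile(costs, 1.0 - alpha)   # Value-at-Risk threshold
    tail = costs[costs >= var]
    return float(tail.mean())
```

By construction CVaR is never smaller than the plain expectation, which is what makes it a risk-sensitive (tail-aware) criterion.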
no code implementations • NeurIPS 2015 • Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor
For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.
no code implementations • 12 Feb 2015 • Jiyan Yang, Yin-Lam Chow, Christopher Ré, Michael W. Mahoney
We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., $\ell_2$ and $\ell_1$ regression.
no code implementations • NeurIPS 2014 • Yin-Lam Chow, Mohammad Ghavamzadeh
In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion.
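One common instance of "a standard criterion plus a measure of variability" is the mean-variance objective J = E[C] + lam * Var[C]. A minimal sketch, with variance as an assumed (illustrative) choice of variability measure:

```python
import numpy as np

def mean_variance_objective(costs, lam):
    # Risk-sensitive criterion: expected cost plus a penalty on its
    # variability, J = E[C] + lam * Var[C]. Larger lam trades expected
    # performance for more predictable (lower-variance) outcomes.
    costs = np.asarray(costs, dtype=float)
    return float(costs.mean() + lam * costs.var())
```

Setting lam = 0 recovers the standard risk-neutral expected-cost criterion.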