no code implementations • 20 Jan 2024 • Michael Gimelfarb, Ayal Taitler, Scott Sanner
To achieve such results, CGPO proposes a bi-level mixed-integer nonlinear optimization framework for optimizing policies within defined expressivity classes (i.e., piecewise (non-)linear) and reduces it to an optimal constraint generation methodology that adversarially generates worst-case state trajectories.
no code implementations • 13 May 2023 • Michael Gimelfarb, Michael Jong Kim
We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference.
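As a generic illustration of the setting (not the paper's algorithm), Bayesian inference over an unknown MDP parameter can be sketched with a conjugate Beta-Bernoulli model, where the posterior over a transition probability is updated in closed form from observed outcomes; the parameter value and counts below are illustrative assumptions.

```python
# Generic sketch (not the paper's method): Bayesian learning of an
# unknown Bernoulli transition parameter in a parameterized MDP,
# using a conjugate Beta prior so the posterior has closed form.
import random

random.seed(0)
true_p = 0.7            # unknown probability that an action "succeeds"
alpha, beta = 1.0, 1.0  # Beta(1, 1) prior: uniform over [0, 1]

for _ in range(500):
    success = random.random() < true_p   # observe one transition outcome
    if success:
        alpha += 1.0
    else:
        beta += 1.0

posterior_mean = alpha / (alpha + beta)  # point estimate of the parameter
```

With enough observations the posterior mean concentrates near the true parameter, which is the basic mechanism a PMDP learner exploits while acting.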
2 code implementations • 11 Nov 2022 • Ayal Taitler, Michael Gimelfarb, Jihwan Jeong, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner
We present pyRDDLGym, a Python framework for the auto-generation of OpenAI Gym environments from RDDL declarative descriptions.
1 code implementation • 7 Oct 2022 • Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy.
no code implementations • 14 Jun 2021 • Noah Patton, Jihwan Jeong, Michael Gimelfarb, Scott Sanner
The direct optimization of this empirical objective in an end-to-end manner is called the risk-averse straight-line plan, which commits to a sequence of actions in advance and can be sub-optimal in highly stochastic domains.
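To make the idea concrete, a straight-line plan fixes an open-loop action sequence in advance and scores it by Monte Carlo rollouts of an empirical objective; the toy dynamics, target, and cost below are illustrative assumptions, not the paper's planner.

```python
# Generic sketch (assumed toy dynamics, not the paper's method): a
# "straight-line plan" commits to a fixed action sequence up front and
# evaluates its empirical objective by averaging stochastic rollouts.
import random

def rollout_cost(actions, start=0.0, noise=0.5, rng=random):
    """Terminal cost of a rollout that applies a fixed action sequence."""
    s = start
    for a in actions:
        s += a + rng.gauss(0.0, noise)   # stochastic transition
    return (s - 10.0) ** 2               # squared distance to target 10.0

random.seed(0)
plan = [2.5, 2.5, 2.5, 2.5]              # open-loop plan chosen in advance
est = sum(rollout_cost(plan) for _ in range(2000)) / 2000  # empirical objective
```

Because the plan cannot react to the realized noise, its expected cost stays bounded away from zero even though each action is individually sensible; this is the sub-optimality in highly stochastic domains that the abstract refers to.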
no code implementations • NeurIPS 2021 • Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee
Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making.
1 code implementation • 2 Jul 2020 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms.
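For reference, the standard baseline heuristic for this trade-off is the epsilon-greedy rule sketched below; this is the conventional starting point, not the paper's approach, and the Q-values and epsilon are illustrative assumptions.

```python
# Minimal sketch of the classic epsilon-greedy rule, the baseline
# heuristic for the exploration-exploitation trade-off (not the
# paper's method, just the standard reference point).
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore uniformly; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

random.seed(1)
q = [0.2, 0.8, 0.5]                    # illustrative action-value estimates
actions = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
greedy_frac = actions.count(1) / len(actions)  # mostly picks the best arm
```

The fixed epsilon is exactly the knob that is hard to set well in practice, which is why adaptive schemes for balancing exploration and exploitation remain an active research question.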
no code implementations • 10 Jun 2020 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.
no code implementations • 29 Feb 2020 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
In this paper, we assume access to estimated source-task dynamics and policies, and that the source and target tasks share common sub-goals but have different dynamics.
no code implementations • NeurIPS 2018 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
Potential-based reward shaping is a powerful technique for accelerating the convergence of reinforcement learning algorithms.
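The core construction (Ng, Harada, and Russell, 1999) adds a shaping term F(s, s') = γΦ(s') − Φ(s) to the native reward, which preserves optimal policies because the terms telescope along any trajectory; the potential function and trajectory below are illustrative assumptions.

```python
# Potential-based reward shaping: augment the native reward with
# F(s, s') = gamma * phi(s') - phi(s). Optimal policies are preserved
# because the shaping terms telescope along any trajectory.

def shaped_reward(reward, s, s_next, phi, gamma=0.99):
    """Native reward plus the potential-based shaping term."""
    return reward + gamma * phi(s_next) - phi(s)

# Toy check: the discounted sum of shaping terms along a trajectory
# depends only on the endpoint potentials, not the path taken.
phi = lambda s: float(s)      # illustrative potential (e.g. progress proxy)
gamma = 0.9
trajectory = [0, 1, 2, 3]     # states visited, T = 3 transitions
shaping_total = sum(
    gamma**t * (gamma * phi(s1) - phi(s0))
    for t, (s0, s1) in enumerate(zip(trajectory, trajectory[1:]))
)
# equals gamma**T * phi(s_T) - phi(s_0)
```

Since the cumulative shaping contribution reduces to γ^T Φ(s_T) − Φ(s_0), shaping changes how quickly an agent learns but not which policy is ultimately optimal.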