Search Results for author: Michael Gimelfarb

Found 10 papers, 3 papers with code

Constraint-Generation Policy Optimization (CGPO): Nonlinear Programming for Policy Optimization in Mixed Discrete-Continuous MDPs

no code implementations · 20 Jan 2024 · Michael Gimelfarb, Ayal Taitler, Scott Sanner

To achieve such results, CGPO proposes a bi-level mixed-integer nonlinear optimization framework for optimizing policies within defined expressivity classes (i.e., piecewise (non)linear) and reduces it to an optimal constraint-generation methodology that adversarially generates worst-case state trajectories.

Counterfactual
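
A minimal sketch of the constraint-generation loop the abstract describes, assuming hypothetical solver wrappers `solve_policy_subproblem` and `solve_adversary_subproblem` (stand-ins for the mixed-integer nonlinear programs handed to an off-the-shelf solver; not the authors' API):

```python
def constraint_generation(initial_trajectory, solve_policy_subproblem,
                          solve_adversary_subproblem, max_iters=50, tol=1e-6):
    # Active set of adversarial state trajectories (the "constraints").
    trajectories = [initial_trajectory]
    for _ in range(max_iters):
        # Outer problem: best policy in the expressivity class against
        # the worst trajectory in the current (partial) set.
        theta, outer_value = solve_policy_subproblem(trajectories)
        # Inner problem: adversarially search for the trajectory on
        # which the candidate policy actually performs worst.
        worst_traj, true_value = solve_adversary_subproblem(theta)
        # outer_value is optimistic (the set is partial); stop once the
        # adversary can no longer find a meaningfully worse trajectory.
        if outer_value - true_value <= tol:
            return theta
        trajectories.append(worst_traj)
    return theta
```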

Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions

no code implementations · 13 May 2023 · Michael Gimelfarb, Michael Jong Kim

We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference.

Bayesian Inference · Thompson Sampling
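
A generic sketch of a Thompson-sampling loop for a parameterized MDP with a discrete parameter set; `solve_mdp`, `likelihood`, and the environment interface are illustrative stand-ins, not the paper's implementation. Note how an uninformative action, whose likelihood is flat across parameters, leaves the posterior unchanged:

```python
import numpy as np

def thompson_sampling_pmdp(env, params, prior, solve_mdp, likelihood, episodes=100):
    posterior = np.array(prior, dtype=float)
    for _ in range(episodes):
        # Sample a parameter hypothesis from the current posterior and
        # act optimally as if it were the true parameter.
        theta = params[np.random.choice(len(params), p=posterior)]
        policy = solve_mdp(theta)
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # Bayes' rule over the parameter set. For an uninformative
            # action the likelihood is constant across parameters, so
            # the posterior is unchanged by the observation.
            lik = np.array([likelihood(p, state, action, next_state) for p in params])
            posterior *= lik
            posterior /= posterior.sum()
            state = next_state
    return posterior
```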

pyRDDLGym: From RDDL to Gym Environments

2 code implementations · 11 Nov 2022 · Ayal Taitler, Michael Gimelfarb, Jihwan Jeong, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner

We present pyRDDLGym, a Python framework for auto-generation of OpenAI Gym environments from RDDL declarative descriptions.

OpenAI Gym
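
A hedged usage sketch: the generated environment follows the standard Gym interaction loop, but the constructor and attribute names below reflect one documented version of the library and may differ in yours; the RDDL file paths are placeholders.

```python
from pyRDDLGym import RDDLEnv

# Build a Gym environment from RDDL domain/instance files (paths are
# placeholders). Constructor location/signature may vary by version.
env = RDDLEnv.RDDLEnv(domain='domain.rddl', instance='instance0.rddl')

state = env.reset()
for _ in range(env.horizon):            # RDDL instances declare a horizon
    action = env.action_space.sample()  # random actions, for illustration
    next_state, reward, done, info = env.step(action)
    state = next_state
    if done:
        break
env.close()
```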

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

1 code implementation · 7 Oct 2022 · Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy.

Continuous Control · D4RL +1
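
For context on the method named in the title, here is a sketch of a plain H-step model-based value expansion (MVE) target, with the paper's conservative Bayesian weighting omitted; `model`, `reward_fn`, `q_value`, and `policy` are assumed learned components, not the released code:

```python
def mve_target(state, action, model, reward_fn, q_value, policy,
               horizon=3, gamma=0.99):
    # Roll the learned dynamics model forward for `horizon` imagined
    # steps, accumulating discounted model-predicted rewards.
    total, discount, s, a = 0.0, 1.0, state, action
    for _ in range(horizon):
        total += discount * reward_fn(s, a)
        s = model(s, a)           # imagined next state
        a = policy(s)
        discount *= gamma
    # Bootstrap with the critic at the imagined horizon.
    return total + discount * q_value(s, a)
```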

RAPTOR: End-to-end Risk-Aware MDP Planning and Policy Learning by Backpropagation

no code implementations · 14 Jun 2021 · Noah Patton, Jihwan Jeong, Michael Gimelfarb, Scott Sanner

The direct optimization of this empirical objective in an end-to-end manner is called the risk-averse straight-line plan, which commits to a sequence of actions in advance and can be sub-optimal in highly stochastic domains.
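
A sketch of the risk-averse straight-line plan mentioned above: a fixed action sequence is optimized end-to-end by backpropagating a risk measure through an assumed differentiable stochastic model. CVaR over sampled returns is used here as one common risk measure, not necessarily the paper's exact objective; scalar actions are assumed for simplicity.

```python
import torch

def straight_line_plan(model, reward, s0, horizon=20, n_samples=64,
                       alpha=0.1, steps=200):
    # One scalar action per time step, committed to in advance.
    actions = torch.zeros(horizon, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=0.01)
    for _ in range(steps):
        returns = []
        for _ in range(n_samples):          # Monte Carlo rollouts; `model`
            s, ret = s0, 0.0                # must sample via reparameterized
            for t in range(horizon):        # noise for gradients to flow
                s = model(s, actions[t])
                ret = ret + reward(s, actions[t])
            returns.append(ret)
        returns = torch.stack(returns)
        # CVaR_alpha: mean of the worst alpha-fraction of returns.
        k = max(1, int(alpha * n_samples))
        cvar = returns.topk(k, largest=False).values.mean()
        opt.zero_grad()
        (-cvar).backward()                  # ascend the risk-adjusted objective
        opt.step()
    return actions.detach()
```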

Risk-Aware Transfer in Reinforcement Learning using Successor Features

no code implementations · NeurIPS 2021 · Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee

Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making.

Decision Making · reinforcement-learning +2

ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

1 code implementation · 2 Jul 2020 · Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms.

Reinforcement Learning (RL)

Bayesian Experience Reuse for Learning from Multiple Demonstrators

no code implementations · 10 Jun 2020 · Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee

We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.

Transfer Learning

Contextual Policy Transfer in Reinforcement Learning Domains via Deep Mixtures-of-Experts

no code implementations · 29 Feb 2020 · Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee

In this paper, we assume knowledge of estimated source task dynamics and policies, and that the source and target tasks share common sub-goals but differ in their dynamics.

OpenAI Gym · Q-Learning +2
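
A rough sketch of the gating idea in the title: source policies are combined as a mixture of experts, with mixture weights updated from how well each source task's estimated dynamics explain observed target transitions. All helper names here (`source_policies`, `dynamics_likelihood`) are illustrative, not the paper's code.

```python
import numpy as np

def mixture_action(state, source_policies, weights):
    # Mixture-of-experts: blend the source policies' discrete action
    # distributions using the current gating weights.
    probs = sum(w * pi(state) for w, pi in zip(weights, source_policies))
    return np.random.choice(len(probs), p=probs)

def update_gate(weights, transition, dynamics_likelihood):
    # Posterior update: up-weight sources whose estimated dynamics
    # better explain the observed (state, action, next_state) triple.
    s, a, s_next = transition
    lik = np.array([dynamics_likelihood(i, s, a, s_next)
                    for i in range(len(weights))])
    posterior = weights * lik
    return posterior / posterior.sum()
```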
