no code implementations • 20 Jan 2024 • Michael Gimelfarb, Ayal Taitler, Scott Sanner
To achieve such results, CGPO proposes a bi-level mixed-integer nonlinear optimization framework for optimizing policies within defined expressivity classes (i.e., piecewise (non-)linear) and reduces it to an optimal constraint generation methodology that adversarially generates worst-case state trajectories.
no code implementations • 13 May 2023 • Michael Gimelfarb, Michael Jong Kim
We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference.
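As a generic illustration of the setting (not the paper's algorithm), Bayesian inference over an unknown MDP parameter can be sketched with a conjugate Beta-Bernoulli model, where the posterior over a transition probability is updated in closed form from observed outcomes; the parameter value and counts below are illustrative assumptions.

```python
# Generic sketch (not the paper's method): Bayesian learning of an
# unknown Bernoulli transition parameter in a parameterized MDP,
# using a conjugate Beta prior so the posterior has closed form.
import random

random.seed(0)
true_p = 0.7            # unknown probability that an action "succeeds"
alpha, beta = 1.0, 1.0  # Beta(1, 1) prior: uniform over [0, 1]

for _ in range(500):
    success = random.random() < true_p   # observe one transition outcome
    if success:
        alpha += 1.0
    else:
        beta += 1.0

posterior_mean = alpha / (alpha + beta)  # point estimate of the parameter
```

With enough observations the posterior mean concentrates near the true parameter, which is the basic mechanism a PMDP learner exploits while acting.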
2 code implementations • 11 Nov 2022 • Ayal Taitler, Michael Gimelfarb, Jihwan Jeong, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner
We present pyRDDLGym, a Python framework for the auto-generation of OpenAI Gym environments from RDDL declarative descriptions.
1 code implementation • 7 Oct 2022 • Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy.
no code implementations • 14 Jun 2021 • Noah Patton, Jihwan Jeong, Michael Gimelfarb, Scott Sanner
The direct optimization of this empirical objective in an end-to-end manner is called the risk-averse straight-line plan, which commits to a sequence of actions in advance and can be sub-optimal in highly stochastic domains.
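To make the idea concrete, a straight-line plan fixes an open-loop action sequence in advance and scores it by Monte Carlo rollouts of an empirical objective; the toy dynamics, target, and cost below are illustrative assumptions, not the paper's planner.

```python
# Generic sketch (assumed toy dynamics, not the paper's method): a
# "straight-line plan" commits to a fixed action sequence up front and
# evaluates its empirical objective by averaging stochastic rollouts.
import random

def rollout_cost(actions, start=0.0, noise=0.5, rng=random):
    """Terminal cost of a rollout that applies a fixed action sequence."""
    s = start
    for a in actions:
        s += a + rng.gauss(0.0, noise)   # stochastic transition
    return (s - 10.0) ** 2               # squared distance to target 10.0

random.seed(0)
plan = [2.5, 2.5, 2.5, 2.5]              # open-loop plan chosen in advance
est = sum(rollout_cost(plan) for _ in range(2000)) / 2000  # empirical objective
```

Because the plan cannot react to the realized noise, its expected cost stays bounded away from zero even though each action is individually sensible; this is the sub-optimality in highly stochastic domains that the abstract refers to.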
no code implementations • NeurIPS 2021 • Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee
Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making.
1 code implementation • 2 Jul 2020 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms.
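For reference, the standard baseline heuristic for this trade-off is the epsilon-greedy rule sketched below; this is the conventional starting point, not the paper's approach, and the Q-values and epsilon are illustrative assumptions.

```python
# Minimal sketch of the classic epsilon-greedy rule, the baseline
# heuristic for the exploration-exploitation trade-off (not the
# paper's method, just the standard reference point).
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore uniformly; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

random.seed(1)
q = [0.2, 0.8, 0.5]                    # illustrative action-value estimates
actions = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
greedy_frac = actions.count(1) / len(actions)  # mostly picks the best arm
```

The fixed epsilon is exactly the knob that is hard to set well in practice, which is why adaptive schemes for balancing exploration and exploitation remain an active research question.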
no code implementations • 10 Jun 2020 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.
no code implementations • 29 Feb 2020 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
In this paper, we assume access to estimated source-task dynamics and policies, and that the source and target tasks share common sub-goals but have different dynamics.
no code implementations • NeurIPS 2018 • Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
Potential-based reward shaping is a powerful technique for accelerating the convergence of reinforcement learning algorithms.
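The core construction (Ng, Harada, and Russell, 1999) adds a shaping term F(s, s') = γΦ(s') − Φ(s) to the native reward, which preserves optimal policies because the terms telescope along any trajectory; the potential function and trajectory below are illustrative assumptions.

```python
# Potential-based reward shaping: augment the native reward with
# F(s, s') = gamma * phi(s') - phi(s). Optimal policies are preserved
# because the shaping terms telescope along any trajectory.

def shaped_reward(reward, s, s_next, phi, gamma=0.99):
    """Native reward plus the potential-based shaping term."""
    return reward + gamma * phi(s_next) - phi(s)

# Toy check: the discounted sum of shaping terms along a trajectory
# depends only on the endpoint potentials, not the path taken.
phi = lambda s: float(s)      # illustrative potential (e.g. progress proxy)
gamma = 0.9
trajectory = [0, 1, 2, 3]     # states visited, T = 3 transitions
shaping_total = sum(
    gamma**t * (gamma * phi(s1) - phi(s0))
    for t, (s0, s1) in enumerate(zip(trajectory, trajectory[1:]))
)
# equals gamma**T * phi(s_T) - phi(s_0)
```

Since the cumulative shaping contribution reduces to γ^T Φ(s_T) − Φ(s_0), shaping changes how quickly an agent learns but not which policy is ultimately optimal.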