Search Results for author: Anna Winnicki

Found 7 papers, 0 papers with code

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

no code implementations · 15 Feb 2024 · Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

no code implementations · 17 Mar 2023 · Anna Winnicki, R. Srikant

We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs.

Tasks: Model-based Reinforcement Learning, Multi-agent Reinforcement Learning, +2

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

no code implementations · 23 Jan 2023 · Anna Winnicki, R. Srikant

A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function.

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

no code implementations · 13 Oct 2022 · Anna Winnicki, R. Srikant

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent.

Tasks: Reinforcement Learning (RL)

The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

no code implementations · 28 Sep 2021 · Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant

Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

no code implementations · 29 Jan 2021 · Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant

We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and, further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.

Pricing Economic Dispatch with AC Power Flow via Local Multipliers and Conic Relaxation

no code implementations · 23 Oct 2019 · Mariola Ndrio, Anna Winnicki, Subhonmesh Bose

We analyze pricing mechanisms in electricity markets with AC power flow equations that define a nonconvex feasible set for the economic dispatch problem.
