Search Results for author: Gal Dalal

Found 24 papers, 5 papers with code

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

No code implementations · 15 Feb 2024 · Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.
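The snippet says that PO-RLHF infers the reward from trajectory-based comparison feedback. It does not say which estimator is used, so the sketch below assumes a common choice: a Bradley-Terry preference model whose reward is linear in hypothetical trajectory features `phi`, fit by maximum likelihood. It illustrates comparison-based reward inference in general and is not the paper's algorithm.

```python
import math
import random

def preference_prob(theta, phi_a, phi_b):
    """P(trajectory a is preferred to trajectory b) under a Bradley-Terry
    model with linear reward r(tau) = theta . phi(tau) (an assumption)."""
    diff = sum(t * (a - b) for t, a, b in zip(theta, phi_a, phi_b))
    return 1.0 / (1.0 + math.exp(-diff))

def fit_reward(comparisons, dim, lr=0.5, epochs=300):
    """Maximum-likelihood estimate of theta from (phi_a, phi_b, a_preferred)
    tuples, using full-batch gradient ascent on the log-likelihood."""
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for phi_a, phi_b, a_preferred in comparisons:
            p = preference_prob(theta, phi_a, phi_b)
            y = 1.0 if a_preferred else 0.0
            for i in range(dim):
                grad[i] += (y - p) * (phi_a[i] - phi_b[i])
        theta = [t + lr * g / len(comparisons) for t, g in zip(theta, grad)]
    return theta

# Usage: simulate noisy comparisons from a hidden reward, then recover it.
rng = random.Random(0)
true_theta = [2.0, -1.0]
data = []
for _ in range(1000):
    phi_a = [rng.random(), rng.random()]
    phi_b = [rng.random(), rng.random()]
    a_preferred = rng.random() < preference_prob(true_theta, phi_a, phi_b)
    data.append((phi_a, phi_b, a_preferred))
theta_hat = fit_reward(data, dim=2)
```

Only the difference of rewards between the two trajectories enters the likelihood, so the learner never observes reward values directly, which matches the setting the snippet describes.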

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

No code implementations · 30 Jan 2023 · Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.

Policy Gradient Methods

SoftTreeMax: Policy Gradient with Tree Search

No code implementations · 28 Sep 2022 · Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.

Policy Gradient Methods

Reinforcement Learning with a Terminator

1 code implementation · 30 May 2022 · Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

Autonomous Driving, reinforcement-learning, +1 more

Planning and Learning with Adaptive Lookahead

No code implementations · 28 Jan 2022 · Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Some of the most powerful reinforcement learning frameworks use planning for action selection.

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

No code implementations · ICLR 2022 · Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

Imitation Learning, Recommendation Systems, +2 more

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

1 code implementation · NeurIPS 2021 · Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

We first discover and analyze a counter-intuitive phenomenon: action selection through tree search (TS) and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps.

Atari Games

Acting in Delayed Environments with Non-Stationary Markov Policies

2 code implementations · ICLR 2021 · Esther Derman, Gal Dalal, Shie Mannor

We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
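To make the delayed-execution setting concrete, here is a minimal sketch of the execution side only: an action committed at step t takes effect at step t + m. The class name `DelayedActionBuffer` and the placeholder `default_action` that runs during the first m steps are assumptions for illustration, not part of the paper's framework.

```python
from collections import deque

class DelayedActionBuffer:
    """Actions committed at step t are executed at step t + m.

    Until m actions have been committed, a placeholder `default_action`
    is executed instead (an assumption made for this sketch)."""

    def __init__(self, m, default_action):
        if m < 0:
            raise ValueError("delay m must be non-negative")
        # Queue of committed-but-not-yet-executed actions.
        self.pending = deque([default_action] * m)

    def commit(self, action):
        """Commit `action` now and return the action that executes now."""
        self.pending.append(action)
        return self.pending.popleft()

# Usage: with m = 2, the first two executed actions are placeholders.
buf = DelayedActionBuffer(2, "noop")
executed = [buf.commit(a) for a in ["a", "b", "c", "d"]]
```

Here `executed` is `["noop", "noop", "a", "b"]`. Because m committed actions are always pending, the state in which a newly committed action will run depends on that queue, which is what makes deciding from the current observation alone non-trivial in this setting.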

Cloud Computing, Q-Learning

The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems

No code implementations · 8 Dec 2020 · Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.

reinforcement-learning, Reinforcement Learning (RL)

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

No code implementations · 20 Nov 2019 · Gal Dalal, Balazs Szorenyi, Gugan Thoppe

Algorithms such as these have two iterates, $\theta_n$ and $w_n,$ which are updated using two distinct stepsize sequences, $\alpha_n$ and $\beta_n,$ respectively.
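The two-iterate structure can be illustrated with a toy linear recursion. This is not one of the algorithms analyzed in the paper; the update functions, the target `c`, the Gaussian noise, and the stepsize choices $\alpha_n = 1/(n+1)$ and $\beta_n = 1/(n+1)^{0.6}$ are all assumptions chosen so that $\alpha_n/\beta_n \to 0$, i.e. $\theta_n$ moves on the slower timescale while $w_n$ tracks it.

```python
import random

def two_timescale(c=3.0, n_steps=20_000, noise=1.0, seed=0):
    """Toy linear two-timescale stochastic approximation.

    The fast iterate w_n tracks the slow iterate theta_n, and theta_n is
    driven toward the target c using the current w_n. Since
    alpha_n / beta_n -> 0, theta_n evolves on the slower timescale."""
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    for n in range(n_steps):
        alpha = 1.0 / (n + 1)          # slow stepsize alpha_n for theta_n
        beta = 1.0 / (n + 1) ** 0.6    # fast stepsize beta_n for w_n
        # Both noisy updates use the current pair (theta_n, w_n).
        theta_next = theta + alpha * (c - w + rng.gauss(0.0, noise))
        w_next = w + beta * (theta - w + rng.gauss(0.0, noise))
        theta, w = theta_next, w_next
    return theta, w

theta_final, w_final = two_timescale()
```

On the fast timescale $w_n$ sees $\theta_n$ as nearly frozen and converges to it; on the slow timescale $\theta_n$ then behaves like $\dot\theta = c - \theta$, so both iterates settle near `c`.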

reinforcement-learning, Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

No code implementations · NeurIPS 2018 · Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.
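For a known tabular model, a multiple-step (h-step) greedy policy with respect to a value estimate v can be computed by applying h-1 Bellman optimality backups to v and then acting greedily on the result. The sketch below assumes a known transition table `P`, reward table `R`, and estimate `v` (hypothetical names); it shows the h-greedy construction itself, not the approximate or online algorithms the paper studies.

```python
def bellman_backup(v, P, R, gamma):
    """One Bellman optimality backup on a tabular MDP; returns (Q, T v).

    P[s][a] is a list of (prob, next_state); R[s][a] is the expected reward."""
    Q = [[R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])
          for a in range(len(P[s]))]
         for s in range(len(P))]
    return Q, [max(q) for q in Q]

def h_greedy_policy(v, P, R, gamma, h):
    """h-step greedy policy w.r.t. v: back up v (h - 1) times, then pick
    the action that maximizes one more backup in every state."""
    if h < 1:
        raise ValueError("h must be >= 1")
    for _ in range(h - 1):
        _, v = bellman_backup(v, P, R, gamma)
    Q, _ = bellman_backup(v, P, R, gamma)
    return [max(range(len(q)), key=q.__getitem__) for q in Q]

# Usage: a 3-state chain where the one-step greedy choice is myopic.
# State 0: action 0 pays 1 and moves to state 2; action 1 pays 0 and moves to state 1.
# State 1: its only action pays 10 and moves to state 2 (absorbing, reward 0).
P = [[[(1.0, 2)], [(1.0, 1)]], [[(1.0, 2)]], [[(1.0, 2)]]]
R = [[1.0, 0.0], [10.0], [0.0]]
v0 = [0.0, 0.0, 0.0]
pi_1 = h_greedy_policy(v0, P, R, gamma=0.9, h=1)  # [0, 0, 0]
pi_2 = h_greedy_policy(v0, P, R, gamma=0.9, h=2)  # [1, 0, 0]
```

With h = 1 and a zero value estimate, state 0 grabs the immediate reward of 1; a two-step lookahead sees the delayed reward of 10 and switches action.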

Model Predictive Control, reinforcement-learning, +1 more

How to Combine Tree-Search Methods in Reinforcement Learning

No code implementations · 6 Sep 2018 · Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

reinforcement-learning, Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

No code implementations · 21 May 2018 · Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control, reinforcement-learning, +1 more

Safe Exploration in Continuous Action Spaces

6 code implementations · 26 Jan 2018 · Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, Yuval Tassa

We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated.

Reinforcement Learning (RL), Safe Exploration

Supervised Learning for Optimal Power Flow as a Real-Time Proxy

No code implementations · 20 Dec 2016 · Raphael Canyasse, Gal Dalal, Shie Mannor

In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).

Unit Commitment using Nearest Neighbor as a Short-Term Proxy

No code implementations · 30 Nov 2016 · Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel

We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, to make tractable hierarchical long-term assessment and planning for large power systems.
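The proxy idea of answering a costly short-term query with the outcome of a similar, already-solved instance can be sketched as a 1-nearest-neighbor lookup. The feature encoding, the Euclidean metric, and the class name `NearestNeighborProxy` are assumptions for illustration; this snippet does not specify the features or distance that UCNN actually uses.

```python
import math

class NearestNeighborProxy:
    """Approximate the outcome of an expensive solve (e.g. a short-term
    unit commitment problem) by the stored outcome of the closest
    previously solved instance."""

    def __init__(self):
        self._features = []  # feature vectors of previously solved instances
        self._outcomes = []  # their solved outcomes (e.g. operating cost)

    def add(self, features, outcome):
        """Store one (instance features, solved outcome) pair."""
        self._features.append(list(features))
        self._outcomes.append(outcome)

    def predict(self, features):
        """Return the outcome of the nearest stored instance."""
        if not self._features:
            raise ValueError("proxy has no stored instances")
        # Euclidean distance is an assumption; any suitable metric could be used.
        best = min(range(len(self._features)),
                   key=lambda i: math.dist(self._features[i], features))
        return self._outcomes[best]

# Usage: two solved instances, then a fast lookup for a new query.
proxy = NearestNeighborProxy()
proxy.add([0.0, 0.0], 5.0)
proxy.add([10.0, 10.0], 50.0)
estimate = proxy.predict([1.0, 2.0])  # 5.0
```

Each lookup costs one pass over the stored instances instead of a full optimization, which is the trade-off that makes a proxy attractive inside long-term assessment loops.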

Hierarchical Decision Making In Electricity Grid Management

No code implementations · 6 Mar 2016 · Gal Dalal, Elad Gilboa, Shie Mannor

The power grid is a complex and vital system that necessitates careful reliability management.

Decision Making, Management, +1 more

Reinforcement Learning for the Unit Commitment Problem

No code implementations · 19 Jul 2015 · Gal Dalal, Shie Mannor

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.

reinforcement-learning, Reinforcement Learning (RL), +1 more
