Search Results for author: Assaf Hallak

Found 12 papers, 2 papers with code

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

no code implementations • 30 Jan 2023 • Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

We prove that the resulting variance decays exponentially with the planning horizon, at a rate determined by the expansion policy.

Policy Gradient Methods
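
The snippet states the paper's main theorem only informally. Schematically (a hedged paraphrase, not the paper's exact statement; C and α here are placeholder constants), the claim is that the variance of the tree-based policy-gradient estimator decays geometrically in the planning depth d:

    \[
      \operatorname{Var}\big(\widehat{\nabla_\theta J}_d\big) \;\le\; C\,\alpha^{2d},
      \qquad 0 \le \alpha < 1,
    \]

with the rate α determined by the expansion policy, plausibly through the mixing properties of the transition dynamics it induces.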

SoftTreeMax: Policy Gradient with Tree Search

no code implementations • 28 Sep 2022 • Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.

Policy Gradient Methods
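
Neither SoftTreeMax snippet defines the policy itself. Below is a minimal sketch of one plausible reading: the logit of each root action aggregates, via log-sum-exp, the discounted reward of every depth-limited path in its subtree plus a learned score at the leaf. All identifiers (model.step, model.num_actions, leaf_score) are hypothetical, and a deterministic model is assumed for brevity.

    import numpy as np
    from scipy.special import logsumexp

    def softtreemax_probs(model, state, leaf_score, depth, gamma=0.99, beta=10.0):
        # Score every depth-limited action sequence by its discounted path
        # reward plus a learned leaf score, then softmax over root actions,
        # aggregating each root's subtree with log-sum-exp.
        def path_scores(s, d, ret, disc):
            if d == 0:
                return [ret + disc * leaf_score(s)]
            scores = []
            for a in range(model.num_actions):
                s2, r = model.step(s, a)  # deterministic model assumed
                scores += path_scores(s2, d - 1, ret + disc * r, disc * gamma)
            return scores

        root_logits = []
        for a in range(model.num_actions):
            s2, r = model.step(state, a)
            root_logits.append(logsumexp(beta * np.array(
                path_scores(s2, depth - 1, r, gamma))))
        root_logits = np.array(root_logits)
        return np.exp(root_logits - logsumexp(root_logits))

As beta grows this concentrates on the best-scoring subtree; as beta shrinks it approaches a uniform policy, which is one way the temperature can trade variance against bias.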

Reinforcement Learning with a Terminator

1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

Autonomous Driving • reinforcement-learning +1
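
The snippet mentions state-wise confidence bounds on the learned termination parameters but not their form. The sketch below uses a generic per-state Bernoulli estimate with a Hoeffding radius; it is not the paper's construction (which exploits the TerMDP's structure for tighter bounds), only an illustration of the shape of the output. The dictionaries visits and terminations are hypothetical inputs.

    import math

    def termination_bounds(visits, terminations, delta=0.05):
        # visits[s]: number of times state s was observed;
        # terminations[s]: how many of those visits triggered the external
        # termination signal.  Returns a (lower, upper) interval per state.
        bounds = {}
        for s, n in visits.items():
            p_hat = terminations.get(s, 0) / n
            radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # Hoeffding
            bounds[s] = (max(0.0, p_hat - radius), min(1.0, p_hat + radius))
        return bounds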

Planning and Learning with Adaptive Lookahead

no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Some of the most powerful reinforcement learning frameworks use planning for action selection.

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

Imitation Learning • Recommendation Systems +2

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

1 code implementation • NeurIPS 2021 • Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

We first discover and analyze a counter-intuitive phenomenon: action selection through tree search (TS) with a pre-trained value function often performs worse than the original pre-trained agent, even with access to the exact states and rewards of future steps.

Atari Games
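
For readers unfamiliar with the setup the abstract describes, here is a minimal sketch of action selection through exhaustive depth-limited tree search with a pre-trained value function at the leaves; identifiers (model.step, model.num_actions, value_fn) are hypothetical, and a deterministic exact model is assumed, matching the abstract's "exact state and reward" condition. The paper's observation is that this scheme, without an off-policy correction, can still underperform the original agent.

    def tree_search_action(model, state, value_fn, depth, gamma=0.99):
        # Exhaustive depth-limited lookahead; leaves are scored by the
        # pre-trained value function.
        def q_value(s, a, d):
            s2, r = model.step(s, a)  # exact model assumed
            if d == 1:
                return r + gamma * value_fn(s2)
            return r + gamma * max(q_value(s2, b, d - 1)
                                   for b in range(model.num_actions))
        return max(range(model.num_actions),
                   key=lambda a: q_value(state, a, depth))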

Automatic Representation for Lifetime Value Recommender Systems

no code implementations • 23 Feb 2017 • Assaf Hallak, Yishay Mansour, Elad Yom-Tov

The LTV approach considers the future implications of the item recommendation, and seeks to maximize the cumulative gain over time.

Recommendation Systems • Reinforcement Learning (RL)
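
The "cumulative gain over time" in the snippet is the standard discounted-return objective once recommendation is cast as an MDP; schematically (r_t is the gain from the recommendation at step t, and γ ∈ [0, 1) a discount factor):

    \[
      \pi^{\mathrm{LTV}} \in \arg\max_{\pi}\,
        \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big],
      \qquad\text{versus the myopic}\qquad
      \pi \in \arg\max_{\pi}\, \mathbb{E}_{\pi}[r_{0}].
    \]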

Consistent On-Line Off-Policy Evaluation

no code implementations • ICML 2017 • Assaf Hallak, Shie Mannor

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme.

Off-policy evaluation

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

no code implementations • 17 Sep 2015 • Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

Off-policy evaluation
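
The snippet gives only the problem setting. As a rough illustration of the algorithm family the title refers to, here is a hedged sketch of a single emphatic-TD(0) step with linear function approximation, with the generalization expressed as a trace-decay parameter beta: in plain emphatic TD the follow-on trace decays with gamma, and, per this paper's bias-variance analysis, choosing beta below gamma trades bias for lower variance. Variable names are mine, not the paper's notation.

    import numpy as np

    def etd_beta_update(w, phi, phi_next, r, rho, rho_prev, F_prev,
                        alpha=0.01, gamma=0.99, beta=0.95):
        # phi, phi_next: feature vectors; rho, rho_prev: importance-sampling
        # ratios at the current and previous steps; F_prev: follow-on trace.
        F = beta * rho_prev * F_prev + 1.0               # emphasis (follow-on) trace
        delta = r + gamma * (w @ phi_next) - (w @ phi)   # TD(0) error
        w = w + alpha * rho * F * delta * phi
        return w, F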

Emphatic TD Bellman Operator is a Contraction

no code implementations • 14 Aug 2015 • Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

Off-policy evaluation

Off-policy evaluation for MDPs with unknown structure

no code implementations • 11 Feb 2015 • Assaf Hallak, François Schnitzler, Timothy Mann, Shie Mannor

Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use.

Off-policy evaluation

Contextual Markov Decision Processes

no code implementations • 8 Feb 2015 • Assaf Hallak, Dotan Di Castro, Shie Mannor

The objective is to learn a strategy that maximizes the accumulated reward across all contexts.
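
A minimal sketch of the object the title names, assuming the now-standard reading of a contextual MDP (field names are illustrative, not the paper's notation): each context indexes its own dynamics and rewards over shared state and action spaces, and the learner seeks a single strategy that accumulates reward well across contexts.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ContextualMDP:
        # One MDP per context over shared state/action spaces.
        num_states: int
        num_actions: int
        transition: Callable  # (context, state, action) -> next-state distribution
        reward: Callable      # (context, state, action) -> expected reward (float)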
