Search Results for author: Nadav Merlis

Found 16 papers, 4 papers with code

The Value of Reward Lookahead in Reinforcement Learning

no code implementations18 Mar 2024 Nadav Merlis, Dorian Baudry, Vianney Perchet

In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.

Offline RL, reinforcement-learning, +1

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

no code implementations24 May 2023 Guy Tennenholtz, Martin Mladenov, Nadav Merlis, Robert L. Axtell, Craig Boutilier

We highlight the importance of exploration, not to eliminate popularity bias, but to mitigate its negative impact on welfare.

Reinforcement Learning with History-Dependent Dynamic Contexts

no code implementations4 Feb 2023 Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.

reinforcement-learning, Reinforcement Learning (RL)

On Preemption and Learning in Stochastic Scheduling

1 code implementation31 May 2022 Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, Vianney Perchet

We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution.

Efficient Exploration, Scheduling
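The snippet above concerns single-machine scheduling where job durations are uncertain. As a point of reference for the objective involved, here is the classic known-durations baseline: with full information, running jobs shortest-first minimizes total flow time (sum of completion times) on a single machine. This is only background for the setting, not the paper's learning algorithm, which must cope with durations known only through each type's distribution.

```python
def total_flow_time(durations):
    """Total flow time (sum of completion times) when jobs run
    shortest-first -- the optimal non-preemptive order when durations
    are known exactly. The paper studies the harder learning version."""
    t, total = 0.0, 0.0
    for d in sorted(durations):
        t += d       # completion time of this job
        total += t   # accumulate sum of completion times
    return total
```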

Reinforcement Learning with a Terminator

1 code implementation30 May 2022 Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

Autonomous Driving, reinforcement-learning, +1

Query-Reward Tradeoffs in Multi-Armed Bandits

no code implementations12 Oct 2021 Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.

Multi-Armed Bandits
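The setting described above (rewards observed only when actively queried) can be illustrated with a toy simulator. The play-each-arm-once-then-greedy rule and the query-budget mechanics below are purely illustrative assumptions, not the paper's algorithm or its query model.

```python
import random

def query_bandit(means, horizon, query_budget, seed=0):
    """Toy bandit where a reward is observed only if actively queried.
    Rewards accrue regardless; learning uses only queried samples."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    queries = 0
    total_reward = 0.0
    for t in range(horizon):
        # Play each arm once first, then act greedily on queried data.
        if t < len(means):
            arm = t
        else:
            arm = max(range(len(means)), key=lambda a: sums[a] / counts[a])
        reward = means[arm] + rng.uniform(-0.1, 0.1)
        total_reward += reward  # accrues whether or not it is observed
        if queries < query_budget:
            queries += 1        # spend one query to observe the reward
            counts[arm] += 1
            sums[arm] += reward
    return total_reward, queries
```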

Ensemble Bootstrapping for Q-Learning

no code implementations28 Feb 2021 Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms on a suite of Atari games.

Atari Games, Q-Learning
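The ensemble-bootstrapping idea behind EBQL can be sketched in tabular form: one randomly chosen ensemble member selects the greedy next action, while the average of the remaining members evaluates it, decoupling selection from evaluation as in Double Q-learning. Treat the details below as a reading of the idea, not the paper's exact pseudocode.

```python
import random

def ebql_update(qs, s, a, r, s_next, gamma=0.99, lr=0.1, rng=random):
    """One ensemble-bootstrapped Q-update (sketch).
    qs: list of tabular Q-functions, each indexed as q[state][action]."""
    k = rng.randrange(len(qs))  # member to update this step
    n_actions = len(qs[k][s_next])
    # Member k selects the greedy next action...
    a_star = max(range(n_actions), key=lambda a2: qs[k][s_next][a2])
    # ...and the mean of the OTHER members evaluates it.
    others = [q for i, q in enumerate(qs) if i != k]
    eval_value = sum(q[s_next][a_star] for q in others) / len(others)
    target = r + gamma * eval_value
    qs[k][s][a] += lr * (target - qs[k][s][a])
    return k, target
```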

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations5 Feb 2021 Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making, Decision Making Under Uncertainty, +2

Reinforcement Learning with Trajectory Feedback

no code implementations13 Aug 2020 Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

reinforcement-learning, Reinforcement Learning (RL), +1

Lenient Regret for Multi-Armed Bandits

1 code implementation10 Aug 2020 Nadav Merlis, Shie Mannor

Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.

Multi-Armed Bandits, Thompson Sampling
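The lenient regret named in the title can be computed with one simple instantiation: rounds whose suboptimality gap is at most $\epsilon$ incur no regret, and larger gaps count in full. The paper treats a family of gap functions; the gap function Δ·1{Δ > ε} below is just one natural member of it.

```python
def lenient_regret(means, pulls, eps):
    """Lenient regret under the gap function gap * 1{gap > eps}:
    near-optimal pulls (gap <= eps) are forgiven entirely."""
    best = max(means)
    total = 0.0
    for arm in pulls:
        gap = best - means[arm]
        if gap > eps:  # only gaps larger than eps are penalized
            total += gap
    return total
```

Setting `eps=0` recovers the standard (expected) regret.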

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

no code implementations13 Feb 2020 Nadav Merlis, Shie Mannor

The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose.

Decision Making, Multi-Armed Bandits
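The interaction protocol described in the snippet above (select a set of arms, observe per-arm feedback, receive a known function of the chosen arms) can be sketched as a single round with semi-bandit feedback. The uniform arm selection and the plain-sum reward are illustrative assumptions; the papers allow general reward functions and, of course, non-uniform selection strategies.

```python
import random

def cmab_round(mu, m, rng=random):
    """One combinatorial-bandit round with semi-bandit feedback:
    choose m arms, observe a noisy sample per chosen arm, and score
    them with a known reward function (a sum here, for illustration)."""
    chosen = rng.sample(range(len(mu)), m)
    feedback = {a: mu[a] + rng.uniform(-0.05, 0.05) for a in chosen}
    reward = sum(feedback.values())
    return chosen, feedback, reward
```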

Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning

no code implementations2 Oct 2019 Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler

In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.

Continuous Control, reinforcement-learning, +1

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

no code implementations25 Sep 2019 Chen Tessler, Nadav Merlis, Shie Mannor

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.

reinforcement-learning, Reinforcement Learning (RL)

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation NeurIPS 2019 Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with \emph{greedy policies} -- act by \emph{1-step planning} -- can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning, reinforcement-learning, +1
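The "1-step planning" in the snippet above amounts to backing up the current value estimate once and acting greedily with respect to it. Here is a minimal tabular sketch of that step; the paper's algorithm additionally uses optimism (exploration bonuses), which is omitted here.

```python
def one_step_greedy(P, R, V, gamma=1.0):
    """One step of greedy planning: a single Bellman backup of V,
    returning the greedy policy and the updated values.
    P[s][a][s2] = transition prob., R[s][a] = reward, V = current values."""
    n_states, n_actions = len(R), len(R[0])
    policy, V_new = [], []
    for s in range(n_states):
        qvals = [
            R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]
        best_a = max(range(n_actions), key=lambda a: qvals[a])
        policy.append(best_a)
        V_new.append(qvals[best_a])
    return policy, V_new
```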

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

no code implementations8 May 2019 Nadav Merlis, Shie Mannor

We show that the linear dependence of the regret on the batch size in existing algorithms can be replaced by this smoothness parameter.

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations NeurIPS 2018 Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

reinforcement-learning, Reinforcement Learning (RL), +1
