no code implementations • 5 Nov 2024 • Shiyun Lin, Simon Mauras, Nadav Merlis, Vianney Perchet
We aim to guarantee each worker the largest possible share of the utility in her best possible stable matching.
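As background for the entry above: each worker's "best possible stable matching" is the worker-optimal one, which worker-proposing deferred acceptance computes. The sketch below is a standard textbook illustration, not the paper's mechanism, and its instance data are hypothetical.

```python
# Illustrative sketch (not the paper's mechanism): worker-proposing
# deferred acceptance yields the worker-optimal stable matching,
# i.e. each worker's best possible stable outcome.

def deferred_acceptance(worker_prefs, firm_prefs):
    """worker_prefs[w] / firm_prefs[f] are preference lists, best first."""
    rank = {f: {w: i for i, w in enumerate(prefs)}
            for f, prefs in firm_prefs.items()}
    next_choice = {w: 0 for w in worker_prefs}      # next firm to propose to
    match = {}                                      # firm -> worker
    free = list(worker_prefs)
    while free:
        w = free.pop()
        f = worker_prefs[w][next_choice[w]]
        next_choice[w] += 1
        if f not in match:
            match[f] = w
        elif rank[f][w] < rank[f][match[f]]:        # f prefers w: displace
            free.append(match[f])
            match[f] = w
        else:
            free.append(w)                          # w rejected, tries again
    return {w: f for f, w in match.items()}

workers = {"w1": ["f1", "f2"], "w2": ["f1", "f2"]}
firms = {"f1": ["w2", "w1"], "f2": ["w1", "w2"]}
print(deferred_acceptance(workers, firms))          # {'w2': 'f1', 'w1': 'f2'}
```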
1 code implementation • 17 Jun 2024 • Matilde Tullii, Solenne Gaucher, Nadav Merlis, Vianney Perchet
For this model, our algorithm obtains a regret $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
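Assuming $\beta$ is a smoothness parameter of the model (it is not defined in this excerpt), the exponent interpolates between two familiar regimes:

```latex
% Limit behavior of the exponent, assuming beta > 0 is a smoothness parameter:
\[
\frac{d+2\beta}{d+3\beta} \;\xrightarrow[\ \beta \to 0\ ]{}\; 1
\qquad\text{and}\qquad
\frac{d+2\beta}{d+3\beta} \;\xrightarrow[\ \beta \to \infty\ ]{}\; \tfrac{2}{3}.
\]
% For example, with beta = 1 the rate reads T^{(d+2)/(d+3)}.
```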
no code implementations • 4 Jun 2024 • Nadav Merlis
We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take.
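A toy one-step example of why such observations help (my own illustration, not from the paper): an agent that sees realized rewards before committing earns $\mathbb{E}[\max_a r_a]$, while a standard agent earns at most $\max_a \mathbb{E}[r_a]$.

```python
# Toy illustration (not from the paper): with two fair Bernoulli
# actions, observing realizations before acting lifts the one-step
# value from max_a E[r_a] = 0.5 to E[max_a r_a] = 0.75.
import itertools

p = [0.5, 0.5]                                      # success probabilities

standard_value = max(p)                             # commit, then observe

lookahead_value = sum(                              # observe, then commit
    max(r) * (p[0] if r[0] else 1 - p[0]) * (p[1] if r[1] else 1 - p[1])
    for r in itertools.product([0, 1], repeat=2)
)

print(standard_value, lookahead_value)              # 0.5 0.75
```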
1 code implementation • 26 May 2024 • Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor
We study the trade-off between the information an agent accumulates and the regret it suffers.
no code implementations • 18 Mar 2024 • Nadav Merlis, Dorian Baudry, Vianney Perchet
In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.
no code implementations • 24 May 2023 • Guy Tennenholtz, Martin Mladenov, Nadav Merlis, Robert L. Axtell, Craig Boutilier
We highlight the importance of exploration, not to eliminate popularity bias, but to mitigate its negative impact on welfare.
no code implementations • 4 Feb 2023 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.
1 code implementation • 31 May 2022 • Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, Vianney Perchet
We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution.
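A minimal simulation of this setting appears below; the shortest-expected-duration-first policy is a naive known-means baseline, not the paper's learning algorithm, and the job types are hypothetical.

```python
# Sketch of the setting (not the paper's algorithm): each job's type
# fixes its duration distribution; with known means, scheduling by
# shortest expected duration first minimizes expected total flow time.
import random

type_mean = {"short": 1.0, "long": 3.0}             # hypothetical types
jobs = ["long", "short", "short", "long", "short"]

def total_flow_time(order):
    """Sum of completion times, with exponential durations per type."""
    t, total = 0.0, 0.0
    for job_type in order:
        t += random.expovariate(1.0 / type_mean[job_type])
        total += t
    return total

order = sorted(jobs, key=lambda j: type_mean[j])
print(total_flow_time(order))
```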
1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
no code implementations • 12 Oct 2021 • Nadav Merlis, Yonathan Efroni, Shie Mannor
We consider a stochastic multi-armed bandit setting where the reward must be actively queried for it to be observed.
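One way to picture the setting (an illustrative sketch, not the paper's algorithm): a UCB-style agent pays to query rewards only while the arms' confidence intervals still overlap, then commits and pulls without feedback.

```python
# Sketch (not the paper's algorithm): query rewards only until one
# arm's lower confidence bound dominates the other's upper bound,
# then keep pulling the committed arm with no further feedback.
import math, random

means = [0.4, 0.6]                                  # unknown to the agent
counts, sums = [0, 0], [0.0, 0.0]
committed = None

for t in range(1, 2001):
    if committed is not None:
        continue                                    # pull committed arm, no query
    ucb, lcb = [], []
    for a in range(2):
        if counts[a] == 0:
            ucb.append(float("inf")); lcb.append(float("-inf"))
        else:
            mean = sums[a] / counts[a]
            rad = math.sqrt(2 * math.log(t) / counts[a])
            ucb.append(mean + rad); lcb.append(mean - rad)
    arm = max(range(2), key=lambda a: ucb[a])
    if lcb[arm] > ucb[1 - arm]:                     # provably best: stop paying
        committed = arm
    else:
        counts[arm] += 1                            # query (observe) the reward
        sums[arm] += float(random.random() < means[arm])
```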
no code implementations • 28 Feb 2021 • Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir
Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep Q-learning algorithms on a suite of Atari games.
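A minimal tabular sketch in the spirit of EBQL (details simplified): each step updates one ensemble member, which picks the next-state action, while the average of the remaining members evaluates that action, tempering the over-estimation bias of the max operator.

```python
# Simplified tabular sketch in the spirit of EBQL: one ensemble member
# selects the bootstrap action, the average of the others evaluates it.
import random
import numpy as np

K, S, A = 5, 4, 2                                   # ensemble, states, actions
Q = np.zeros((K, S, A))
alpha, gamma = 0.1, 0.9

def ebql_update(s, a, r, s_next):
    k = random.randrange(K)                         # member to update
    others = [j for j in range(K) if j != k]
    a_star = int(np.argmax(Q[k, s_next]))           # k selects the action
    target = r + gamma * np.mean(Q[others, s_next, a_star])  # others evaluate
    Q[k, s, a] += alpha * (target - Q[k, s, a])

def act(s, eps=0.1):                                # greedy w.r.t. ensemble mean
    if random.random() < eps:
        return random.randrange(A)
    return int(np.argmax(Q.mean(axis=0)[s]))
```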
no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor
We analyze the performance of CBM-based algorithms in different settings and show that they perform well in the presence of adversarial contexts, initial states, and budgets.
no code implementations • 13 Aug 2020 • Yonathan Efroni, Nadav Merlis, Shie Mannor
The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.
no code implementations • 10 Aug 2020 • Nadav Merlis, Shie Mannor
Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.
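For intuition, the simplest instance of lenient regret ignores pulls of arms whose sub-optimality gap is at most $\epsilon$ (the paper treats a general family of gap functions; this helper is only an illustration):

```python
# Simplest instance of lenient regret: only pulls of arms whose gap
# exceeds eps are penalized (the paper allows general gap functions).
def lenient_regret(means, pulls, eps):
    """means[a]: true mean of arm a; pulls: sequence of arms played."""
    best = max(means)
    return sum(best - means[a] for a in pulls if best - means[a] > eps)

means = [0.50, 0.48, 0.10]
pulls = [1, 1, 2, 0, 0]
print(lenient_regret(means, pulls, eps=0.05))       # only the arm-2 pull: 0.4
```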
no code implementations • 13 Feb 2020 • Nadav Merlis, Shie Mannor
The Combinatorial Multi-Armed Bandit (CMAB) problem is a sequential decision-making problem in which, on each round, an agent selects a set of arms, observes feedback for each of these arms, and aims to maximize a known reward function of the arms it chose.
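In the simplest instance of this setting the known reward function is the sum of arm means, and a CUCB-style round reduces to picking the top-$m$ arms by upper confidence bound (an illustrative sketch; all parameters are hypothetical):

```python
# Illustrative CMAB round with a CUCB-style rule: with a sum reward,
# the oracle just takes the m arms with the largest UCBs, and each
# chosen arm's reward is observed (semi-bandit feedback).
import math, random

n, m = 6, 2                                         # arms, set size
means = [random.random() for _ in range(n)]         # unknown to the agent
counts, sums = [0] * n, [0.0] * n

for t in range(1, 1001):
    ucb = [sums[a] / counts[a] + math.sqrt(1.5 * math.log(t) / counts[a])
           if counts[a] else float("inf") for a in range(n)]
    chosen = sorted(range(n), key=lambda a: ucb[a], reverse=True)[:m]
    for a in chosen:
        counts[a] += 1
        sums[a] += float(random.random() < means[a])
```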
no code implementations • 2 Oct 2019 • Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler
In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.
no code implementations • 25 Sep 2019 • Chen Tessler, Nadav Merlis, Shie Mannor
In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.
1 code implementation • NeurIPS 2019 • Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with \emph{greedy policies} -- acting by \emph{1-step planning} -- can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.
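Schematically, acting by 1-step planning means keeping optimistic Q-values and refreshing only visited entries with a one-step backup, instead of re-solving the MDP between episodes; the sketch below conveys the shape of such an update, not the paper's exact bonus terms.

```python
# Schematic sketch (not the paper's exact algorithm): optimistic
# initialization plus greedy actions and one-step backups on visited
# state-action pairs only; no full value iteration is ever run.
import numpy as np

H, S, A = 5, 10, 3                                  # horizon, states, actions
Q = np.full((H, S, A), float(H))                    # optimistic initialization

def greedy_action(h, s):
    return int(np.argmax(Q[h, s]))

def one_step_backup(h, s, a, r_hat, p_hat, bonus):
    """r_hat: estimated reward; p_hat: estimated next-state distribution."""
    v_next = Q[h + 1].max(axis=1) if h + 1 < H else np.zeros(S)
    Q[h, s, a] = min(H, r_hat + p_hat @ v_next + bonus)
```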
no code implementations • 8 May 2019 • Nadav Merlis, Shie Mannor
We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter.
no code implementations • NeurIPS 2018 • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor
Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.