no code implementations • 11 Oct 2023 • Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.

no code implementations • 25 Jul 2023 • Mark Kozdoba, Binyamin Perets, Shie Mannor

We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.

no code implementations • 9 Jun 2023 • Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel.

no code implementations • 31 May 2023 • Ofir Nabati, Guy Tennenholtz, Shie Mannor

We present a representation-driven framework for reinforcement learning.

no code implementations • 2 May 2023 • Chen Tessler, Yoni Kasten, Yunrong Guo, Shie Mannor, Gal Chechik, Xue Bin Peng

In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters.

1 code implementation • 12 Mar 2023 • Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

We then generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs), i.e., MDPs with both value and policy regularization.

no code implementations • 31 Jan 2023 • Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor

We present a novel robust policy gradient method (RPG) for s-rectangular robust Markov Decision Processes (MDPs).

no code implementations • 31 Jan 2023 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to that of standard (non-robust) MDPs, which is significantly faster than any existing method.

no code implementations • 30 Jan 2023 • Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.

no code implementations • 3 Jan 2023 • Shie Mannor, Aviv Tamar

Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams.

no code implementations • 13 Dec 2022 • Peter Karkus, Boris Ivanovic, Shie Mannor, Marco Pavone

To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control.

no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.

no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.

no code implementations • 3 Oct 2022 • Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor

The policy gradient theorem proves to be a cornerstone in Linear RL due to its elegance and ease of implementability.

no code implementations • 28 Sep 2022 • Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.

no code implementations • 19 Jul 2022 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.

1 code implementation • 5 Jul 2022 • Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal

As communication protocols evolve, datacenter network utilization increases.

no code implementations • 26 Jun 2022 • Shirli Di Castro Shashua, Shie Mannor, Dotan Di-Castro

We provide an analysis of the properties of the sampled process such as stationarity, Markovity and autocorrelation in terms of the properties of the original process.

1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

1 code implementation • 28 May 2022 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

However, it remains unclear how to exploit this equivalence to perform policy improvement steps that yield the optimal value function or policy.

2 code implementations • 10 May 2022 • Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.

no code implementations • 18 Apr 2022 • Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

Quantum Computing (QC) stands to revolutionize computing, but is currently still limited.

no code implementations • 12 Mar 2022 • Binyamin Perets, Mark Kozdoba, Shie Mannor

However, standard HMM learning algorithms rely crucially on the assumption that the positions of the missing observations within the observation sequence are known.

no code implementations • 2 Feb 2022 • Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik

Reasoning and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events.

1 code implementation • 31 Jan 2022 • Stav Belogolovsky, Ido Greenberg, Danny Eitan, Shie Mannor

Neural differential equations predict the derivative of a stochastic process.

no code implementations • 30 Jan 2022 • Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor

The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.

no code implementations • 30 Jan 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users.

no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Some of the most powerful reinforcement learning frameworks use planning for action selection.

no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

no code implementations • NeurIPS 2021 • Esther Derman, Matthieu Geist, Shie Mannor

We finally generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs), i.e., MDPs with both value and policy regularization.

no code implementations • 12 Oct 2021 • Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.

no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We study the problem of learning a near optimal policy for two reward-mixing MDPs.

1 code implementation • 5 Oct 2021 • Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters

Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task.

no code implementations • NeurIPS 2021 • Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor

Simulation is used extensively in autonomous systems, particularly in robotic manipulation.

no code implementations • 29 Sep 2021 • Mark Kozdoba, Shie Mannor

Specifically, we discover and analyze two regimes of behavior of the networks, which are roughly related to the sparsity of the last layer.

no code implementations • 29 Sep 2021 • Ido Greenberg, Shie Mannor, Netanel Yannay

Determining the noise parameters of a Kalman Filter (KF) has been studied for decades.

no code implementations • 22 Sep 2021 • Roy Zohar, Shie Mannor, Guy Tennenholtz

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.

1 code implementation • NeurIPS 2021 • Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps.

1 code implementation • 25 May 2021 • Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics.

1 code implementation • 10 May 2021 • Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

This algorithm enables dynamic programming for continuous states and actions with a known dynamics model.

no code implementations • 1 May 2021 • Mohammani Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.

1 code implementation • 6 Apr 2021 • Ido Greenberg, Shie Mannor, Netanel Yannay

The Kalman Filter (KF) parameters are traditionally determined by noise estimation, since under the KF assumptions, the state prediction errors are minimized when the parameters correspond to the noise covariance.

no code implementations • 18 Mar 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor

However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.

no code implementations • 22 Feb 2021 • Guy Tennenholtz, Shie Mannor

In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent space based metric.

no code implementations • 22 Feb 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.

no code implementations • 18 Feb 2021 • Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, Shie Mannor

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL).

no code implementations • 16 Feb 2021 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

no code implementations • 13 Feb 2021 • Lior Shani, Tom Zahavy, Shie Mannor

Finally, we implement a deep variant of our algorithm which shares some similarities to GAIL (Ho and Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.

no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).

no code implementations • 7 Feb 2021 • Mark Kozdoba, Shie Mannor

In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural network type embedding of the data.

3 code implementations • 7 Feb 2021 • Ofir Nabati, Tom Zahavy, Shie Mannor

To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.

no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

2 code implementations • ICLR 2021 • Esther Derman, Gal Dalal, Shie Mannor

We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.

no code implementations • 1 Jan 2021 • Bingyi Kang, Shie Mannor, Jiashi Feng

Reinforcement Learning (RL) with safety guarantees is critical for agents performing tasks in risky environments.

no code implementations • 1 Jan 2021 • Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

no code implementations • 8 Dec 2020 • Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.

1 code implementation • 22 Oct 2020 • Ido Greenberg, Shie Mannor

In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.

no code implementations • 11 Oct 2020 • Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions.

no code implementations • 28 Sep 2020 • Ido Greenberg, Shie Mannor

The statistical power of the new testing procedure is shown to outperform alternative tests, often by orders of magnitude, for a variety of environment modifications (which cause deterioration in agent performance).

no code implementations • 13 Aug 2020 • Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

1 code implementation • 10 Aug 2020 • Nadav Merlis, Shie Mannor

Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.

no code implementations • ICLR 2021 • Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar

For deep neural network accelerators, memory movement is both energetically expensive and can bound computation.

no code implementations • 11 Jun 2020 • Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

no code implementations • 5 Mar 2020 • Esther Derman, Shie Mannor

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning.

no code implementations • 4 Mar 2020 • Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

1 code implementation • 23 Feb 2020 • Daniel Teitelman, Itay Naeh, Shie Mannor

This paper makes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs).

no code implementations • ICML 2020 • Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

1 code implementation • 17 Feb 2020 • Shirli Di-Castro Shashua, Shie Mannor

These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration.

no code implementations • 13 Feb 2020 • Nadav Merlis, Shie Mannor

The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose.

1 code implementation • CVPR 2021 • Roi Pony, Itay Naeh, Shie Mannor

In this work we present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation that in some cases may be unnoticeable by human observers and is implementable in the real world.

no code implementations • 9 Feb 2020 • Chen Tessler, Shie Mannor

In reinforcement learning, the discount factor $\gamma$ controls the agent's effective planning horizon.

no code implementations • 2 Oct 2019 • Erez Schwartz, Guy Tennenholtz, Chen Tessler, Shie Mannor

Recent advances in reinforcement learning have shown its potential to tackle complex real-life tasks.

no code implementations • 2 Oct 2019 • Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler

In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.

1 code implementation • 25 Sep 2019 • Tom Zahavy, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

no code implementations • 25 Sep 2019 • Chen Tessler, Nadav Merlis, Shie Mannor

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.

no code implementations • 25 Sep 2019 • Nir Baram, Shie Mannor

Model-based imitation learning methods require full knowledge of the transition kernel for policy evaluation.

no code implementations • 25 Sep 2019 • Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor

In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.

no code implementations • NeurIPS 2020 • Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.

no code implementations • 9 Sep 2019 • Guy Tennenholtz, Shie Mannor, Uri Shalit

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.

no code implementations • 6 Sep 2019 • Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem that restricts consecutive policies to be 'close' to one another is iteratively solved.

no code implementations • 22 Aug 2019 • Dotan Di Castro, Joel Oren, Shie Mannor

Practical application of Reinforcement Learning (RL) often involves risk considerations.

no code implementations • 13 Jun 2019 • Mark Kozdoba, Edward Moroshko, Shie Mannor, Koby Crammer

The proposed bounds depend on the shape of a certain spectrum related to the system operator, and thus provide the first known explicit geometric parameter of the data that can be used to bound estimation errors.

1 code implementation • ICML 2020 • Dan Fisher, Mark Kozdoba, Shie Mannor

FDMs model second moment under general generative assumptions on the data.

1 code implementation • NeurIPS 2019 • Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies (acting by 1-step planning) can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

no code implementations • 23 May 2019 • Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

3 code implementations • NeurIPS 2019 • Chen Tessler, Guy Tennenholtz, Shie Mannor

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

2 code implementations • 23 May 2019 • Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).

no code implementations • 20 May 2019 • Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior.

no code implementations • 8 May 2019 • Nadav Merlis, Shie Mannor

We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter.

no code implementations • 6 May 2019 • Shreyansh Gandhi, Samrat Kokkula, Abon Chaudhuri, Alessandro Magnani, Theban Stanley, Behzad Ahmadi, Venkatesh Kandaswamy, Omer Ovenc, Shie Mannor

In this paper, we present a computer vision driven offensive and non-compliant image detection system for extremely large image datasets.

no code implementations • 12 Feb 2019 • Xavier Fontaine, Shie Mannor, Vianney Perchet

This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.

1 code implementation • 4 Feb 2019 • Guy Tennenholtz, Shie Mannor

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.

no code implementations • NeurIPS 2019 • Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong

To the best of our knowledge, it is the first MARL algorithm with a convergence guarantee in the control, off-policy and non-linear function approximation setting.

2 code implementations • 26 Jan 2019 • Chen Tessler, Yonathan Efroni, Shie Mannor

In this work we formalize two new criteria of robustness to action uncertainty.

no code implementations • 24 Jan 2019 • Tom Zahavy, Shie Mannor

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.

no code implementations • 23 Jan 2019 • Shirli Di-Castro Shashua, Shie Mannor

However, this approach ignores certain distributional properties of both the errors and value parameters.

no code implementations • 17 Dec 2018 • Mark Kozdoba, Edward Moroshko, Lior Shani, Takuya Takagi, Takashi Katoh, Shie Mannor, Koby Crammer

In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective.

1 code implementation • 13 Dec 2018 • Lior Shani, Yonathan Efroni, Shie Mannor

We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.

no code implementations • NeurIPS 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

1 code implementation • AAAI 2019 • Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor

Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations.

no code implementations • 16 Sep 2018 • Nir Baram, Shie Mannor

We refer to this setup as Inspiration Learning: knowledge transfer between agents that operate in different action spaces.

no code implementations • 6 Sep 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

no code implementations • NeurIPS 2018 • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

no code implementations • 14 Aug 2018 • Orly Avner, Shie Mannor

Communication networks shared by many users are a widespread challenge nowadays.

no code implementations • ICML 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.

no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi

Unlike the case of cumulative criteria, in the problems we study here the oracle policy, which knows the problem parameters a priori and is used to "center" the regret, is not trivial.

1 code implementation • ICLR 2019 • Chen Tessler, Daniel J. Mankowitz, Shie Mannor

Solving tasks in Reinforcement Learning is no easy feat.

no code implementations • 21 May 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

no code implementations • 20 May 2018 • Chao Qu, Shie Mannor, Huan Xu

We devise a distributional variant of gradient temporal-difference (TD) learning.

no code implementations • 11 Apr 2018 • Mark Kozdoba, Shie Mannor

Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains.

no code implementations • 15 Mar 2018 • Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.

no code implementations • 11 Mar 2018 • Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies.

no code implementations • 16 Feb 2018 • Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define on-average-validation-stable algorithms as those in which using small portions of validation data for training does not overfit the model selection process.

no code implementations • 10 Feb 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.

no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

no code implementations • 22 Nov 2017 • Guy Tennenholtz, Constantine Caramanis, Shie Mannor

We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.

no code implementations • 20 Nov 2017 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

no code implementations • ICML 2017 • Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup.

no code implementations • ICML 2017 • Robert Busa-Fekete, Balazs Szorenyi, Paul Weng, Shie Mannor

We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized.

no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach, the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

no code implementations • 4 Apr 2017 • Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

TD(0) is one of the most commonly used algorithms in reinforcement learning.

no code implementations • 15 Mar 2017 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor

Using this, we provide a concentration bound, which is the first such result for a two-timescale SA.

no code implementations • 7 Mar 2017 • Shirli Di-Castro Shashua, Shie Mannor

The Deep-RoK algorithm is a robust Bayesian method, based on the Extended Kalman Filter (EKF), that accounts for both the uncertainty in the weights of the approximated value function and the uncertainty in the transition probabilities, improving the robustness of the agent.

no code implementations • 25 Feb 2017 • Alon Cohen, Shie Mannor

We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite.

no code implementations • ICML 2017 • Assaf Hallak, Shie Mannor

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme.

no code implementations • NeurIPS 2017 • Nir Levine, Koby Crammer, Shie Mannor

In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward.

no code implementations • 1 Jan 2017 • Jiashi Feng, Huan Xu, Shie Mannor

We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine.

no code implementations • 30 Dec 2016 • Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester

Temporal Difference learning or TD($\lambda$) is a fundamental algorithm in the field of reinforcement learning.

no code implementations • 20 Dec 2016 • Raphael Canyasse, Gal Dalal, Shie Mannor

In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).

no code implementations • 7 Dec 2016 • Nir Baram, Oron Anschel, Shie Mannor

We present a model-based approach for the problem of adversarial imitation learning.

no code implementations • NeurIPS 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

no code implementations • 30 Nov 2016 • Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel

We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, making hierarchical long-term assessment and planning tractable for large power systems.

no code implementations • 29 Nov 2016 • Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor

Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce.

no code implementations • 10 Oct 2016 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

no code implementations • 8 Oct 2016 • Vineet Abhishek, Shie Mannor

The proposed test does not require knowledge of the underlying probability distribution generating the data.

no code implementations • 14 Sep 2016 • Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

no code implementations • 10 Jul 2016 • Oran Richman, Shie Mannor

We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition.

no code implementations • 22 Jun 2016 • Nir Ben Zrihem, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.

no code implementations • 16 Jun 2016 • Nir Baram, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.

no code implementations • 21 May 2016 • Oran Richman, Shie Mannor

Features that hold information about the "difficulty" of the data may be non-discriminative and are therefore disregarded in the classification process.

no code implementations • 13 May 2016 • Irit Hochberg, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, Elad Yom-Tov

Messages were personalized through a Reinforcement Learning (RL) algorithm which optimized messages to improve each participant's compliance with the activity regimen.

no code implementations • 9 May 2016 • Mark Kozdoba, Shie Mannor

Suppose that we are given a time series where consecutive samples are believed to come from a probabilistic source, that the source changes from time to time and that the total number of sources is fixed.

no code implementations • 25 Apr 2016 • Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.

no code implementations • 6 Mar 2016 • Gal Dalal, Elad Gilboa, Shie Mannor

The power grid is a complex and vital system that necessitates careful reliability management.

no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.

no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

no code implementations • 8 Feb 2016 • Tom Zahavy, Nir Ben Zrihem, Shie Mannor

In recent years there has been growing interest in using deep representations for reinforcement learning.

no code implementations • ICLR 2018 • Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.

no code implementations • NeurIPS 2015 • Mark Kozdoba, Shie Mannor

We present a new algorithm for community detection.

no code implementations • NeurIPS 2015 • Oren Anava, Elad Hazan, Shie Mannor

In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret.

2 code implementations • 4 Nov 2015 • Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, Ran El-Yaniv

We propose novel model transfer-learning methods that refine a decision forest model M learned within a "source" domain using a training set sampled from a "target" domain, assumed to be a variation of the source.

no code implementations • 17 Sep 2015 • Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

no code implementations • 14 Aug 2015 • Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

no code implementations • 19 Jul 2015 • Gal Dalal, Shie Mannor

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.

no code implementations • 11 Jun 2015 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.

no code implementations • NeurIPS 2015 • Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.

no code implementations • 30 Apr 2015 • Orly Avner, Shie Mannor

Inspired by cognitive radio networks, we consider a setting where multiple users share several channels modeled as a multi-user multi-armed bandit (MAB) problem.

no code implementations • 26 Apr 2015 • Mark Kozdoba, Shie Mannor

We present a new algorithm for community detection.

no code implementations • 26 Apr 2015 • Mark Kozdoba, Shie Mannor

We present a new online algorithm for detecting overlapping communities.

no code implementations • 16 Apr 2015 • Nir Levine, Timothy A. Mann, Shie Mannor

Twitter, a popular social network, presents great opportunities for on-line machine learning research.

no code implementations • NeurIPS 2015 • Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

no code implementations • 11 Feb 2015 • Assaf Hallak, François Schnitzler, Timothy Mann, Shie Mannor

Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use.

no code implementations • 8 Feb 2015 • Assaf Hallak, Dotan Di Castro, Shie Mannor

The objective is to learn a strategy that maximizes the accumulated reward across all contexts.

no code implementations • 21 Dec 2014 • Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.

no code implementations • NeurIPS 2014 • Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.

no code implementations • NeurIPS 2014 • Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

We consider logistic regression with arbitrary outliers in the covariate matrix.

no code implementations • 30 Sep 2014 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

no code implementations • 21 Sep 2014 • Jiashi Feng, Huan Xu, Shie Mannor

We propose a framework for distributed robust statistical learning on big contaminated data.

no code implementations • 29 Jun 2014 • Aditya Gopalan, Shie Mannor

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.

no code implementations • 22 Apr 2014 • Orly Avner, Shie Mannor

Even the number of users may be unknown and can vary as users join or leave the network.

1 code implementation • 15 Apr 2014 • Aviv Tamar, Yonatan Glassner, Shie Mannor

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.

no code implementations • 25 Feb 2014 • Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor

Robust optimization is a common framework in optimization under uncertainty when the problem parameters are not known exactly but are known to belong to some given uncertainty set.

no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

no code implementations • 6 Feb 2014 • Eli A. Meirom, Chris Milling, Constantine Caramanis, Shie Mannor, Ariel Orda, Sanjay Shakkottai

Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available.

no code implementations • NeurIPS 2013 • Daniel Vainsencher, Shie Mannor, Huan Xu

We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers.

no code implementations • NeurIPS 2013 • Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

We consider the online Principal Component Analysis (PCA) for contaminated samples (containing outliers) which are revealed sequentially to the Principal Components (PCs) estimator.

no code implementations • NeurIPS 2013 • Shiau Hong Lim, Huan Xu, Shie Mannor

An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system.

no code implementations • 3 Nov 2013 • Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

no code implementations • 14 Oct 2013 • Aviv Tamar, Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.

no code implementations • 26 Jun 2013 • Aviv Tamar, Huan Xu, Shie Mannor

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.

no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

no code implementations • 27 Feb 2013 • Oren Anava, Elad Hazan, Shie Mannor

The framework of online learning with memory naturally captures learning problems with temporal constraints, and was previously studied for the experts setting.

no code implementations • NeurIPS 2012 • Maayan Harel, Shie Mannor

We introduce a new discrepancy score between two distributions that gives an indication of their similarity.

no code implementations • 27 Jun 2012 • Dotan Di Castro, Aviv Tamar, Shie Mannor

In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria.

no code implementations • NeurIPS 2011 • Loc X. Bui, Ramesh Johari, Shie Mannor

In the second phase the decision maker has to commit to one of the arms and stick with it.

no code implementations • NeurIPS 2011 • Shie Mannor, Ohad Shamir

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game.

no code implementations • NeurIPS 2010 • Andrey Bernstein, Shie Mannor, Nahum Shimkin

To the best of our knowledge, this is the first algorithm that addresses the problem of maximizing the average tp-rate under average fp-rate constraints in the online setting.

no code implementations • NeurIPS 2010 • Huan Xu, Shie Mannor

We consider Markov decision processes where the values of the parameters are uncertain.

no code implementations • NeurIPS 2008 • Huan Xu, Constantine Caramanis, Shie Mannor

We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems.

no code implementations • NeurIPS 2008 • Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms.

Papers With Code is a free resource with all data licensed under CC-BY-SA.