no code implementations • 21 Jan 2025 • Uri Gadot, Assaf Shocher, Shie Mannor, Gal Chechik, Assaf Hallak
Video encoders optimize compression for human perception by minimizing reconstruction error under bit-rate constraints.
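Concretely, this trade-off is usually posed as rate-distortion optimization; below is a sketch of the standard Lagrangian form as generic background (not necessarily this paper's exact objective), with distortion $D$, bit rate $R$, and encoder parameters $\theta$:

```latex
% Rate-distortion optimization: minimize reconstruction error D under a
% bit-rate budget R_max, typically relaxed into a Lagrangian with multiplier \lambda.
\min_{\theta} \; D(\theta) \quad \text{s.t.} \quad R(\theta) \le R_{\max}
\quad\Longleftrightarrow\quad
\min_{\theta} \; D(\theta) + \lambda R(\theta)
```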
1 code implementation • 4 Nov 2024 • Elad Sharony, Heng Yang, Tong Che, Marco Pavone, Shie Mannor, Peter Karkus
Sequentially solving similar optimization problems under strict runtime constraints is essential for many applications, such as robot control, autonomous driving, and portfolio management.
no code implementations • 25 Oct 2024 • Ryan Park, Darren J. Hsu, C. Brian Roland, Maria Korshunova, Chen Tessler, Shie Mannor, Olivia Viessmann, Bruno Trentini
However, when applied to peptides, these models are prone to generating repetitive sequences that do not fold into the reference structure.
no code implementations • 11 Oct 2024 • Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor
In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$, advancing beyond the existing local convergence results.
no code implementations • 27 Sep 2024 • Emilie Jong, Samuel Chevalier, Spyros Chatzivasileiadis, Shie Mannor
Electricity markets currently fail to incorporate preferences of buyers, treating polluting and renewable energy sources as having equal social benefit under a system of uniform clearing prices.
no code implementations • 26 Sep 2024 • Mark Kozdoba, Binyamin Perets, Shie Mannor
Due to the complexity of the optimisation algorithms in most modern representation learning approaches, it may be non-trivial to decide whether a given method's fairness-performance curve is optimal, i.e., whether it is close to the true Pareto front for these quantities under the underlying data distribution.
no code implementations • 20 Aug 2024 • Guy Lutsker, Gal Sapir, Smadar Shilo, Jordi Merino, Anastasia Godneva, Jerry R Greenfield, Dorit Samocha-Bonet, Raja Dhir, Francisco Gude, Shie Mannor, Eli Meirom, Gal Chechik, Hagai Rossman, Eran Segal
Here, we present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health.
no code implementations • 26 Jun 2024 • Assaf Hallak, Gal Dalal, Chen Tessler, Kelly Guo, Shie Mannor, Gal Chechik
Controlling humanoids in complex physically simulated worlds is a long-standing challenge with numerous applications in gaming, simulation, and visual content creation.
no code implementations • 3 Jun 2024 • Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni
Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments.
1 code implementation • 26 May 2024 • Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor
We study the trade-off between the information an agent accumulates and the regret it suffers.
1 code implementation • 8 Apr 2024 • David Valensi, Esther Derman, Shie Mannor, Gal Dalal
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies in order to reach optimal performance, thus extending the deterministic fixed delay case.
no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor
We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).
no code implementations • 8 Mar 2024 • Nitsan Soffair, Shie Mannor
DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values.
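A widely used remedy for this bias, introduced by TD3 (named plainly here as a standard technique, not necessarily the method this paper proposes), is clipped double-Q learning: bootstrap from the minimum of two critics. A minimal PyTorch-style sketch; the network shapes and names are illustrative assumptions, and target critics are omitted for brevity:

```python
# Sketch of clipped double-Q targets (the TD3 remedy for DDPG's overestimation
# bias); dimensions and networks are illustrative, target critics omitted.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
q1 = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q2 = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                             nn.Linear(64, act_dim), nn.Tanh())

def td_target(reward, next_obs, done, gamma=0.99):
    """Bootstrapped target using the minimum of two critics."""
    with torch.no_grad():
        next_act = actor_target(next_obs)
        q_in = torch.cat([next_obs, next_act], dim=-1)
        # Taking the min of two critics counteracts the upward bias
        # that a single bootstrapped critic accumulates.
        q_next = torch.min(q1(q_in), q2(q_in))
        return reward + gamma * (1.0 - done) * q_next

batch = 32
target = td_target(torch.zeros(batch, 1), torch.randn(batch, obs_dim),
                   torch.zeros(batch, 1))
print(target.shape)  # torch.Size([32, 1])
```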
no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
The algorithm is based on the popular Policy Cover-Policy Gradient (PC-PG) algorithm, which assumes knowledge of the reward function.
1 code implementation • 8 Feb 2024 • Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing 15.4x faster imagination compared to prior TBWMs.
no code implementations • 3 Feb 2024 • Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor
We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms, DDPG, TD3 and TD7 on seven popular MuJoCo and Bullet tasks.
no code implementations • 3 Feb 2024 • Nitsan Soffair, Shie Mannor
MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimates that overstate the real $Q$-values) inherent in conservative RL algorithms.
no code implementations • 11 Oct 2023 • Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.
no code implementations • 3 Sep 2023 • Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor
In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set.
no code implementations • 25 Jul 2023 • Mark Kozdoba, Binyamin Perets, Shie Mannor
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
no code implementations • 9 Jun 2023 • Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor
Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel.
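The central object in this framework is the robust Bellman operator, which scores each action against the worst-case kernel in the uncertainty set. A sketch of the standard form for (s,a)-rectangular uncertainty sets, as background for the setting rather than this paper's contribution:

```latex
% Robust Bellman optimality operator over an uncertainty set \mathcal{P}(s,a)
(T v)(s) = \max_{a \in A} \Big[ r(s,a)
  + \gamma \min_{p \in \mathcal{P}(s,a)} \sum_{s'} p(s' \mid s, a)\, v(s') \Big]
```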
no code implementations • 31 May 2023 • Ofir Nabati, Guy Tennenholtz, Shie Mannor
We present a representation-driven framework for reinforcement learning.
no code implementations • 2 May 2023 • Chen Tessler, Yoni Kasten, Yunrong Guo, Shie Mannor, Gal Chechik, Xue Bin Peng
In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters.
1 code implementation • 12 Mar 2023 • Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor
We then generalize regularized MDPs to twice regularized MDPs (R$^2$ MDPs), i.e., MDPs with both value and policy regularization.
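For context, in a (once-)regularized MDP the Bellman maximization is penalized by a convex policy regularizer $\Omega$ (e.g., negative entropy); a sketch of that operator, which R$^2$ MDPs extend with a value regularizer as well:

```latex
% Policy-regularized Bellman operator; \Omega is a convex regularizer
% over \pi(\cdot|s), and R^2 MDPs additionally regularize the value.
(T_\Omega v)(s) = \max_{\pi(\cdot \mid s)} \Big[ \sum_a \pi(a \mid s)
  \big( r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v(s') \big)
  - \Omega(\pi(\cdot \mid s)) \Big]
```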
no code implementations • NeurIPS 2023 • Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor
We provide a closed-form expression for the worst occupation measure.
no code implementations • 31 Jan 2023 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor
We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs), with a time complexity comparable to standard (non-robust) MDPs and significantly faster than any existing method.
no code implementations • 30 Jan 2023 • Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.
no code implementations • 3 Jan 2023 • Shie Mannor, Aviv Tamar
Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams.
no code implementations • 13 Dec 2022 • Peter Karkus, Boris Ivanovic, Shie Mannor, Marco Pavone
To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control.
no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.
no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.
no code implementations • 3 Oct 2022 • Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor
The policy gradient theorem proves to be a cornerstone in Linear RL due to its elegance and ease of implementation.
no code implementations • 28 Sep 2022 • Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik
This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.
no code implementations • 19 Jul 2022 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.
1 code implementation • 5 Jul 2022 • Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal
As communication protocols evolve, datacenter network utilization increases.
no code implementations • 26 Jun 2022 • Shirli Di Castro Shashua, Shie Mannor, Dotan Di-Castro
We provide an analysis of the properties of the sampled process such as stationarity, Markovity and autocorrelation in terms of the properties of the original process.
1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
1 code implementation • 28 May 2022 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor
However, it is not yet clear how to exploit this equivalence to perform policy improvement steps and obtain the optimal value function or policy.
2 code implementations • 10 May 2022 • Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
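A canonical example of such a measure is the conditional value at risk (CVaR), the expected return over the worst $\alpha$-fraction of outcomes; a sketch of the standard definition for continuous return distributions, as one instance of the risk measures considered:

```latex
% CVaR at level \alpha: expected return conditioned on the worst outcomes
\mathrm{CVaR}_\alpha(R) = \mathbb{E}\big[ R \,\big|\, R \le q_\alpha(R) \big],
\qquad q_\alpha(R) = \inf \{ x : \Pr(R \le x) \ge \alpha \}
```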
no code implementations • 18 Apr 2022 • Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik
Quantum Computing (QC) stands to revolutionize computing, but is currently still limited.
no code implementations • 12 Mar 2022 • Binyamin Perets, Mark Kozdoba, Shie Mannor
However, standard HMM learning algorithms rely crucially on the assumption that the positions of the missing observations within the observation sequence are known.
no code implementations • 2 Feb 2022 • Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik
Reasoning and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events.
1 code implementation • 31 Jan 2022 • Stav Belogolovsky, Ido Greenberg, Danny Eitan, Shie Mannor
Neural differential equations predict the derivative of a stochastic process.
no code implementations • 30 Jan 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users.
no code implementations • 30 Jan 2022 • Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor
The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.
no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal
Some of the most powerful reinforcement learning frameworks use planning for action selection.
no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit
We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.
no code implementations • NeurIPS 2021 • Esther Derman, Matthieu Geist, Shie Mannor
Finally, we generalize regularized MDPs to twice regularized MDPs (R$^2$ MDPs), i.e., MDPs with both value and policy regularization.
no code implementations • 12 Oct 2021 • Nadav Merlis, Yonathan Efroni, Shie Mannor
We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.
no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
We study the problem of learning a near-optimal policy for two reward-mixing MDPs.
1 code implementation • 5 Oct 2021 • Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters
Especially for continuous control, solving this differential equation, and its extension the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task.
no code implementations • NeurIPS 2021 • Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor
Simulation is used extensively in autonomous systems, particularly in robotic manipulation.
no code implementations • 29 Sep 2021 • Ido Greenberg, Shie Mannor, Netanel Yannay
Determining the noise parameters of a Kalman Filter (KF) has been studied for decades.
no code implementations • 29 Sep 2021 • Mark Kozdoba, Shie Mannor
Specifically, we discover and analyze two regimes of behavior of the networks, which are roughly related to the sparsity of the last layer.
no code implementations • 22 Sep 2021 • Roy Zohar, Shie Mannor, Guy Tennenholtz
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
1 code implementation • NeurIPS 2021 • Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik
We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even with access to the exact state and reward in future steps.
1 code implementation • 25 May 2021 • Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg
The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics.
1 code implementation • 10 May 2021 • Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg
This algorithm enables dynamic programming for continuous states and actions with a known dynamics model.
no code implementations • 1 May 2021 • Mohammadi Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
1 code implementation • 6 Apr 2021 • Ido Greenberg, Shie Mannor, Netanel Yannay
The Kalman Filter (KF) parameters are traditionally determined by noise estimation, since under the KF assumptions, the state prediction errors are minimized when the parameters correspond to the noise covariance.
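To make the role of those noise parameters concrete, here is a minimal predict/update step of a linear KF, where `Q` and `R` are the process- and observation-noise covariances whose choice the paper revisits; a generic textbook sketch with hypothetical dimensions, not the optimization approach the paper proposes:

```python
# Minimal linear Kalman filter step; F, H are known system matrices and
# Q, R are the process/observation noise covariances. Textbook form only.
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    # Predict: propagate the state estimate and its covariance.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct with the observation z, weighted by the Kalman gain.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 1D constant-velocity example (hypothetical system).
F = np.array([[1.0, 1.0], [0.0, 1.0]]); H = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
x, P = kf_step(x, P, np.array([0.7]), F, H, Q=0.01 * np.eye(2), R=np.array([[0.1]]))
print(x)
```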
no code implementations • 18 Mar 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor
However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.
no code implementations • 22 Feb 2021 • Guy Tennenholtz, Shie Mannor
In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent space based metric.
no code implementations • 22 Feb 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor
Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.
no code implementations • 18 Feb 2021 • Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, Shie Mannor
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL).
no code implementations • 16 Feb 2021 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.
no code implementations • 13 Feb 2021 • Lior Shani, Tom Zahavy, Shie Mannor
Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL (Ho & Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.
no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).
no code implementations • 7 Feb 2021 • Mark Kozdoba, Shie Mannor
In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural network type embedding of the data.
3 code implementations • 7 Feb 2021 • Ofir Nabati, Tom Zahavy, Shie Mannor
To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor
We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.
2 code implementations • ICLR 2021 • Esther Derman, Gal Dalal, Shie Mannor
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
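To picture this setup, an execution delay of $m$ steps can be emulated by a wrapper that queues each committed action and executes the one committed $m$ steps earlier; a minimal sketch with a hypothetical environment interface, not the paper's implementation:

```python
from collections import deque

class DelayedEnv:
    """Wraps an environment so that each action takes effect m steps after
    it is committed; the queue is pre-filled with a default action.
    A minimal sketch of the setting, not the paper's algorithm."""

    def __init__(self, env, m, default_action=0):
        self.env = env
        self.queue = deque([default_action] * m)

    def step(self, action):
        self.queue.append(action)          # commit the new action now ...
        executed = self.queue.popleft()    # ... but execute the m-step-old one
        return self.env.step(executed)

class ToyEnv:
    def step(self, a):
        return ("obs", float(a), False)    # (observation, reward, done)

env = DelayedEnv(ToyEnv(), m=3)
print([env.step(a)[1] for a in range(5)])  # [0.0, 0.0, 0.0, 0.0, 1.0]
```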
no code implementations • 1 Jan 2021 • Bingyi Kang, Shie Mannor, Jiashi Feng
Reinforcement Learning (RL) with safety guarantees is critical for agents performing tasks in risky environments.
no code implementations • 1 Jan 2021 • Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
no code implementations • 8 Dec 2020 • Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu
With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.
1 code implementation • 22 Oct 2020 • Ido Greenberg, Shie Mannor
In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.
no code implementations • 11 Oct 2020 • Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik
We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions.
no code implementations • 28 Sep 2020 • Ido Greenberg, Shie Mannor
The statistical power of the new testing procedure is shown to outperform alternative tests, often by orders of magnitude, for a variety of environment modifications (which cause deterioration in agent performance).
no code implementations • 13 Aug 2020 • Yonathan Efroni, Nadav Merlis, Shie Mannor
The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.
1 code implementation • 10 Aug 2020 • Nadav Merlis, Shie Mannor
Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.
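Lenient regret ignores suboptimality gaps smaller than a tolerance $\epsilon$; one natural instance is sketched below (the paper treats a more general family of gap-dependent costs):

```latex
% Lenient regret: pulls whose gap \Delta_a = \mu^* - \mu_a is within \epsilon
% incur no cost.
\mathcal{R}_\epsilon(T) = \sum_{t=1}^{T} \Delta_{a_t}\,
  \mathbb{1}\{\Delta_{a_t} > \epsilon\}
```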
no code implementations • ICLR 2021 • Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar
For deep neural network accelerators, memory movement is energetically expensive and can bound computation.
no code implementations • 11 Jun 2020 • Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni
We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.
no code implementations • 5 Mar 2020 • Esther Derman, Shie Mannor
Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning.
no code implementations • 4 Mar 2020 • Yonathan Efroni, Shie Mannor, Matteo Pirotta
In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.
1 code implementation • 23 Feb 2020 • Daniel Teitelman, Itay Naeh, Shie Mannor
This paper takes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs).
no code implementations • ICML 2020 • Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor
To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.
1 code implementation • 17 Feb 2020 • Shirli Di-Castro Shashua, Shie Mannor
These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration.
no code implementations • 13 Feb 2020 • Nadav Merlis, Shie Mannor
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose.
1 code implementation • CVPR 2021 • Roi Pony, Itay Naeh, Shie Mannor
In this work we present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation that in some cases may be unnoticeable to human observers and is implementable in the real world.
no code implementations • 9 Feb 2020 • Chen Tessler, Shie Mannor
In reinforcement learning, the discount factor $\gamma$ controls the agent's effective planning horizon.
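The usual back-of-the-envelope link between the two: a geometric discount weights a reward $k$ steps ahead by $\gamma^k$, giving an effective horizon of roughly $1/(1-\gamma)$.

```latex
% Total discounted weight and the resulting effective horizon
\sum_{k=0}^{\infty} \gamma^k = \frac{1}{1-\gamma},
\qquad \gamma = 0.99 \;\Rightarrow\; \text{horizon} \approx 100 \text{ steps}
```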
no code implementations • 2 Oct 2019 • Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler
In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.
no code implementations • 2 Oct 2019 • Erez Schwartz, Guy Tennenholtz, Chen Tessler, Shie Mannor
Recent advances in reinforcement learning have shown its potential to tackle complex real-life tasks.
no code implementations • 25 Sep 2019 • Nir Baram, Shie Mannor
Model-based imitation learning methods require full knowledge of the transition kernel for policy evaluation.
1 code implementation • 25 Sep 2019 • Tom Zahavy, Shie Mannor
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
no code implementations • 25 Sep 2019 • Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor
In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.
no code implementations • 25 Sep 2019 • Chen Tessler, Nadav Merlis, Shie Mannor
In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.
no code implementations • NeurIPS 2020 • Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.
no code implementations • 9 Sep 2019 • Guy Tennenholtz, Shie Mannor, Uri Shalit
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.
no code implementations • 6 Sep 2019 • Lior Shani, Yonathan Efroni, Shie Mannor
Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, which restricts consecutive policies to be 'close' to one another, is iteratively solved.
no code implementations • 22 Aug 2019 • Dotan Di Castro, Joel Oren, Shie Mannor
Practical application of Reinforcement Learning (RL) often involves risk considerations.
no code implementations • 13 Jun 2019 • Mark Kozdoba, Edward Moroshko, Shie Mannor, Koby Crammer
The proposed bounds depend on the shape of a certain spectrum related to the system operator, and thus provide the first known explicit geometric parameter of the data that can be used to bound estimation errors.
1 code implementation • ICML 2020 • Dan Fisher, Mark Kozdoba, Shie Mannor
FDMs model the second moment under general generative assumptions on the data.
1 code implementation • NeurIPS 2019 • Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies, i.e., acting by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.
2 code implementations • 23 May 2019 • Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy
Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).
no code implementations • 23 May 2019 • Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor
We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.
3 code implementations • NeurIPS 2019 • Chen Tessler, Guy Tennenholtz, Shie Mannor
We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.
no code implementations • 20 May 2019 • Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor
Robust Markov Decision Processes (RMDPs) aim to ensure robustness with respect to changing or adversarial system behavior.
no code implementations • 8 May 2019 • Nadav Merlis, Shie Mannor
We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter.
no code implementations • 6 May 2019 • Shreyansh Gandhi, Samrat Kokkula, Abon Chaudhuri, Alessandro Magnani, Theban Stanley, Behzad Ahmadi, Venkatesh Kandaswamy, Omer Ovenc, Shie Mannor
In this paper, we present a computer vision driven offensive and non-compliant image detection system for extremely large image datasets.
no code implementations • 12 Feb 2019 • Xavier Fontaine, Shie Mannor, Vianney Perchet
This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.
1 code implementation • 4 Feb 2019 • Guy Tennenholtz, Shie Mannor
We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.
no code implementations • NeurIPS 2019 • Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong
To the best of our knowledge, it is the first MARL algorithm with a convergence guarantee in the control, off-policy, and non-linear function approximation setting.
2 code implementations • 26 Jan 2019 • Chen Tessler, Yonathan Efroni, Shie Mannor
In this work we formalize two new criteria of robustness to action uncertainty.
no code implementations • 24 Jan 2019 • Tom Zahavy, Shie Mannor
We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.
no code implementations • 23 Jan 2019 • Shirli Di-Castro Shashua, Shie Mannor
However, this approach ignores certain distributional properties of both the errors and value parameters.
no code implementations • 17 Dec 2018 • Mark Kozdoba, Edward Moroshko, Lior Shani, Takuya Takagi, Takashi Katoh, Shie Mannor, Koby Crammer
In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective.
1 code implementation • 13 Dec 2018 • Lior Shani, Yonathan Efroni, Shie Mannor
We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.
no code implementations • NeurIPS 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.
1 code implementation • AAAI 2019 • Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor
Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations.
no code implementations • 16 Sep 2018 • Nir Baram, Shie Mannor
We denote this setup as "Inspiration Learning": knowledge transfer between agents that operate in different action spaces.
no code implementations • 6 Sep 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.
no code implementations • NeurIPS 2018 • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor
Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.
no code implementations • 14 Aug 2018 • Orly Avner, Shie Mannor
Communication networks shared by many users present a widespread challenge nowadays.
no code implementations • ICML 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.
no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi
Unlike the case of cumulative criteria, in the problems we study here the oracle policy, which knows the problem parameters a priori and is used to "center" the regret, is not trivial.
1 code implementation • ICLR 2019 • Chen Tessler, Daniel J. Mankowitz, Shie Mannor
Solving tasks in Reinforcement Learning is no easy feat.
no code implementations • 21 May 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.
no code implementations • 20 May 2018 • Chao Qu, Shie Mannor, Huan Xu
We devise a distributional variant of gradient temporal-difference (TD) learning.
no code implementations • 11 Apr 2018 • Mark Kozdoba, Shie Mannor
Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains.
no code implementations • 15 Mar 2018 • Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev
Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.
no code implementations • 11 Mar 2018 • Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies.
no code implementations • 16 Feb 2018 • Guy Tennenholtz, Tom Zahavy, Shie Mannor
We define the notion of an on-average-validation-stable algorithm as one in which using small portions of validation data for training does not overfit the model selection process.
no code implementations • 10 Feb 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.
no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor
We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.
no code implementations • 22 Nov 2017 • Guy Tennenholtz, Constantine Caramanis, Shie Mannor
We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.
no code implementations • 20 Nov 2017 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).
no code implementations • ICML 2017 • Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor
Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup.
no code implementations • ICML 2017 • Robert Busa-Fekete, Balazs Szorenyi, Paul Weng, Shie Mannor
We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized.
no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.
no code implementations • 4 Apr 2017 • Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor
TD(0) is one of the most commonly used algorithms in reinforcement learning.
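For reference, the update the analysis concerns; a minimal tabular TD(0) policy-evaluation sketch, where `sample_transition` is a hypothetical stand-in for interacting with the MDP under the evaluated policy:

```python
# Tabular TD(0) policy evaluation on a toy random-walk MDP.
import random

def sample_transition(s, n_states=5):
    """Hypothetical sampler: random next state, reward 1 on reaching state 0."""
    s_next = random.randrange(n_states)
    reward = 1.0 if s_next == 0 else 0.0
    return reward, s_next

def td0(n_states=5, alpha=0.1, gamma=0.9, steps=10_000):
    V = [0.0] * n_states
    s = 0
    for _ in range(steps):
        r, s_next = sample_transition(s, n_states)
        # TD(0): move V(s) toward the bootstrapped one-step target.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
    return V

print([round(v, 2) for v in td0()])
```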
no code implementations • 15 Mar 2017 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor
Using this, we provide a concentration bound, which is the first such result for two-timescale SA.
no code implementations • 7 Mar 2017 • Shirli Di-Castro Shashua, Shie Mannor
The Deep-RoK algorithm is a robust Bayesian method, based on the Extended Kalman Filter (EKF), that accounts for both the uncertainty in the weights of the approximated value function and the uncertainty in the transition probabilities, improving the robustness of the agent.
no code implementations • 25 Feb 2017 • Alon Cohen, Shie Mannor
We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite.
no code implementations • ICML 2017 • Assaf Hallak, Shie Mannor
The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme.
no code implementations • NeurIPS 2017 • Nir Levine, Koby Crammer, Shie Mannor
In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward.
no code implementations • 1 Jan 2017 • Jiashi Feng, Huan Xu, Shie Mannor
We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine.
no code implementations • 30 Dec 2016 • Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester
Temporal Difference learning or TD($\lambda$) is a fundamental algorithm in the field of reinforcement learning.
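The $\lambda$ in TD($\lambda$) interpolates between TD(0) and Monte Carlo via eligibility traces; a minimal tabular sketch of the accumulating-trace update, in the same generic textbook form (and with the same hypothetical random-walk sampler) as the TD(0) sketch above:

```python
# Tabular TD(lambda) with accumulating eligibility traces on a toy MDP.
import random

def td_lambda(n_states=5, alpha=0.1, gamma=0.9, lam=0.8, steps=10_000):
    V = [0.0] * n_states
    e = [0.0] * n_states                       # eligibility traces
    s = 0
    for _ in range(steps):
        s_next = random.randrange(n_states)
        r = 1.0 if s_next == 0 else 0.0
        delta = r + gamma * V[s_next] - V[s]   # one-step TD error
        e[s] += 1.0                            # accumulate trace for visited state
        for i in range(n_states):
            V[i] += alpha * delta * e[i]       # credit recently visited states
            e[i] *= gamma * lam                # decay all traces
        s = s_next
    return V

print([round(v, 2) for v in td_lambda()])
```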
no code implementations • 20 Dec 2016 • Raphael Canyasse, Gal Dalal, Shie Mannor
In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).
no code implementations • 7 Dec 2016 • Nir Baram, Oron Anschel, Shie Mannor
We present a model-based approach to the problem of adversarial imitation learning.
no code implementations • NeurIPS 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.
no code implementations • 30 Nov 2016 • Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel
We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, making hierarchical long-term assessment and planning tractable for large power systems.
no code implementations • 29 Nov 2016 • Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor
Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce.
no code implementations • 10 Oct 2016 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.
no code implementations • 8 Oct 2016 • Vineet Abhishek, Shie Mannor
The proposed test does not require knowledge of the underlying probability distribution generating the data.
no code implementations • 14 Sep 2016 • Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar
The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
no code implementations • 10 Jul 2016 • Oran Richman, Shie Mannor
We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition.
no code implementations • 22 Jun 2016 • Nir Ben Zrihem, Tom Zahavy, Shie Mannor
Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.
no code implementations • 16 Jun 2016 • Nir Baram, Tom Zahavy, Shie Mannor
Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.
no code implementations • 21 May 2016 • Oran Richman, Shie Mannor
Features that hold information about the "difficulty" of the data may be non-discriminative and are therefore disregarded in the classification process.
no code implementations • 13 May 2016 • Irit Hochberg, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, Elad Yom-Tov
Messages were personalized through a Reinforcement Learning (RL) algorithm which optimized messages to improve each participant's compliance with the activity regimen.
no code implementations • 9 May 2016 • Mark Kozdoba, Shie Mannor
Suppose that we are given a time series where consecutive samples are believed to come from a probabilistic source, that the source changes from time to time and that the total number of sources is fixed.
no code implementations • 25 Apr 2016 • Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor
Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.
no code implementations • 6 Mar 2016 • Gal Dalal, Elad Gilboa, Shie Mannor
The power grid is a complex and vital system that necessitates careful reliability management.
no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.
no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.
no code implementations • 8 Feb 2016 • Tom Zahavy, Nir Ben Zrihem, Shie Mannor
In recent years there has been growing interest in using deep representations for reinforcement learning.
no code implementations • ICLR 2018 • Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor
As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.
no code implementations • NeurIPS 2015 • Mark Kozdoba, Shie Mannor
We present a new algorithm for community detection.
no code implementations • NeurIPS 2015 • Oren Anava, Elad Hazan, Shie Mannor
In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret.
2 code implementations • 4 Nov 2015 • Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, Ran El-Yaniv
We propose novel model transfer-learning methods that refine a decision forest model M learned within a "source" domain using a training set sampled from a "target" domain, assumed to be a variation of the source.
no code implementations • 17 Sep 2015 • Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor
We consider the off-policy evaluation problem in Markov decision processes with function approximation.
no code implementations • 14 Aug 2015 • Assaf Hallak, Aviv Tamar, Shie Mannor
Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.
no code implementations • 19 Jul 2015 • Gal Dalal, Shie Mannor
In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.
no code implementations • 11 Jun 2015 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor
The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.
no code implementations • NeurIPS 2015 • Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone
Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.
no code implementations • 30 Apr 2015 • Orly Avner, Shie Mannor
Inspired by cognitive radio networks, we consider a setting where multiple users share several channels modeled as a multi-user multi-armed bandit (MAB) problem.
no code implementations • 26 Apr 2015 • Mark Kozdoba, Shie Mannor
We present a new algorithm for community detection.
no code implementations • 26 Apr 2015 • Mark Kozdoba, Shie Mannor
We present a new online algorithm for detecting overlapping communities.
no code implementations • 16 Apr 2015 • Nir Levine, Timothy A. Mann, Shie Mannor
Twitter, a popular social network, presents great opportunities for on-line machine learning research.
no code implementations • NeurIPS 2015 • Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor
For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.
no code implementations • 11 Feb 2015 • Assaf Hallak, François Schnitzler, Timothy Mann, Shie Mannor
Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use.
no code implementations • 8 Feb 2015 • Assaf Hallak, Dotan Di Castro, Shie Mannor
The objective is to learn a strategy that maximizes the accumulated reward across all contexts.
no code implementations • 21 Dec 2014 • Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi
In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.
no code implementations • NeurIPS 2014 • Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.
no code implementations • NeurIPS 2014 • Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
We consider logistic regression with arbitrary outliers in the covariate matrix.
no code implementations • 30 Sep 2014 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir
This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.