Search Results for author: Shie Mannor

Found 198 papers, 32 papers with code

Tree Search-Based Policy Optimization under Stochastic Execution Delay

1 code implementation • 8 Apr 2024 • David Valensi, Esther Derman, Shie Mannor, Gal Dalal

We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies in order to reach optimal performance, thus extending the deterministic fixed delay case.

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).

Conservative DDPG -- Pessimistic RL without Ensemble

no code implementations • 8 Mar 2024 • Nitsan Soffair, Shie Mannor

DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values.

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.

Improving Token-Based World Models with Parallel Observation Prediction

1 code implementation • 8 Feb 2024 • Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor

We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs.

MinMaxMin $Q$-learning

no code implementations • 3 Feb 2024 • Nitsan Soffair, Shie Mannor

MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimations are overestimating the real $Q$-values) inherent in conservative RL algorithms.

Q-Learning

SQT -- std $Q$-target

no code implementations • 3 Feb 2024 • Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor

We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms, DDPG, TD3 and TD7 on seven popular MuJoCo and Bullet tasks.

Q-Learning

Prospective Side Information for Latent MDPs

no code implementations • 11 Oct 2023 • Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.

Decision Making

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

no code implementations • 3 Sep 2023 • Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor

In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set.

Sobolev Space Regularised Pre Density Models

no code implementations • 25 Jul 2023 • Mark Kozdoba, Binyamin Perets, Shie Mannor

We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.

Anomaly Detection Density Estimation +1

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

no code implementations • 9 Jun 2023 • Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel.

Decision Making reinforcement-learning +1

Representation-Driven Reinforcement Learning

no code implementations • 31 May 2023 • Ofir Nabati, Guy Tennenholtz, Shie Mannor

We present a representation-driven framework for reinforcement learning.

Multi-Armed Bandits reinforcement-learning

CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

no code implementations • 2 May 2023 • Chen Tessler, Yoni Kasten, Yunrong Guo, Shie Mannor, Gal Chechik, Xue Bin Peng

In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters.

Imitation Learning

Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization

1 code implementation • 12 Mar 2023 • Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

We then generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs), i.e., MDPs with $\textit{both}$ value and policy regularization.

Policy Gradient for Rectangular Robust Markov Decision Processes

no code implementations • NeurIPS 2023 • Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor

We provide a closed-form expression for the worst occupation measure.

Policy Gradient Methods

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

no code implementations • 31 Jan 2023 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs, which is significantly faster than any existing method.

LEMMA

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

no code implementations • 30 Jan 2023 • Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.

Policy Gradient Methods

Towards Deployable RL - What's Broken with RL Research and a Potential Fix

no code implementations • 3 Jan 2023 • Shie Mannor, Aviv Tamar

Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams.

reinforcement-learning Reinforcement Learning (RL)

DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles

no code implementations • 13 Dec 2022 • Peter Karkus, Boris Ivanovic, Shie Mannor, Marco Pavone

To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control.

Autonomous Vehicles

Reward-Mixing MDPs with a Few Latent Contexts are Learnable

no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.

Tractable Optimality in Episodic Latent MABs

no code implementations • 5 Oct 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.

Policy Gradient for Reinforcement Learning with General Utilities

no code implementations • 3 Oct 2022 • Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor

The policy gradient theorem proves to be a cornerstone in Linear RL due to its elegance and ease of implementability.

reinforcement-learning Reinforcement Learning (RL)

SoftTreeMax: Policy Gradient with Tree Search

no code implementations • 28 Sep 2022 • Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.

Policy Gradient Methods

Actor-Critic based Improper Reinforcement Learning

no code implementations • 19 Jul 2022 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.

reinforcement-learning Reinforcement Learning (RL)

Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs

1 code implementation • 5 Jul 2022 • Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal

As communication protocols evolve, datacenter network utilization increases.

Fairness reinforcement-learning +1

Analysis of Stochastic Processes through Replay Buffers

no code implementations • 26 Jun 2022 • Shirli Di Castro Shashua, Shie Mannor, Dotan Di-Castro

We provide an analysis of the properties of the sampled process such as stationarity, Markovity and autocorrelation in terms of the properties of the original process.

reinforcement-learning Reinforcement Learning (RL)

Reinforcement Learning with a Terminator

1 code implementation • 30 May 2022 • Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

Autonomous Driving reinforcement-learning +1

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

1 code implementation • 28 May 2022 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

However, it is not yet clear how to exploit this equivalence to perform policy improvement steps that yield the optimal value function or policy.

Efficient Risk-Averse Reinforcement Learning

2 code implementations • 10 May 2022 • Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.

Autonomous Driving reinforcement-learning +1
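
As a rough illustration of what "a risk measure of the returns" can mean, the sketch below computes the conditional value at risk (CVaR), i.e. the mean of the worst alpha-fraction of episode returns. CVaR is used here only as one common example of a risk measure; the function name `cvar` and the sample returns are hypothetical and are not taken from the paper.

```python
import math


def cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) returns (the lower tail), for 0 < alpha <= 1."""
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    worst_first = sorted(returns)                      # lowest returns first
    k = max(1, math.ceil(alpha * len(worst_first)))    # size of the lower tail
    tail = worst_first[:k]
    return sum(tail) / len(tail)


# Hypothetical episode returns, made up for illustration only.
episode_returns = [10.0, 12.0, -5.0, 8.0, 11.0, -20.0, 9.0, 13.0, 7.0, 10.0]

risk_neutral = sum(episode_returns) / len(episode_returns)  # plain average return
risk_averse = cvar(episode_returns, alpha=0.2)              # average of the worst 20%
```

With these made-up numbers the plain average is 5.5, while the CVaR at alpha = 0.2 is -12.5, since it looks only at the two worst episodes.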

Optimizing Tensor Network Contraction Using Reinforcement Learning

no code implementations • 18 Apr 2022 • Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

Quantum Computing (QC) stands to revolutionize computing, but is currently still limited.

Combinatorial Optimization reinforcement-learning +1

Learning Hidden Markov Models When the Locations of Missing Observations are Unknown

no code implementations • 12 Mar 2022 • Binyamin Perets, Mark Kozdoba, Shie Mannor

However, standard HMM learning algorithms rely crucially on the assumption that the positions of the missing observations within the observation sequence are known.

Learning to reason about and to act on physical cascading events

no code implementations • 2 Feb 2022 • Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik

Reasoning and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events.

counterfactual

Continuous Forecasting via Neural Eigen Decomposition

1 code implementation • 31 Jan 2022 • Stav Belogolovsky, Ido Greenberg, Danny Eitan, Shie Mannor

Neural differential equations predict the derivative of a stochastic process.

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

no code implementations • 30 Jan 2022 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users.

Collaborative Filtering Multi-Armed Bandits +1

The Geometry of Robust Value Functions

no code implementations • 30 Jan 2022 • Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor

The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.

Planning and Learning with Adaptive Lookahead

no code implementations • 28 Jan 2022 • Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Some of the most powerful reinforcement learning frameworks use planning for action selection.

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

no code implementations • ICLR 2022 • Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

Imitation Learning Recommendation Systems +2

Twice regularized MDPs and the equivalence between robustness and regularization

no code implementations • NeurIPS 2021 • Esther Derman, Matthieu Geist, Shie Mannor

We finally generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs), i.e., MDPs with $\textit{both}$ value and policy regularization.

Query-Reward Tradeoffs in Multi-Armed Bandits

no code implementations • 12 Oct 2021 • Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.

Multi-Armed Bandits

Reinforcement Learning in Reward-Mixing MDPs

no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We study the problem of learning a near optimal policy for two reward-mixing MDPs.

Efficient Exploration reinforcement-learning +1

Continuous-Time Fitted Value Iteration for Robust Policies

1 code implementation • 5 Oct 2021 • Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters

Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important, as it yields the optimal policy that achieves the maximum reward on a given task.

Continuous Control

Sim and Real: Better Together

no code implementations • NeurIPS 2021 • Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor

Simulation is used extensively in autonomous systems, particularly in robotic manipulation.

Two Regimes of Generalization for Non-Linear Metric Learning

no code implementations • 29 Sep 2021 • Mark Kozdoba, Shie Mannor

Specifically, we discover and analyze two regimes of behavior of the networks, which are roughly related to the sparsity of the last layer.

Generalization Bounds Metric Learning +1

Kalman Filter Is All You Need: Optimization Works When Noise Estimation Fails

no code implementations • 29 Sep 2021 • Ido Greenberg, Shie Mannor, Netanel Yannay

Determining the noise parameters of a Kalman Filter (KF) has been studied for decades.

Noise Estimation

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

no code implementations • 22 Sep 2021 • Roy Zohar, Shie Mannor, Guy Tennenholtz

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.

Multi-agent Reinforcement Learning reinforcement-learning +1

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

1 code implementation • NeurIPS 2021 • Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps.

Atari Games

Robust Value Iteration for Continuous Control Tasks

1 code implementation • 25 May 2021 • Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics.

Continuous Control reinforcement-learning +1

Value Iteration in Continuous Actions, States and Time

1 code implementation • 10 May 2021 • Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

This algorithm enables dynamic programming for continuous states and actions with a known dynamics model.

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

no code implementations • 1 May 2021 • Mohammani Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.

reinforcement-learning Reinforcement Learning (RL) +1

The Fragility of Noise Estimation in Kalman Filter: Optimization Can Handle Model-Misspecification

1 code implementation • 6 Apr 2021 • Ido Greenberg, Shie Mannor, Netanel Yannay

The Kalman Filter (KF) parameters are traditionally determined by noise estimation, since under the KF assumptions, the state prediction errors are minimized when the parameters correspond to the noise covariance.

Noise Estimation

Maximum Entropy Reinforcement Learning with Mixture Policies

no code implementations • 18 Mar 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor

However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.

Continuous Control reinforcement-learning +1

Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning

no code implementations • 22 Feb 2021 • Guy Tennenholtz, Shie Mannor

In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent space based metric.

Autonomous Driving Continuous Control +4

Action Redundancy in Reinforcement Learning

no code implementations • 22 Feb 2021 • Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.

reinforcement-learning Reinforcement Learning (RL)
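
For intuition about entropy regularization, consider the single-state (bandit) special case: the objective is the expected reward plus a temperature times the policy's entropy, and its maximizer is known to be a softmax over the rewards. The sketch below covers only that toy case and is not the paper's method; `tau`, `maxent_policy`, and the reward values are hypothetical names and numbers chosen for illustration.

```python
import math


def maxent_policy(rewards, tau):
    """Softmax policy, pi(a) proportional to exp(r(a) / tau).

    In the single-state case this maximizes E[r] + tau * H(pi).
    """
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / tau) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(policy):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in policy if p > 0.0)


def regularized_objective(policy, rewards, tau):
    """Expected reward plus tau times the policy entropy."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward + tau * entropy(policy)


# Hypothetical three-armed example.
rewards = [1.0, 0.0, 0.0]
soft = maxent_policy(rewards, tau=0.5)
greedy = [1.0, 0.0, 0.0]  # deterministic policy with zero entropy
```

Here the softmax policy still prefers the best arm but keeps some probability on the others, and it scores at least as high as the greedy policy on the regularized objective.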

Reinforcement Learning for Datacenter Congestion Control

no code implementations • 18 Feb 2021 • Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, Shie Mannor

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL).

Network Congestion Control reinforcement-learning +1

Improper Reinforcement Learning with Gradient-based Policy Optimization

no code implementations • 16 Feb 2021 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

reinforcement-learning Reinforcement Learning (RL)

Online Apprenticeship Learning

no code implementations • 13 Feb 2021 • Lior Shani, Tom Zahavy, Shie Mannor

Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL (Ho and Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.

RL for Latent MDPs: Regret Guarantees and a Lower Bound

no code implementations • NeurIPS 2021 • Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).

Dimension Free Generalization Bounds for Non Linear Metric Learning

no code implementations • 7 Feb 2021 • Mark Kozdoba, Shie Mannor

In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural network type embedding of the data.

Generalization Bounds Metric Learning

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

3 code implementations • 7 Feb 2021 • Ofir Nabati, Tom Zahavy, Shie Mannor

To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.

Efficient Exploration Multi-Armed Bandits +1

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations • 5 Feb 2021 • Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making Decision Making Under Uncertainty +2

Acting in Delayed Environments with Non-Stationary Markov Policies

2 code implementations • ICLR 2021 • Esther Derman, Gal Dalal, Shie Mannor

We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.

Cloud Computing Q-Learning

Online Limited Memory Neural-Linear Bandits

no code implementations • 1 Jan 2021 • Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration Multi-Armed Bandits +2

Learning Safe Policies with Cost-sensitive Advantage Estimation

no code implementations • 1 Jan 2021 • Bingyi Kang, Shie Mannor, Jiashi Feng

Reinforcement Learning (RL) with safety guarantee is critical for agents performing tasks in risky environments.

Reinforcement Learning (RL)

The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems

no code implementations • 8 Dec 2020 • Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.

reinforcement-learning Reinforcement Learning (RL)

Detecting Rewards Deterioration in Episodic Reinforcement Learning

1 code implementation • 22 Oct 2020 • Ido Greenberg, Shie Mannor

In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.

reinforcement-learning Reinforcement Learning (RL) +1

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks

no code implementations • 11 Oct 2020 • Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions.

Marketing reinforcement-learning +2

Drift Detection in Episodic Data: Detect When Your Agent Starts Faltering

no code implementations • 28 Sep 2020 • Ido Greenberg, Shie Mannor

The statistical power of the new testing procedure is shown to outperform alternative tests, often by orders of magnitude, for a variety of environment modifications (which cause deterioration in agent performance).

Reinforcement Learning with Trajectory Feedback

no code implementations • 13 Aug 2020 • Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

reinforcement-learning Reinforcement Learning (RL) +1

Lenient Regret for Multi-Armed Bandits

1 code implementation • 10 Aug 2020 • Nadav Merlis, Shie Mannor

Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.

Multi-Armed Bandits Thompson Sampling

Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

no code implementations • ICLR 2021 • Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar

For deep neural network accelerators, memory movement is energetically expensive and can bound computation.

Network Pruning reinforcement-learning +1

Bandits with Partially Observable Confounded Data

no code implementations • 11 Jun 2020 • Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

Multi-Armed Bandits

Distributional Robustness and Regularization in Reinforcement Learning

no code implementations • 5 Mar 2020 • Esther Derman, Shie Mannor

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning.

Decision Making reinforcement-learning +1

Exploration-Exploitation in Constrained MDPs

no code implementations • 4 Mar 2020 • Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

Decision Making

Stealing Black-Box Functionality Using The Deep Neural Tree Architecture

1 code implementation • 23 Feb 2020 • Daniel Teitelman, Itay Naeh, Shie Mannor

This paper makes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs).

Active Learning

Optimistic Policy Optimization with Bandit Feedback

no code implementations • ICML 2020 • Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

Reinforcement Learning (RL)

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

1 code implementation • 17 Feb 2020 • Shirli Di-Castro Shashua, Shie Mannor

These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration.

Gaussian Processes Reinforcement Learning (RL)

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

no code implementations • 13 Feb 2020 • Nadav Merlis, Shie Mannor

The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose.

Decision Making Multi-Armed Bandits

Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks

1 code implementation • CVPR 2021 • Roi Pony, Itay Naeh, Shie Mannor

In this work we present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation that in some cases may be unnoticeable by human observers and is implementable in the real world.

Action Classification Classification +5

Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons

no code implementations • 9 Feb 2020 • Chen Tessler, Shie Mannor

In reinforcement learning, the discount factor $\gamma$ controls the agent's effective planning horizon.

Continuous Control reinforcement-learning +1
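
To make the link between the discount factor and the planning horizon concrete, the sketch below uses the common rule of thumb that the effective horizon is roughly $1/(1-\gamma)$, alongside a plain discounted-return computation. This is a generic illustration under that assumption, not the method of the paper; the function names and reward values are hypothetical.

```python
def effective_horizon(gamma):
    """Rule-of-thumb effective planning horizon 1 / (1 - gamma), for 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated backwards from the last step."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequence: a single delayed reward at step 50.
delayed = [0.0] * 50 + [1.0]

short_sighted = discounted_return(delayed, gamma=0.9)   # horizon of about 10 steps
far_sighted = discounted_return(delayed, gamma=0.99)    # horizon of about 100 steps
```

With a small discount factor the delayed reward contributes almost nothing to the return, which is one way to see why the discount factor limits how far ahead the agent effectively plans.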

Language is Power: Representing States Using Natural Language in Reinforcement Learning

no code implementations • 2 Oct 2019 • Erez Schwartz, Guy Tennenholtz, Chen Tessler, Shie Mannor

Recent advances in reinforcement learning have shown its potential to tackle complex real-life tasks.

reinforcement-learning Reinforcement Learning (RL)

Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning

no code implementations • 2 Oct 2019 • Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler

In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.

Continuous Control reinforcement-learning +1

Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

1 code implementation • 25 Sep 2019 • Tom Zahavy, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration Multi-Armed Bandits +3

Contextual Inverse Reinforcement Learning

no code implementations • 25 Sep 2019 • Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor

In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.

reinforcement-learning Reinforcement Learning (RL)

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

no code implementations • 25 Sep 2019 • Chen Tessler, Nadav Merlis, Shie Mannor

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.

reinforcement-learning Reinforcement Learning (RL)

Partial Simulation for Imitation Learning

no code implementations • 25 Sep 2019 • Nir Baram, Shie Mannor

Model-based imitation learning methods require full knowledge of the transition kernel for policy evaluation.

Imitation Learning Reinforcement Learning (RL)

Online Planning with Lookahead Policies

no code implementations • NeurIPS 2020 • Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.

Off-Policy Evaluation in Partially Observable Environments

no code implementations • 9 Sep 2019 • Guy Tennenholtz, Shie Mannor, Uri Shalit

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.

Off-policy evaluation

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

no code implementations • 6 Sep 2019 • Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem that restricts consecutive policies to be 'close' to one another is iteratively solved.

Reinforcement Learning (RL)

Practical Risk Measures in Reinforcement Learning

no code implementations • 22 Aug 2019 • Dotan Di Castro, Joel Oren, Shie Mannor

Practical application of Reinforcement Learning (RL) often involves risk considerations.

reinforcement-learning Reinforcement Learning (RL)

Topic Modeling via Full Dependence Mixtures

1 code implementation • ICML 2020 • Dan Fisher, Mark Kozdoba, Shie Mannor

FDMs model second moment under general generative assumptions on the data.

Stochastic Optimization

Finite Sample Analysis Of Dynamic Regression Parameter Learning

no code implementations • 13 Jun 2019 • Mark Kozdoba, Edward Moroshko, Shie Mannor, Koby Crammer

The proposed bounds depend on the shape of a certain spectrum related to the system operator, and thus provide the first known explicit geometric parameter of the data that can be used to bound estimation errors.

regression

Paper
Add Code

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation • NeurIPS 2019 • Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies -- act by 1-step planning -- can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning reinforcement-learning +1

Paper
Code

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

no code implementations • 23 May 2019 • Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

Imitation Learning text-based games +1

Paper
Add Code

Distributional Policy Optimization: An Alternative Approach for Continuous Control

3 code implementations • NeurIPS 2019 • Chen Tessler, Guy Tennenholtz, Shie Mannor

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

Continuous Control Policy Gradient Methods

Paper
Code

Inverse Reinforcement Learning in Contextual MDPs

2 code implementations • 23 May 2019 • Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).

Autonomous Driving reinforcement-learning +1

Paper
Code

A Bayesian Approach to Robust Reinforcement Learning

no code implementations • 20 May 2019 • Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

no code implementations • 8 May 2019 • Nadav Merlis, Shie Mannor

We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter.

Paper
Add Code

Image Matters: Scalable Detection of Offensive and Non-Compliant Content / Logo in Product Images

no code implementations • 6 May 2019 • Shreyansh Gandhi, Samrat Kokkula, Abon Chaudhuri, Alessandro Magnani, Theban Stanley, Behzad Ahmadi, Venkatesh Kandaswamy, Omer Ovenc, Shie Mannor

In this paper, we present a computer vision driven offensive and non-compliant image detection system for extremely large image datasets.

Image Classification object-detection +1

Paper
Add Code

An adaptive stochastic optimization algorithm for resource allocation

no code implementations • 12 Feb 2019 • Xavier Fontaine, Shie Mannor, Vianney Perchet

This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.

Stochastic Optimization

Paper
Add Code

The Natural Language of Actions

1 code implementation • 4 Feb 2019 • Guy Tennenholtz, Shie Mannor

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.

reinforcement-learning Reinforcement Learning (RL) +2

Paper
Code

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

no code implementations • NeurIPS 2019 • Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong

To the best of our knowledge, it is the first MARL algorithm with convergence guarantee in the control, off-policy and non-linear function approximation setting.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Action Robust Reinforcement Learning and Applications in Continuous Control

2 code implementations • 26 Jan 2019 • Chen Tessler, Yonathan Efroni, Shie Mannor

In this work we formalize two new criteria of robustness to action uncertainty.

Continuous Control reinforcement-learning +1

Paper
Code

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

no code implementations • 24 Jan 2019 • Tom Zahavy, Shie Mannor

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.

Decision Making Efficient Exploration +4

Paper
Add Code

Trust Region Value Optimization using Kalman Filtering

no code implementations • 23 Jan 2019 • Shirli Di-Castro Shashua, Shie Mannor

However, this approach ignores certain distributional properties of both the errors and value parameters.

Paper
Add Code

Multi Instance Learning For Unbalanced Data

no code implementations • 17 Dec 2018 • Mark Kozdoba, Edward Moroshko, Lior Shani, Takuya Takagi, Takashi Katoh, Shie Mannor, Koby Crammer

In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective.

Paper
Add Code

Exploration Conscious Reinforcement Learning Revisited

1 code implementation • 13 Dec 2018 • Lior Shani, Yonathan Efroni, Shie Mannor

We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

no code implementations • NeurIPS 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control reinforcement-learning +1

Paper
Add Code

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters

1 code implementation • AAAI 2019 • Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor

Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations.

regression Time Series +1

Paper
Code

Inspiration Learning through Preferences

no code implementations • 16 Sep 2018 • Nir Baram, Shie Mannor

We denote this setup as Inspiration Learning - knowledge transfer between agents that operate in different action spaces.

Imitation Learning Transfer Learning

Paper
Add Code

How to Combine Tree-Search Methods in Reinforcement Learning

no code implementations • 6 Sep 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations • NeurIPS 2018 • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Multi-user Communication Networks: A Coordinated Multi-armed Bandit Approach

no code implementations • 14 Aug 2018 • Orly Avner, Shie Mannor

Communication networks shared by many users are a widespread challenge nowadays.

Paper
Add Code

Beyond the One-Step Greedy Approach in Reinforcement Learning

no code implementations • ICML 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

A General Framework for Bandit Problems Beyond Cumulative Objectives

no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi

Unlike the case of cumulative criteria, in the problems we study here the oracle policy, that knows the problem parameters a priori and is used to "center" the regret, is not trivial.

Multi-Armed Bandits

Paper
Add Code

Reward Constrained Policy Optimization

1 code implementation • ICLR 2019 • Chen Tessler, Daniel J. Mankowitz, Shie Mannor

Solving tasks in Reinforcement Learning is no easy feat.

Reinforcement Learning (RL) Safe Reinforcement Learning

Paper
Code

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

no code implementations • 21 May 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control reinforcement-learning +1

Paper
Add Code

Nonlinear Distributional Gradient Temporal-Difference Learning

no code implementations • 20 May 2018 • Chao Qu, Shie Mannor, Huan Xu

We devise a distributional variant of gradient temporal-difference (TD) learning.

Distributional Reinforcement Learning

Paper
Add Code

Interdependent Gibbs Samplers

no code implementations • 11 Apr 2018 • Mark Kozdoba, Shie Mannor

Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains.

Paper
Add Code

Deep Learning Reconstruction of Ultra-Short Pulses

no code implementations • 15 Mar 2018 • Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.

Paper
Add Code

Soft-Robust Actor-Critic Policy-Gradient

no code implementations • 11 Mar 2018 • Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Train on Validation: Squeezing the Data Lemon

no code implementations • 16 Feb 2018 • Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define the notion of on-average-validation-stable algorithms as one in which using small portions of validation data for training does not overfit the model selection process.

Model Selection

Paper
Add Code

Beyond the One Step Greedy Approach in Reinforcement Learning

no code implementations • 10 Feb 2018 • Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning Robust Options

no code implementations • 9 Feb 2018 • Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

Paper
Add Code

The Stochastic Firefighter Problem

no code implementations • 22 Nov 2017 • Guy Tennenholtz, Constantine Caramanis, Shie Mannor

We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.

Paper
Add Code

Situationally Aware Options

no code implementations • 20 Nov 2017 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

Paper
Add Code

End-to-End Differentiable Adversarial Imitation Learning

no code implementations • ICML 2017 • Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup.

Imitation Learning

Paper
Add Code

Multi-objective Bandits: Optimizing the Generalized Gini Index

no code implementations • ICML 2017 • Robert Busa-Fekete, Balazs Szorenyi, Paul Weng, Shie Mannor

We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized.

Paper
Add Code

Shallow Updates for Deep Reinforcement Learning

no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games Feature Engineering +2

Paper
Add Code

Finite Sample Analyses for TD(0) with Function Approximation

no code implementations • 4 Apr 2017 • Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

TD(0) is one of the most commonly used algorithms in reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

no code implementations • 15 Mar 2017 • Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor

Using this, we provide a concentration bound, which is the first such result for a two-timescale SA.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Deep Robust Kalman Filter

no code implementations • 7 Mar 2017 • Shirli Di-Castro Shashua, Shie Mannor

The Deep-RoK algorithm is a robust Bayesian method, based on the Extended Kalman Filter (EKF), that accounts for both the uncertainty in the weights of the approximated value function and the uncertainty in the transition probabilities, improving the robustness of the agent.

Decision Making

Paper
Add Code

Online Learning with Many Experts

no code implementations • 25 Feb 2017 • Alon Cohen, Shie Mannor

We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite.

Paper
Add Code

Consistent On-Line Off-Policy Evaluation

no code implementations • ICML 2017 • Assaf Hallak, Shie Mannor

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme.

Off-policy evaluation

Paper
Add Code

Rotting Bandits

no code implementations • NeurIPS 2017 • Nir Levine, Koby Crammer, Shie Mannor

In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward.

Multi-Armed Bandits

Paper
Add Code

Outlier Robust Online Learning

no code implementations • 1 Jan 2017 • Jiashi Feng, Huan Xu, Shie Mannor

We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine.

Paper
Add Code

Adaptive Lambda Least-Squares Temporal Difference Learning

no code implementations • 30 Dec 2016 • Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester

Temporal Difference learning or TD($\lambda$) is a fundamental algorithm in the field of reinforcement learning.

Paper
Add Code

Supervised Learning for Optimal Power Flow as a Real-Time Proxy

no code implementations • 20 Dec 2016 • Raphael Canyasse, Gal Dalal, Shie Mannor

In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).

Paper
Add Code

Model-based Adversarial Imitation Learning

no code implementations • 7 Dec 2016 • Nir Baram, Oron Anschel, Shie Mannor

A model-based approach for the problem of adversarial imitation learning.

Imitation Learning

Paper
Add Code

Adaptive Skills Adaptive Partitions (ASAP)

no code implementations • NeurIPS 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Paper
Add Code

Unit Commitment using Nearest Neighbor as a Short-Term Proxy

no code implementations • 30 Nov 2016 • Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel

We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, to make tractable hierarchical long-term assessment and planning for large power systems.

Paper
Add Code

Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

no code implementations • 29 Nov 2016 • Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor

Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce.

General Classification

Paper
Add Code

Situational Awareness by Risk-Conscious Skills

no code implementations • 10 Oct 2016 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

Hierarchical Reinforcement Learning

Paper
Add Code

A nonparametric sequential test for online randomized experiments

no code implementations • 8 Oct 2016 • Vineet Abhishek, Shie Mannor

The proposed test does not require knowledge of the underlying probability distribution generating the data.

Paper
Add Code

Bayesian Reinforcement Learning: A Survey

no code implementations • 14 Sep 2016 • Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

Bayesian Inference reinforcement-learning +1

Paper
Add Code

How to Allocate Resources For Features Acquisition?

no code implementations • 10 Jul 2016 • Oran Richman, Shie Mannor

We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition.

General Classification

Paper
Add Code

Visualizing Dynamics: from t-SNE to SEMI-MDPs

no code implementations • 22 Jun 2016 • Nir Ben Zrihem, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.

Paper
Add Code

Deep Reinforcement Learning Discovers Internal Models

no code implementations • 16 Jun 2016 • Nir Baram, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Bending the Curve: Improving the ROC Curve Through Error Redistribution

no code implementations • 21 May 2016 • Oran Richman, Shie Mannor

Features that hold information about the "difficulty" of the data may be non-discriminative and are therefore disregarded in the classification process.

General Classification Meta-Learning

Paper
Add Code

A Reinforcement Learning System to Encourage Physical Activity in Diabetes Patients

no code implementations • 13 May 2016 • Irit Hochberg, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, Elad Yom-Tov

Messages were personalized through a Reinforcement Learning (RL) algorithm which optimized messages to improve each participant's compliance with the activity regimen.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Clustering Time Series and the Surprising Robustness of HMMs

no code implementations • 9 May 2016 • Mark Kozdoba, Shie Mannor

Suppose that we are given a time series where consecutive samples are believed to come from a probabilistic source, that the source changes from time to time and that the total number of sources is fixed.

Clustering Time Series +1

Paper
Add Code

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

no code implementations • 25 Apr 2016 • Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.

Paper
Add Code

Hierarchical Decision Making In Electricity Grid Management

no code implementations • 6 Mar 2016 • Gal Dalal, Elad Gilboa, Shie Mannor

The power grid is a complex and vital system that necessitates careful reliability management.

Decision Making Management +1

Paper
Add Code

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.

Paper
Add Code

Adaptive Skills, Adaptive Partitions (ASAP)

no code implementations • 10 Feb 2016 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Paper
Add Code

Graying the black box: Understanding DQNs

no code implementations • 8 Feb 2016 • Tom Zahavy, Nir Ben Zrihem, Shie Mannor

In recent years there is a growing interest in using deep representations for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms

no code implementations • ICLR 2018 • Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.

Paper
Add Code

Online Learning for Adversaries with Memory: Price of Past Mistakes

no code implementations • NeurIPS 2015 • Oren Anava, Elad Hazan, Shie Mannor

In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret.

Paper
Add Code

Community Detection via Measure Space Embedding

no code implementations • NeurIPS 2015 • Mark Kozdoba, Shie Mannor

We present a new algorithm for community detection.

Community Detection Stochastic Block Model

Paper
Add Code

Learn on Source, Refine on Target: A Model Transfer Learning Framework with Random Forests

2 code implementations • 4 Nov 2015 • Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, Ran El-Yaniv

We propose novel model transfer-learning methods that refine a decision forest model M learned within a "source" domain using a training set sampled from a "target" domain, assumed to be a variation of the source.

Transfer Learning

Paper
Code

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

no code implementations • 17 Sep 2015 • Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

Off-policy evaluation

Paper
Add Code

Emphatic TD Bellman Operator is a Contraction

no code implementations • 14 Aug 2015 • Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

Off-policy evaluation

Paper
Add Code

Reinforcement Learning for the Unit Commitment Problem

no code implementations • 19 Jul 2015 • Gal Dalal, Shie Mannor

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Bootstrapping Skills

no code implementations • 11 Jun 2015 • Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.

Reinforcement Learning (RL)

Paper
Add Code

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

no code implementations • NeurIPS 2015 • Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.

Decision Making

Paper
Add Code

Multi-user lax communications: a multi-armed bandit approach

no code implementations • 30 Apr 2015 • Orly Avner, Shie Mannor

Inspired by cognitive radio networks, we consider a setting where multiple users share several channels modeled as a multi-user multi-armed bandit (MAB) problem.

Paper
Add Code

Overlapping Communities Detection via Measure Space Embedding

no code implementations • 26 Apr 2015 • Mark Kozdoba, Shie Mannor

We present a new algorithm for community detection.

Community Detection Stochastic Block Model

Paper
Add Code

Overlapping Community Detection by Online Cluster Aggregation

no code implementations • 26 Apr 2015 • Mark Kozdoba, Shie Mannor

We present a new online algorithm for detecting overlapping communities.

Community Detection

Paper
Add Code

Actively Learning to Attract Followers on Twitter

no code implementations • 16 Apr 2015 • Nir Levine, Timothy A. Mann, Shie Mannor

Twitter, a popular social network, presents great opportunities for on-line machine learning research.

BIG-bench Machine Learning

Paper
Add Code

Policy Gradient for Coherent Risk Measures

no code implementations • NeurIPS 2015 • Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

Policy Gradient Methods

Paper
Add Code

Off-policy evaluation for MDPs with unknown structure

no code implementations • 11 Feb 2015 • Assaf Hallak, François Schnitzler, Timothy Mann, Shie Mannor

Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use.

Off-policy evaluation

Paper
Add Code

Contextual Markov Decision Processes

no code implementations • 8 Feb 2015 • Assaf Hallak, Dotan Di Castro, Shie Mannor

The objective is to learn a strategy that maximizes the accumulated reward across all contexts.

Paper
Add Code

Implicit Temporal Differences

no code implementations • 21 Dec 2014 • Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.

Paper
Add Code

Robust Logistic Regression and Classification

no code implementations • NeurIPS 2014 • Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

We consider logistic regression with arbitrary outliers in the covariate matrix.

Binary Classification Classification +2

Paper
Add Code

"How hard is my MDP?" The distribution-norm to the rescue

no code implementations • NeurIPS 2014 • Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.

Reinforcement Learning (RL)

Paper
Add Code

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

no code implementations • 30 Sep 2014 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

Multi-Armed Bandits

Paper
Add Code

Distributed Robust Learning

no code implementations • 21 Sep 2014 • Jiashi Feng, Huan Xu, Shie Mannor

We propose a framework for distributed robust statistical learning on big contaminated data.

Paper
Add Code

Thompson Sampling for Learning Parameterized Markov Decision Processes

no code implementations • 29 Jun 2014 • Aditya Gopalan, Shie Mannor

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Concurrent bandits and cognitive radio networks

no code implementations • 22 Apr 2014 • Orly Avner, Shie Mannor

Even the number of users may be unknown and can vary as users join or leave the network.

Collision Avoidance

Paper
Add Code

Optimizing the CVaR via Sampling

1 code implementation • 15 Apr 2014 • Aviv Tamar, Yonatan Glassner, Shie Mannor

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Oracle-Based Robust Optimization via Online Learning

no code implementations • 25 Feb 2014 • Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor

Robust optimization is a common framework in optimization under uncertainty when the problem parameters are not known, but it is rather known that the parameters belong to some given uncertainty set.

Paper
Add Code

Approachability in unknown games: Online learning meets multi-objective optimization

no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

Paper
Add Code

Localized epidemic detection in networks with overwhelming noise

no code implementations • 6 Feb 2014 • Eli A. Meirom, Chris Milling, Constantine Caramanis, Shie Mannor, Ariel Orda, Sanjay Shakkottai

Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available.

Paper
Add Code

Learning Multiple Models via Regularized Weighting

no code implementations • NeurIPS 2013 • Daniel Vainsencher, Shie Mannor, Huan Xu

We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers.

Clustering Generalization Bounds

Paper
Add Code

Reinforcement Learning in Robust Markov Decision Processes

no code implementations • NeurIPS 2013 • Shiau Hong Lim, Huan Xu, Shie Mannor

An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Online PCA for Contaminated Data

no code implementations • NeurIPS 2013 • Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

We consider the online Principal Component Analysis (PCA) for contaminated samples (containing outliers) which are revealed sequentially to the Principal Components (PCs) estimator.

Thompson Sampling for Complex Bandit Problems

no code implementations • 3 Nov 2013 • Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

Thompson Sampling

Variance Adjusted Actor Critic Algorithms

no code implementations • 14 Oct 2013 • Aviv Tamar, Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.

Scaling Up Robust MDPs by Reinforcement Learning

no code implementations • 26 Jun 2013 • Aviv Tamar, Huan Xu, Shie Mannor

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.

reinforcement-learning Reinforcement Learning (RL)

A Primal Condition for Approachability with Partial Monitoring

no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

Online Convex Optimization Against Adversaries with Memory and Application to Statistical Arbitrage

no code implementations • 27 Feb 2013 • Oren Anava, Elad Hazan, Shie Mannor

The framework of online learning with memory naturally captures learning problems with temporal constraints, and was previously studied for the experts setting.

The Perturbed Variation

no code implementations • NeurIPS 2012 • Maayan Harel, Shie Mannor

We introduce a new discrepancy score between two distributions that gives an indication on their similarity.

Two-sample testing

Policy Gradients with Variance Related Risk Criteria

no code implementations • 27 Jun 2012 • Dotan Di Castro, Aviv Tamar, Shie Mannor

In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria.

Reinforcement Learning (RL)

Committing Bandits

no code implementations • NeurIPS 2011 • Loc X. Bui, Ramesh Johari, Shie Mannor

In the second phase the decision maker has to commit to one of the arms and stick with it.

From Bandits to Experts: On the Value of Side-Observations

no code implementations • NeurIPS 2011 • Shie Mannor, Ohad Shamir

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game.

Multi-Armed Bandits

Online Classification with Specificity Constraints

no code implementations • NeurIPS 2010 • Andrey Bernstein, Shie Mannor, Nahum Shimkin

To the best of our knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting.

Binary Classification Classification +2

Distributionally Robust Markov Decision Processes

no code implementations • NeurIPS 2010 • Huan Xu, Shie Mannor

We consider Markov decision processes where the values of the parameters are uncertain.

Robust Regression and Lasso

no code implementations • NeurIPS 2008 • Huan Xu, Constantine Caramanis, Shie Mannor

We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems.

regression

Regularized Policy Iteration

no code implementations • NeurIPS 2008 • Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms.

L2 Regularization reinforcement-learning +1
