Search Results for author: Shie Mannor

Found 197 papers, 31 papers with code

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

no code implementations 11 Mar 2024 Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).

Conservative DDPG -- Pessimistic RL without Ensemble

no code implementations 8 Mar 2024 Nitsan Soffair, Shie Mannor

DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values.

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

no code implementations 15 Feb 2024 Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.

Improving Token-Based World Models with Parallel Observation Prediction

1 code implementation 8 Feb 2024 Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor

We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs.

SQT -- std $Q$-target

no code implementations 3 Feb 2024 Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor

We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms, DDPG, TD3 and TD7 on seven popular MuJoCo and Bullet tasks.

Q-Learning

MinMaxMin $Q$-learning

no code implementations 3 Feb 2024 Nitsan Soffair, Shie Mannor

MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimations are overestimating the real $Q$-values) inherent in conservative RL algorithms.

Q-Learning

Prospective Side Information for Latent MDPs

no code implementations 11 Oct 2023 Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction.

Decision Making

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

no code implementations 3 Sep 2023 Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor

In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set.

Sobolev Space Regularised Pre Density Models

no code implementations 25 Jul 2023 Mark Kozdoba, Binyamin Perets, Shie Mannor

We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.

Anomaly Detection Density Estimation +1

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

no code implementations 9 Jun 2023 Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel.

Decision Making reinforcement-learning +1

CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

no code implementations 2 May 2023 Chen Tessler, Yoni Kasten, Yunrong Guo, Shie Mannor, Gal Chechik, Xue Bin Peng

In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters.

Imitation Learning

Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization

1 code implementation 12 Mar 2023 Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

We then generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs), i.e., MDPs with $\textit{both}$ value and policy regularization.

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

no code implementations 31 Jan 2023 Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to that of standard (non-robust) MDPs, which is significantly faster than any existing method.

LEMMA

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

no code implementations 30 Jan 2023 Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy.

Policy Gradient Methods

Towards Deployable RL - What's Broken with RL Research and a Potential Fix

no code implementations 3 Jan 2023 Shie Mannor, Aviv Tamar

Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams.

reinforcement-learning Reinforcement Learning (RL)

DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles

no code implementations 13 Dec 2022 Peter Karkus, Boris Ivanovic, Shie Mannor, Marco Pavone

To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control.

Autonomous Vehicles

Reward-Mixing MDPs with a Few Latent Contexts are Learnable

no code implementations 5 Oct 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps.

Tractable Optimality in Episodic Latent MABs

no code implementations 5 Oct 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\texttt{poly}(A) + \texttt{poly}(M, H)^{\min(M, H)})$ interactions.

Policy Gradient for Reinforcement Learning with General Utilities

no code implementations 3 Oct 2022 Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor

The policy gradient theorem proves to be a cornerstone in Linear RL due to its elegance and ease of implementability.

reinforcement-learning Reinforcement Learning (RL)

SoftTreeMax: Policy Gradient with Tree Search

no code implementations 28 Sep 2022 Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient.

Policy Gradient Methods

Actor-Critic based Improper Reinforcement Learning

no code implementations 19 Jul 2022 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case.

reinforcement-learning Reinforcement Learning (RL)

Analysis of Stochastic Processes through Replay Buffers

no code implementations 26 Jun 2022 Shirli Di Castro Shashua, Shie Mannor, Dotan Di-Castro

We provide an analysis of the properties of the sampled process such as stationarity, Markovity and autocorrelation in terms of the properties of the original process.

reinforcement-learning Reinforcement Learning (RL)

Reinforcement Learning with a Terminator

1 code implementation 30 May 2022 Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.

Autonomous Driving reinforcement-learning +1

Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

1 code implementation 28 May 2022 Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

However, we lack a clear understanding of how to exploit this equivalence to perform policy improvement steps that yield the optimal value function or policy.

Efficient Risk-Averse Reinforcement Learning

2 code implementations 10 May 2022 Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.

Autonomous Driving reinforcement-learning +1

Learning Hidden Markov Models When the Locations of Missing Observations are Unknown

no code implementations 12 Mar 2022 Binyamin Perets, Mark Kozdoba, Shie Mannor

However, standard HMM learning algorithms rely crucially on the assumption that the positions of the missing observations \emph{within the observation sequence} are known.

Learning to reason about and to act on physical cascading events

no code implementations 2 Feb 2022 Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik

Reasoning and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events.

counterfactual

Continuous Forecasting via Neural Eigen Decomposition

1 code implementation 31 Jan 2022 Stav Belogolovsky, Ido Greenberg, Danny Eitan, Shie Mannor

Neural differential equations predict the derivative of a stochastic process.

The Geometry of Robust Value Functions

no code implementations 30 Jan 2022 Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor

The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.

Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

no code implementations 30 Jan 2022 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S, A) \cdot \alpha^2 / \epsilon^2)$ {\it per-user} interactions to learn an $\epsilon$-optimal policy for the good users.

Collaborative Filtering Multi-Armed Bandits +1

Planning and Learning with Adaptive Lookahead

no code implementations 28 Jan 2022 Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Some of the most powerful reinforcement learning frameworks use planning for action selection.

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

no code implementations ICLR 2022 Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

Imitation Learning Recommendation Systems +2

Query-Reward Tradeoffs in Multi-Armed Bandits

no code implementations 12 Oct 2021 Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed.

Multi-Armed Bandits

Twice regularized MDPs and the equivalence between robustness and regularization

no code implementations NeurIPS 2021 Esther Derman, Matthieu Geist, Shie Mannor

We finally generalize regularized MDPs to twice regularized MDPs (R${}^2$ MDPs), i.e., MDPs with $\textit{both}$ value and policy regularization.

Continuous-Time Fitted Value Iteration for Robust Policies

1 code implementation 5 Oct 2021 Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters

Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task.

Continuous Control

Sim and Real: Better Together

no code implementations NeurIPS 2021 Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor

Simulation is used extensively in autonomous systems, particularly in robotic manipulation.

Two Regimes of Generalization for Non-Linear Metric Learning

no code implementations 29 Sep 2021 Mark Kozdoba, Shie Mannor

Specifically, we discover and analyze two regimes of behavior of the networks, which are roughly related to the sparsity of the last layer.

Generalization Bounds Metric Learning +1

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

no code implementations 22 Sep 2021 Roy Zohar, Shie Mannor, Guy Tennenholtz

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.

Multi-agent Reinforcement Learning reinforcement-learning +1

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

1 code implementation NeurIPS 2021 Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps.

Atari Games

Robust Value Iteration for Continuous Control Tasks

1 code implementation 25 May 2021 Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics.

Continuous Control reinforcement-learning +1

Value Iteration in Continuous Actions, States and Time

1 code implementation 10 May 2021 Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

This algorithm enables dynamic programming for continuous states and actions with a known dynamics model.

The Fragility of Noise Estimation in Kalman Filter: Optimization Can Handle Model-Misspecification

1 code implementation 6 Apr 2021 Ido Greenberg, Shie Mannor, Netanel Yannay

The Kalman Filter (KF) parameters are traditionally determined by noise estimation, since under the KF assumptions, the state prediction errors are minimized when the parameters correspond to the noise covariance.

Noise Estimation

Maximum Entropy Reinforcement Learning with Mixture Policies

no code implementations 18 Mar 2021 Nir Baram, Guy Tennenholtz, Shie Mannor

However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.

Continuous Control reinforcement-learning +1

Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning

no code implementations 22 Feb 2021 Guy Tennenholtz, Shie Mannor

In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent space based metric.

Autonomous Driving Continuous Control +3

Action Redundancy in Reinforcement Learning

no code implementations 22 Feb 2021 Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.

reinforcement-learning Reinforcement Learning (RL)

Improper Reinforcement Learning with Gradient-based Policy Optimization

no code implementations 16 Feb 2021 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

reinforcement-learning Reinforcement Learning (RL)

Online Apprenticeship Learning

no code implementations 13 Feb 2021 Lior Shani, Tom Zahavy, Shie Mannor

Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL (Ho and Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.

RL for Latent MDPs: Regret Guarantees and a Lower Bound

no code implementations NeurIPS 2021 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).

Dimension Free Generalization Bounds for Non Linear Metric Learning

no code implementations 7 Feb 2021 Mark Kozdoba, Shie Mannor

In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural network type embedding of the data.

Generalization Bounds Metric Learning

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

3 code implementations 7 Feb 2021 Ofir Nabati, Tom Zahavy, Shie Mannor

To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.

Efficient Exploration Multi-Armed Bandits +1

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations 5 Feb 2021 Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making Decision Making Under Uncertainty +2

Acting in Delayed Environments with Non-Stationary Markov Policies

2 code implementations ICLR 2021 Esther Derman, Gal Dalal, Shie Mannor

We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.

Cloud Computing Q-Learning

Learning Safe Policies with Cost-sensitive Advantage Estimation

no code implementations 1 Jan 2021 Bingyi Kang, Shie Mannor, Jiashi Feng

Reinforcement Learning (RL) with safety guarantee is critical for agents performing tasks in risky environments.

Reinforcement Learning (RL)

Online Limited Memory Neural-Linear Bandits

no code implementations 1 Jan 2021 Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration Multi-Armed Bandits +2

The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems

no code implementations 8 Dec 2020 Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.

reinforcement-learning Reinforcement Learning (RL)

Detecting Rewards Deterioration in Episodic Reinforcement Learning

1 code implementation 22 Oct 2020 Ido Greenberg, Shie Mannor

In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.

reinforcement-learning Reinforcement Learning (RL) +1

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks

no code implementations 11 Oct 2020 Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions.

Marketing reinforcement-learning +2

Drift Detection in Episodic Data: Detect When Your Agent Starts Faltering

no code implementations 28 Sep 2020 Ido Greenberg, Shie Mannor

The statistical power of the new testing procedure is shown to outperform alternative tests - often by orders of magnitude - for a variety of environment modifications (which cause deterioration in agent performance).

Reinforcement Learning with Trajectory Feedback

no code implementations 13 Aug 2020 Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

reinforcement-learning Reinforcement Learning (RL) +1

Lenient Regret for Multi-Armed Bandits

1 code implementation 10 Aug 2020 Nadav Merlis, Shie Mannor

Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.

Multi-Armed Bandits Thompson Sampling

Bandits with Partially Observable Confounded Data

no code implementations 11 Jun 2020 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

Multi-Armed Bandits

Distributional Robustness and Regularization in Reinforcement Learning

no code implementations 5 Mar 2020 Esther Derman, Shie Mannor

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning.

Decision Making reinforcement-learning +1

Exploration-Exploitation in Constrained MDPs

no code implementations 4 Mar 2020 Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

Decision Making

Stealing Black-Box Functionality Using The Deep Neural Tree Architecture

1 code implementation 23 Feb 2020 Daniel Teitelman, Itay Naeh, Shie Mannor

This paper makes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs).

Active Learning

Optimistic Policy Optimization with Bandit Feedback

no code implementations ICML 2020 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

Reinforcement Learning (RL)

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

1 code implementation 17 Feb 2020 Shirli Di-Castro Shashua, Shie Mannor

These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration.

Gaussian Processes Reinforcement Learning (RL)

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

no code implementations 13 Feb 2020 Nadav Merlis, Shie Mannor

The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose.

Decision Making Multi-Armed Bandits

Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks

1 code implementation CVPR 2021 Roi Pony, Itay Naeh, Shie Mannor

In this work we present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation that in some cases may be unnoticeable by human observers and is implementable in the real world.

Action Classification Classification +5

Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons

no code implementations 9 Feb 2020 Chen Tessler, Shie Mannor

In reinforcement learning, the discount factor $\gamma$ controls the agent's effective planning horizon.

Continuous Control reinforcement-learning +1

Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning

no code implementations 2 Oct 2019 Pranav Khanna, Guy Tennenholtz, Nadav Merlis, Shie Mannor, Chen Tessler

In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains.

Continuous Control reinforcement-learning +1

Partial Simulation for Imitation Learning

no code implementations 25 Sep 2019 Nir Baram, Shie Mannor

Model-based imitation learning methods require full knowledge of the transition kernel for policy evaluation.

Imitation Learning Reinforcement Learning (RL)

Contextual Inverse Reinforcement Learning

no code implementations 25 Sep 2019 Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor

In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.

reinforcement-learning Reinforcement Learning (RL)

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

no code implementations 25 Sep 2019 Chen Tessler, Nadav Merlis, Shie Mannor

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.

reinforcement-learning Reinforcement Learning (RL)

Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

1 code implementation 25 Sep 2019 Tom Zahavy, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration Multi-Armed Bandits +3

Online Planning with Lookahead Policies

no code implementations NeurIPS 2020 Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of {\em increasing} the lookahead horizon in online planning.

Off-Policy Evaluation in Partially Observable Environments

no code implementations 9 Sep 2019 Guy Tennenholtz, Shie Mannor, Uri Shalit

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.

Off-policy evaluation

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

no code implementations 6 Sep 2019 Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts consecutive policies to be 'close' to one another, is iteratively solved.

Reinforcement Learning (RL)

Finite Sample Analysis Of Dynamic Regression Parameter Learning

no code implementations 13 Jun 2019 Mark Kozdoba, Edward Moroshko, Shie Mannor, Koby Crammer

The proposed bounds depend on the shape of a certain spectrum related to the system operator, and thus provide the first known explicit geometric parameter of the data that can be used to bound estimation errors.

regression

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation NeurIPS 2019 Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with \emph{greedy policies} -- act by \emph{1-step planning} -- can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning reinforcement-learning +1

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

no code implementations 23 May 2019 Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

Imitation Learning text-based games +1

Inverse Reinforcement Learning in Contextual MDPs

2 code implementations 23 May 2019 Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).

Autonomous Driving reinforcement-learning +1

Distributional Policy Optimization: An Alternative Approach for Continuous Control

3 code implementations NeurIPS 2019 Chen Tessler, Guy Tennenholtz, Shie Mannor

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

Continuous Control Policy Gradient Methods

A Bayesian Approach to Robust Reinforcement Learning

no code implementations 20 May 2019 Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior.

reinforcement-learning Reinforcement Learning (RL) +1

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

no code implementations 8 May 2019 Nadav Merlis, Shie Mannor

We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter.

An adaptive stochastic optimization algorithm for resource allocation

no code implementations 12 Feb 2019 Xavier Fontaine, Shie Mannor, Vianney Perchet

This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.

Stochastic Optimization

The Natural Language of Actions

1 code implementation 4 Feb 2019 Guy Tennenholtz, Shie Mannor

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.

reinforcement-learning Reinforcement Learning (RL) +2

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

no code implementations NeurIPS 2019 Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong

To the best of our knowledge, it is the first MARL algorithm with convergence guarantee in the control, off-policy and non-linear function approximation setting.

Multi-agent Reinforcement Learning reinforcement-learning +1

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

no code implementations 24 Jan 2019 Tom Zahavy, Shie Mannor

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.

Decision Making Efficient Exploration +4

Trust Region Value Optimization using Kalman Filtering

no code implementations 23 Jan 2019 Shirli Di-Castro Shashua, Shie Mannor

However, this approach ignores certain distributional properties of both the errors and value parameters.

Multi Instance Learning For Unbalanced Data

no code implementations 17 Dec 2018 Mark Kozdoba, Edward Moroshko, Lior Shani, Takuya Takagi, Takashi Katoh, Shie Mannor, Koby Crammer

In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective.

Exploration Conscious Reinforcement Learning Revisited

1 code implementation 13 Dec 2018 Lior Shani, Yonathan Efroni, Shie Mannor

We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.

reinforcement-learning Reinforcement Learning (RL)

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

no code implementations NeurIPS 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control reinforcement-learning +1

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters

1 code implementation AAAI 2019 Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor

Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations.

regression Time Series +1

Inspiration Learning through Preferences

no code implementations 16 Sep 2018 Nir Baram, Shie Mannor

We denote this setup as \textit{Inspiration Learning} - knowledge transfer between agents that operate in different action spaces.

Imitation Learning Transfer Learning

How to Combine Tree-Search Methods in Reinforcement Learning

no code implementations 6 Sep 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

reinforcement-learning Reinforcement Learning (RL)

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations NeurIPS 2018 Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

reinforcement-learning Reinforcement Learning (RL) +1

Multi-user Communication Networks: A Coordinated Multi-armed Bandit Approach

no code implementations 14 Aug 2018 Orly Avner, Shie Mannor

Communication networks shared by many users are a widespread challenge nowadays.

A General Framework for Bandit Problems Beyond Cumulative Objectives

no code implementations 4 Jun 2018 Asaf Cassel, Shie Mannor, Assaf Zeevi

Unlike the case of cumulative criteria, in the problems we study here the oracle policy, that knows the problem parameters a priori and is used to "center" the regret, is not trivial.

Multi-Armed Bandits

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

no code implementations 21 May 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Model Predictive Control reinforcement-learning +1

Interdependent Gibbs Samplers

no code implementations 11 Apr 2018 Mark Kozdoba, Shie Mannor

Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains.

Deep Learning Reconstruction of Ultra-Short Pulses

no code implementations 15 Mar 2018 Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.

Soft-Robust Actor-Critic Policy-Gradient

no code implementations 11 Mar 2018 Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies.

reinforcement-learning Reinforcement Learning (RL)

Train on Validation: Squeezing the Data Lemon

no code implementations 16 Feb 2018 Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define the notion of on-average-validation-stable algorithms as one in which using small portions of validation data for training does not overfit the model selection process.

Model Selection

Learning Robust Options

no code implementations 9 Feb 2018 Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

The Stochastic Firefighter Problem

no code implementations 22 Nov 2017 Guy Tennenholtz, Constantine Caramanis, Shie Mannor

We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.

Situationally Aware Options

no code implementations 20 Nov 2017 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

End-to-End Differentiable Adversarial Imitation Learning

no code implementations ICML 2017 Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup.

Imitation Learning

Multi-objective Bandits: Optimizing the Generalized Gini Index

no code implementations ICML 2017 Robert Busa-Fekete, Balazs Szorenyi, Paul Weng, Shie Mannor

We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized.

Shallow Updates for Deep Reinforcement Learning

no code implementations NeurIPS 2017 Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games Feature Engineering +2

Deep Robust Kalman Filter

no code implementations 7 Mar 2017 Shirli Di-Castro Shashua, Shie Mannor

The Deep-RoK algorithm is a robust Bayesian method, based on the Extended Kalman Filter (EKF), that accounts for both the uncertainty in the weights of the approximated value function and the uncertainty in the transition probabilities, improving the robustness of the agent.

Decision Making

Online Learning with Many Experts

no code implementations 25 Feb 2017 Alon Cohen, Shie Mannor

We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite.

Consistent On-Line Off-Policy Evaluation

no code implementations ICML 2017 Assaf Hallak, Shie Mannor

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme.

Off-policy evaluation

Rotting Bandits

no code implementations NeurIPS 2017 Nir Levine, Koby Crammer, Shie Mannor

In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward.

Multi-Armed Bandits

Outlier Robust Online Learning

no code implementations 1 Jan 2017 Jiashi Feng, Huan Xu, Shie Mannor

We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine.

Adaptive Lambda Least-Squares Temporal Difference Learning

no code implementations 30 Dec 2016 Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester

Temporal Difference learning or TD($\lambda$) is a fundamental algorithm in the field of reinforcement learning.

Supervised Learning for Optimal Power Flow as a Real-Time Proxy

no code implementations 20 Dec 2016 Raphael Canyasse, Gal Dalal, Shie Mannor

In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).

Model-based Adversarial Imitation Learning

no code implementations 7 Dec 2016 Nir Baram, Oron Anschel, Shie Mannor

A model-based approach for the problem of adversarial imitation learning.

Imitation Learning

Adaptive Skills Adaptive Partitions (ASAP)

no code implementations NeurIPS 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Unit Commitment using Nearest Neighbor as a Short-Term Proxy

no code implementations 30 Nov 2016 Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel

We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, to make tractable hierarchical long-term assessment and planning for large power systems.

Situational Awareness by Risk-Conscious Skills

no code implementations 10 Oct 2016 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

Hierarchical Reinforcement Learning

A nonparametric sequential test for online randomized experiments

no code implementations 8 Oct 2016 Vineet Abhishek, Shie Mannor

The proposed test does not require knowledge of the underlying probability distribution generating the data.

Bayesian Reinforcement Learning: A Survey

no code implementations 14 Sep 2016 Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

Bayesian Inference reinforcement-learning +1

How to Allocate Resources For Features Acquisition?

no code implementations 10 Jul 2016 Oran Richman, Shie Mannor

We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition.

General Classification

Visualizing Dynamics: from t-SNE to SEMI-MDPs

no code implementations 22 Jun 2016 Nir Ben Zrihem, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.

Deep Reinforcement Learning Discovers Internal Models

no code implementations 16 Jun 2016 Nir Baram, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.

reinforcement-learning Reinforcement Learning (RL)

Bending the Curve: Improving the ROC Curve Through Error Redistribution

no code implementations 21 May 2016 Oran Richman, Shie Mannor

Features that hold information about the "difficulty" of the data may be non-discriminative and are therefore disregarded in the classification process.

General Classification Meta-Learning

A Reinforcement Learning System to Encourage Physical Activity in Diabetes Patients

no code implementations 13 May 2016 Irit Hochberg, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, Elad Yom-Tov

Messages were personalized through a Reinforcement Learning (RL) algorithm which optimized messages to improve each participant's compliance with the activity regimen.

reinforcement-learning Reinforcement Learning (RL)

Clustering Time Series and the Surprising Robustness of HMMs

no code implementations 9 May 2016 Mark Kozdoba, Shie Mannor

Suppose that we are given a time series where consecutive samples are believed to come from a probabilistic source, that the source changes from time to time and that the total number of sources is fixed.

Clustering Time Series +1

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

no code implementations 25 Apr 2016 Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.

Hierarchical Decision Making In Electricity Grid Management

no code implementations 6 Mar 2016 Gal Dalal, Elad Gilboa, Shie Mannor

The power grid is a complex and vital system that necessitates careful reliability management.

Decision Making Management +1

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

no code implementations 10 Feb 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.

Adaptive Skills, Adaptive Partitions (ASAP)

no code implementations 10 Feb 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Graying the black box: Understanding DQNs

no code implementations 8 Feb 2016 Tom Zahavy, Nir Ben Zrihem, Shie Mannor

In recent years there has been a growing interest in using deep representations for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms

no code implementations ICLR 2018 Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.

Online Learning for Adversaries with Memory: Price of Past Mistakes

no code implementations NeurIPS 2015 Oren Anava, Elad Hazan, Shie Mannor

In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret.

Learn on Source, Refine on Target: A Model Transfer Learning Framework with Random Forests

2 code implementations 4 Nov 2015 Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, Ran El-Yaniv

We propose novel model transfer-learning methods that refine a decision forest model M learned within a "source" domain using a training set sampled from a "target" domain, assumed to be a variation of the source.

Transfer Learning

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

no code implementations 17 Sep 2015 Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

Off-policy evaluation

Emphatic TD Bellman Operator is a Contraction

no code implementations 14 Aug 2015 Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

Off-policy evaluation

Reinforcement Learning for the Unit Commitment Problem

no code implementations 19 Jul 2015 Gal Dalal, Shie Mannor

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.

reinforcement-learning Reinforcement Learning (RL) +1

Bootstrapping Skills

no code implementations 11 Jun 2015 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.

Reinforcement Learning (RL)

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

no code implementations NeurIPS 2015 Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.

Decision Making

Multi-user lax communications: a multi-armed bandit approach

no code implementations 30 Apr 2015 Orly Avner, Shie Mannor

Inspired by cognitive radio networks, we consider a setting where multiple users share several channels modeled as a multi-user multi-armed bandit (MAB) problem.

Overlapping Community Detection by Online Cluster Aggregation

no code implementations 26 Apr 2015 Mark Kozdoba, Shie Mannor

We present a new online algorithm for detecting overlapping communities.

Community Detection

Actively Learning to Attract Followers on Twitter

no code implementations 16 Apr 2015 Nir Levine, Timothy A. Mann, Shie Mannor

Twitter, a popular social network, presents great opportunities for on-line machine learning research.

BIG-bench Machine Learning

Policy Gradient for Coherent Risk Measures

no code implementations NeurIPS 2015 Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

Policy Gradient Methods

Off-policy evaluation for MDPs with unknown structure

no code implementations 11 Feb 2015 Assaf Hallak, François Schnitzler, Timothy Mann, Shie Mannor

Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use.

Off-policy evaluation

Contextual Markov Decision Processes

no code implementations 8 Feb 2015 Assaf Hallak, Dotan Di Castro, Shie Mannor

The objective is to learn a strategy that maximizes the accumulated reward across all contexts.

Implicit Temporal Differences

no code implementations 21 Dec 2014 Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.

"How hard is my MDP?" The distribution-norm to the rescue

no code implementations NeurIPS 2014 Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.

Reinforcement Learning (RL)

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

no code implementations 30 Sep 2014 Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

Multi-Armed Bandits

Distributed Robust Learning

no code implementations 21 Sep 2014 Jiashi Feng, Huan Xu, Shie Mannor

We propose a framework for distributed robust statistical learning on big contaminated data.

Thompson Sampling for Learning Parameterized Markov Decision Processes

no code implementations 29 Jun 2014 Aditya Gopalan, Shie Mannor

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.

reinforcement-learning Reinforcement Learning (RL) +1

Concurrent bandits and cognitive radio networks

no code implementations 22 Apr 2014 Orly Avner, Shie Mannor

Even the number of users may be unknown and can vary as users join or leave the network.

Collision Avoidance

Optimizing the CVaR via Sampling

1 code implementation 15 Apr 2014 Aviv Tamar, Yonatan Glassner, Shie Mannor

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.

reinforcement-learning Reinforcement Learning (RL)

Oracle-Based Robust Optimization via Online Learning

no code implementations 25 Feb 2014 Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor

Robust optimization is a common framework in optimization under uncertainty when the problem parameters are not known, but it is rather known that the parameters belong to some given uncertainty set.

Approachability in unknown games: Online learning meets multi-objective optimization

no code implementations 10 Feb 2014 Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

Localized epidemic detection in networks with overwhelming noise

no code implementations 6 Feb 2014 Eli A. Meirom, Chris Milling, Constantine Caramanis, Shie Mannor, Ariel Orda, Sanjay Shakkottai

Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available.

Learning Multiple Models via Regularized Weighting

no code implementations NeurIPS 2013 Daniel Vainsencher, Shie Mannor, Huan Xu

We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers.

Clustering Generalization Bounds

Online PCA for Contaminated Data

no code implementations NeurIPS 2013 Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

We consider the online Principal Component Analysis (PCA) for contaminated samples (containing outliers) which are revealed sequentially to the Principal Components (PCs) estimator.

Reinforcement Learning in Robust Markov Decision Processes

no code implementations NeurIPS 2013 Shiau Hong Lim, Huan Xu, Shie Mannor

An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system.

reinforcement-learning Reinforcement Learning (RL)

Thompson Sampling for Complex Bandit Problems

no code implementations 3 Nov 2013 Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

Thompson Sampling

Variance Adjusted Actor Critic Algorithms

no code implementations 14 Oct 2013 Aviv Tamar, Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.

Scaling Up Robust MDPs by Reinforcement Learning

no code implementations 26 Jun 2013 Aviv Tamar, Huan Xu, Shie Mannor

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.

reinforcement-learning Reinforcement Learning (RL)

A Primal Condition for Approachability with Partial Monitoring

no code implementations 23 May 2013 Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

Online Convex Optimization Against Adversaries with Memory and Application to Statistical Arbitrage

no code implementations 27 Feb 2013 Oren Anava, Elad Hazan, Shie Mannor

The framework of online learning with memory naturally captures learning problems with temporal constraints, and was previously studied for the experts setting.

The Perturbed Variation

no code implementations NeurIPS 2012 Maayan Harel, Shie Mannor

We introduce a new discrepancy score between two distributions that gives an indication on their similarity.

Two-sample testing

Policy Gradients with Variance Related Risk Criteria

no code implementations 27 Jun 2012 Dotan Di Castro, Aviv Tamar, Shie Mannor

In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria.

Reinforcement Learning (RL)

Committing Bandits

no code implementations NeurIPS 2011 Loc X. Bui, Ramesh Johari, Shie Mannor

In the second phase the decision maker has to commit to one of the arms and stick with it.

From Bandits to Experts: On the Value of Side-Observations

no code implementations NeurIPS 2011 Shie Mannor, Ohad Shamir

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game.

Multi-Armed Bandits

Online Classification with Specificity Constraints

no code implementations NeurIPS 2010 Andrey Bernstein, Shie Mannor, Nahum Shimkin

To the best of our knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting.

Binary Classification Classification +2

Distributionally Robust Markov Decision Processes

no code implementations NeurIPS 2010 Huan Xu, Shie Mannor

We consider Markov decision processes where the values of the parameters are uncertain.

Robust Regression and Lasso

no code implementations NeurIPS 2008 Huan Xu, Constantine Caramanis, Shie Mannor

We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems.

regression
