Search Results for author: Shie Mannor

Found 154 papers, 16 papers with code

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

no code implementations 13 Oct 2021 Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup.

Imitation Learning Recommendation Systems

Twice regularized MDPs and the equivalence between robustness and regularization

no code implementations 12 Oct 2021 Esther Derman, Matthieu Geist, Shie Mannor

We finally generalize regularized MDPs to twice regularized MDPs (R${}^2$ MDPs), i.e., MDPs with $\textit{both}$ value and policy regularization.

Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits

no code implementations 12 Oct 2021 Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider a stochastic multi-armed bandit setting where feedback is limited by a (possibly time-dependent) budget, and reward must be actively inquired for it to be observed.

Continuous-Time Fitted Value Iteration for Robust Policies

no code implementations 5 Oct 2021 Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters

Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task.

Continuous Control

Sim and Real: Better Together

no code implementations 1 Oct 2021 Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor

Simulation is used extensively in autonomous systems, particularly in robotic manipulation.

Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

no code implementations 22 Sep 2021 Roy Zohar, Shie Mannor, Guy Tennenholtz

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.

Multi-agent Reinforcement Learning

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

no code implementations 4 Jul 2021 Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

We introduce Batch-BFS: a GPU breadth-first search that advances all nodes in each depth of the tree simultaneously.

Atari Games

Robust Value Iteration for Continuous Control Tasks

no code implementations 25 May 2021 Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics.

Continuous Control

Value Iteration in Continuous Actions, States and Time

no code implementations 10 May 2021 Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

This algorithm enables dynamic programming for continuous states and actions with a known dynamics model.

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

no code implementations 1 May 2021 Mohammadi Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.

Noise Estimation Is Not Optimal: How to Use Kalman Filter the Right Way

1 code implementation 6 Apr 2021 Ido Greenberg, Netanel Yannay, Shie Mannor

A huge body of research focuses on the task of estimation of the noise under various conditions, since precise noise estimation is considered equivalent to minimization of the filtering errors.

Noise Estimation

Maximum Entropy Reinforcement Learning with Mixture Policies

no code implementations 18 Mar 2021 Nir Baram, Guy Tennenholtz, Shie Mannor

However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward.

Continuous Control

GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning

no code implementations 22 Feb 2021 Guy Tennenholtz, Nir Baram, Shie Mannor

Offline reinforcement learning approaches can generally be divided into proximal and uncertainty-aware methods.

Offline RL

Action Redundancy in Reinforcement Learning

no code implementations 22 Feb 2021 Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization.

Improper Reinforcement Learning with Gradient-based Policy Optimization

no code implementations 16 Feb 2021 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

Online Apprenticeship Learning

no code implementations 13 Feb 2021 Lior Shani, Tom Zahavy, Shie Mannor

In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.

RL for Latent MDPs: Regret Guarantees and a Lower Bound

no code implementations 9 Feb 2021 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP).

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

no code implementations 7 Feb 2021 Ofir Nabati, Tom Zahavy, Shie Mannor

To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.

Efficient Exploration Multi-Armed Bandits +1

Dimension Free Generalization Bounds for Non Linear Metric Learning

no code implementations 7 Feb 2021 Mark Kozdoba, Shie Mannor

In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural network type embedding of the data.

Generalization Bounds Metric Learning

Confidence-Budget Matching for Sequential Budgeted Learning

no code implementations 5 Feb 2021 Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

We analyze the performance of CBM based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets.

Decision Making Decision Making Under Uncertainty +2

Acting in Delayed Environments with Non-Stationary Markov Policies

1 code implementation ICLR 2021 Esther Derman, Gal Dalal, Shie Mannor

We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.

Q-Learning

Online Limited Memory Neural-Linear Bandits

no code implementations 1 Jan 2021 Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration Multi-Armed Bandits +2

Learning Safe Policies with Cost-sensitive Advantage Estimation

no code implementations 1 Jan 2021 Bingyi Kang, Shie Mannor, Jiashi Feng

Reinforcement Learning (RL) with safety guarantee is critical for agents performing tasks in risky environments.

The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems

no code implementations 8 Dec 2020 Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu

With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.

Detecting Rewards Deterioration in Episodic Reinforcement Learning

1 code implementation 22 Oct 2020 Ido Greenberg, Shie Mannor

In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.

Two-sample testing

Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks

no code implementations 11 Oct 2020 Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

We consider the problem of controlling a partially-observed dynamic process on a graph by a limited number of interventions.

Reinforcement Learning with Trajectory Feedback

no code implementations 13 Aug 2020 Yonathan Efroni, Nadav Merlis, Shie Mannor

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair.

Lenient Regret for Multi-Armed Bandits

1 code implementation 10 Aug 2020 Nadav Merlis, Shie Mannor

Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant.

Multi-Armed Bandits

Bandits with Partially Observable Confounded Data

no code implementations 11 Jun 2020 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

We construct a linear bandit algorithm that takes advantage of the projected information, and prove regret bounds.

Multi-Armed Bandits

Distributional Robustness and Regularization in Reinforcement Learning

no code implementations 5 Mar 2020 Esther Derman, Shie Mannor

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning.

Decision Making

Exploration-Exploitation in Constrained MDPs

no code implementations 4 Mar 2020 Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

Decision Making

Stealing Black-Box Functionality Using The Deep Neural Tree Architecture

1 code implementation 23 Feb 2020 Daniel Teitelman, Itay Naeh, Shie Mannor

This paper makes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs).

Active Learning

Optimistic Policy Optimization with Bandit Feedback

no code implementations ICML 2020 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

1 code implementation 17 Feb 2020 Shirli Di-Castro Shashua, Shie Mannor

These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration.

Gaussian Processes

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

no code implementations 13 Feb 2020 Nadav Merlis, Shie Mannor

The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose.

Decision Making Multi-Armed Bandits

Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks

1 code implementation CVPR 2021 Roi Pony, Itay Naeh, Shie Mannor

In this work we present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation that in some cases may be unnoticeable by human observers and is implementable in the real world.

Action Classification Classification +5

Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons

no code implementations 9 Feb 2020 Chen Tessler, Shie Mannor

In reinforcement learning, the discount factor $\gamma$ controls the agent's effective planning horizon.

Continuous Control

Stabilizing Deep Reinforcement Learning with Conservative Updates

no code implementations 2 Oct 2019 Chen Tessler, Nadav Merlis, Shie Mannor

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains.

Language is Power: Representing States Using Natural Language in Reinforcement Learning

no code implementations 2 Oct 2019 Erez Schwartz, Guy Tennenholtz, Chen Tessler, Shie Mannor

Recent advances in reinforcement learning have shown its potential to tackle complex real-life tasks.

Online Planning with Lookahead Policies

no code implementations NeurIPS 2020 Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning.

Off-Policy Evaluation in Partially Observable Environments

no code implementations 9 Sep 2019 Guy Tennenholtz, Shie Mannor, Uri Shalit

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments.

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

no code implementations 6 Sep 2019 Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem that restricts consecutive policies to be 'close' to one another is iteratively solved.

Practical Risk Measures in Reinforcement Learning

no code implementations 22 Aug 2019 Dotan Di Castro, Joel Oren, Shie Mannor

Practical application of Reinforcement Learning (RL) often involves risk considerations.

Variance Estimation For Dynamic Regression via Spectrum Thresholding

no code implementations 13 Jun 2019 Mark Kozdoba, Edward Moroshko, Shie Mannor, Koby Crammer

This problem can be modeled as a linear dynamical system, where the parameters that need to be learned are the variance of both the process noise and the observation noise.

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

1 code implementation NeurIPS 2019 Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies, which act by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$.

Model-based Reinforcement Learning

Inverse Reinforcement Learning in Contextual MDPs

2 code implementations 23 May 2019 Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).

Autonomous Driving

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

no code implementations 23 May 2019 Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

Decision Making Imitation Learning +2

Distributional Policy Optimization: An Alternative Approach for Continuous Control

3 code implementations NeurIPS 2019 Chen Tessler, Guy Tennenholtz, Shie Mannor

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

Continuous Control Policy Gradient Methods

A Bayesian Approach to Robust Reinforcement Learning

no code implementations 20 May 2019 Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior.

Safe Exploration

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

no code implementations 8 May 2019 Nadav Merlis, Shie Mannor

We show that a linear dependence of the regret in the batch size in existing algorithms can be replaced by this smoothness parameter.

An adaptive stochastic optimization algorithm for resource allocation

no code implementations 12 Feb 2019 Xavier Fontaine, Shie Mannor, Vianney Perchet

This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.

Stochastic Optimization

The Natural Language of Actions

1 code implementation 4 Feb 2019 Guy Tennenholtz, Shie Mannor

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning.

Starcraft Starcraft II

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

no code implementations NeurIPS 2019 Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong

To the best of our knowledge, it is the first MARL algorithm with convergence guarantee in the control, off-policy and non-linear function approximation setting.

Multi-agent Reinforcement Learning

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

no code implementations 24 Jan 2019 Tom Zahavy, Shie Mannor

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.

Decision Making Efficient Exploration +3

Trust Region Value Optimization using Kalman Filtering

no code implementations 23 Jan 2019 Shirli Di-Castro Shashua, Shie Mannor

However, this approach ignores certain distributional properties of both the errors and value parameters.

Multi Instance Learning For Unbalanced Data

no code implementations 17 Dec 2018 Mark Kozdoba, Edward Moroshko, Lior Shani, Takuya Takagi, Takashi Katoh, Shie Mannor, Koby Crammer

In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective.

Exploration Conscious Reinforcement Learning Revisited

1 code implementation 13 Dec 2018 Lior Shani, Yonathan Efroni, Shie Mannor

We continue and analyze properties of exploration-conscious optimal policies and characterize two general approaches to solve such criteria.

Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

no code implementations NeurIPS 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters

1 code implementation AAAI 2019 Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor

Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations.

Time Series Time Series Forecasting

Inspiration Learning through Preferences

no code implementations 16 Sep 2018 Nir Baram, Shie Mannor

We denote this setup as Inspiration Learning: knowledge transfer between agents that operate in different action spaces.

Imitation Learning Transfer Learning

How to Combine Tree-Search Methods in Reinforcement Learning

no code implementations 6 Sep 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success.

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations NeurIPS 2018 Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

text-based games

Multi-user Communication Networks: A Coordinated Multi-armed Bandit Approach

no code implementations 14 Aug 2018 Orly Avner, Shie Mannor

Communication networks shared by many users are a widespread challenge nowadays.

Beyond the One-Step Greedy Approach in Reinforcement Learning

no code implementations ICML 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.

A General Framework for Bandit Problems Beyond Cumulative Objectives

no code implementations 4 Jun 2018 Asaf Cassel, Shie Mannor, Assaf Zeevi

Unlike the case of cumulative criteria, in the problems we study here the oracle policy, which knows the problem parameters a priori and is used to "center" the regret, is not trivial.

Multi-Armed Bandits

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

no code implementations 21 May 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control.

Interdependent Gibbs Samplers

no code implementations 11 Apr 2018 Mark Kozdoba, Shie Mannor

Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains.

Deep Learning Reconstruction of Ultra-Short Pulses

no code implementations 15 Mar 2018 Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.

Soft-Robust Actor-Critic Policy-Gradient

no code implementations 11 Mar 2018 Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

It learns an optimal policy with respect to a distribution over an uncertainty set and stays robust to model uncertainty but avoids the conservativeness of robust strategies.

Train on Validation: Squeezing the Data Lemon

no code implementations 16 Feb 2018 Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define the notion of on-average-validation-stable algorithms as one in which using small portions of validation data for training does not overfit the model selection process.

Model Selection

Beyond the One Step Greedy Approach in Reinforcement Learning

no code implementations 10 Feb 2018 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation.

Learning Robust Options

no code implementations 9 Feb 2018 Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty.

The Stochastic Firefighter Problem

no code implementations 22 Nov 2017 Guy Tennenholtz, Constantine Caramanis, Shie Mannor

We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget.

Situationally Aware Options

no code implementations 20 Nov 2017 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

End-to-End Differentiable Adversarial Imitation Learning

no code implementations ICML 2017 Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup.

Imitation Learning

Multi-objective Bandits: Optimizing the Generalized Gini Index

no code implementations ICML 2017 Robert Busa-Fekete, Balazs Szorenyi, Paul Weng, Shie Mannor

We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized.

Shallow Updates for Deep Reinforcement Learning

no code implementations NeurIPS 2017 Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach, the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games Feature Engineering

Finite Sample Analyses for TD(0) with Function Approximation

no code implementations 4 Apr 2017 Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

TD(0) is one of the most commonly used algorithms in reinforcement learning.

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

no code implementations 15 Mar 2017 Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor

Using this, we provide a concentration bound, which is the first such result for a two-timescale SA.

Deep Robust Kalman Filter

no code implementations 7 Mar 2017 Shirli Di-Castro Shashua, Shie Mannor

The Deep-RoK algorithm is a robust Bayesian method, based on the Extended Kalman Filter (EKF), that accounts for both the uncertainty in the weights of the approximated value function and the uncertainty in the transition probabilities, improving the robustness of the agent.

Decision Making

Online Learning with Many Experts

no code implementations 25 Feb 2017 Alon Cohen, Shie Mannor

We study the problem of prediction with expert advice when the number of experts in question may be extremely large or even infinite.

Consistent On-Line Off-Policy Evaluation

no code implementations ICML 2017 Assaf Hallak, Shie Mannor

The problem of on-line off-policy evaluation (OPE) has been actively studied in the last decade due to its importance both as a stand-alone problem and as a module in a policy improvement scheme.

Rotting Bandits

no code implementations NeurIPS 2017 Nir Levine, Koby Crammer, Shie Mannor

In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward.

Multi-Armed Bandits

Outlier Robust Online Learning

no code implementations 1 Jan 2017 Jiashi Feng, Huan Xu, Shie Mannor

We consider the problem of learning from noisy data in practical settings where the size of data is too large to store on a single machine.

Adaptive Lambda Least-Squares Temporal Difference Learning

no code implementations 30 Dec 2016 Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester

Temporal Difference learning or TD($\lambda$) is a fundamental algorithm in the field of reinforcement learning.

Supervised Learning for Optimal Power Flow as a Real-Time Proxy

no code implementations 20 Dec 2016 Raphael Canyasse, Gal Dalal, Shie Mannor

In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF).

Model-based Adversarial Imitation Learning

no code implementations 7 Dec 2016 Nir Baram, Oron Anschel, Shie Mannor

A model-based approach for the problem of adversarial imitation learning.

Imitation Learning

Adaptive Skills Adaptive Partitions (ASAP)

no code implementations NeurIPS 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Unit Commitment using Nearest Neighbor as a Short-Term Proxy

no code implementations 30 Nov 2016 Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel

We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, to make tractable hierarchical long-term assessment and planning for large power systems.

Situational Awareness by Risk-Conscious Skills

no code implementations 10 Oct 2016 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

Hierarchical Reinforcement Learning

A nonparametric sequential test for online randomized experiments

no code implementations 8 Oct 2016 Vineet Abhishek, Shie Mannor

The proposed test does not require knowledge of the underlying probability distribution generating the data.

Bayesian Reinforcement Learning: A Survey

no code implementations 14 Sep 2016 Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

Bayesian Inference

How to Allocate Resources For Features Acquisition?

no code implementations 10 Jul 2016 Oran Richman, Shie Mannor

We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition.

General Classification

Visualizing Dynamics: from t-SNE to SEMI-MDPs

no code implementations 22 Jun 2016 Nir Ben Zrihem, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.

Deep Reinforcement Learning Discovers Internal Models

no code implementations 16 Jun 2016 Nir Baram, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.

Bending the Curve: Improving the ROC Curve Through Error Redistribution

no code implementations 21 May 2016 Oran Richman, Shie Mannor

Features that hold information about the "difficulty" of the data may be non-discriminative and are therefore disregarded in the classification process.

General Classification Meta-Learning

A Reinforcement Learning System to Encourage Physical Activity in Diabetes Patients

no code implementations 13 May 2016 Irit Hochberg, Guy Feraru, Mark Kozdoba, Shie Mannor, Moshe Tennenholtz, Elad Yom-Tov

Messages were personalized through a Reinforcement Learning (RL) algorithm which optimized messages to improve each participant's compliance with the activity regimen.

Clustering Time Series and the Surprising Robustness of HMMs

no code implementations 9 May 2016 Mark Kozdoba, Shie Mannor

Suppose that we are given a time series where consecutive samples are believed to come from a probabilistic source, that the source changes from time to time and that the total number of sources is fixed.

Time Series

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

no code implementations 25 Apr 2016 Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.

Minecraft

Hierarchical Decision Making In Electricity Grid Management

no code implementations 6 Mar 2016 Gal Dalal, Elad Gilboa, Shie Mannor

The power grid is a complex and vital system that necessitates careful reliability management.

Decision Making

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

no code implementations 10 Feb 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation.

Adaptive Skills, Adaptive Partitions (ASAP)

no code implementations 10 Feb 2016 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Graying the black box: Understanding DQNs

no code implementations 8 Feb 2016 Tom Zahavy, Nir Ben Zrihem, Shie Mannor

In recent years there has been a growing interest in using deep representations for reinforcement learning.

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms

no code implementations ICLR 2018 Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.

Online Learning for Adversaries with Memory: Price of Past Mistakes

no code implementations NeurIPS 2015 Oren Anava, Elad Hazan, Shie Mannor

In this work we extend the notion of learning with memory to the general Online Convex Optimization (OCO) framework, and present two algorithms that attain low regret.

Learn on Source, Refine on Target: A Model Transfer Learning Framework with Random Forests

1 code implementation 4 Nov 2015 Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, Ran El-Yaniv

We propose novel model transfer-learning methods that refine a decision forest model M learned within a "source" domain using a training set sampled from a "target" domain, assumed to be a variation of the source.

Transfer Learning

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

no code implementations 17 Sep 2015 Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

Emphatic TD Bellman Operator is a Contraction

no code implementations 14 Aug 2015 Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

Reinforcement Learning for the Unit Commitment Problem

no code implementations 19 Jul 2015 Gal Dalal, Shie Mannor

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling.

Bootstrapping Skills

no code implementations 11 Jun 2015 Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions.

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

no code implementations NeurIPS 2015 Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.

Decision Making

Multi-user lax communications: a multi-armed bandit approach

no code implementations 30 Apr 2015 Orly Avner, Shie Mannor

Inspired by cognitive radio networks, we consider a setting where multiple users share several channels modeled as a multi-user multi-armed bandit (MAB) problem.

Overlapping Community Detection by Online Cluster Aggregation

no code implementations 26 Apr 2015 Mark Kozdoba, Shie Mannor

We present a new online algorithm for detecting overlapping communities.

Community Detection

Actively Learning to Attract Followers on Twitter

no code implementations 16 Apr 2015 Nir Levine, Timothy A. Mann, Shie Mannor

Twitter, a popular social network, presents great opportunities for on-line machine learning research.

Policy Gradient for Coherent Risk Measures

no code implementations NeurIPS 2015 Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

Policy Gradient Methods

Off-policy evaluation for MDPs with unknown structure

no code implementations 11 Feb 2015 Assaf Hallak, François Schnitzler, Timothy Mann, Shie Mannor

Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use.

Contextual Markov Decision Processes

no code implementations 8 Feb 2015 Assaf Hallak, Dotan Di Castro, Shie Mannor

The objective is to learn a strategy that maximizes the accumulated reward across all contexts.

Implicit Temporal Differences

no code implementations 21 Dec 2014 Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.

"How hard is my MDP?" The distribution-norm to the rescue

no code implementations NeurIPS 2014 Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

no code implementations 30 Sep 2014 Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

Multi-Armed Bandits

Distributed Robust Learning

no code implementations 21 Sep 2014 Jiashi Feng, Huan Xu, Shie Mannor

We propose a framework for distributed robust statistical learning on big contaminated data.

Thompson Sampling for Learning Parameterized Markov Decision Processes

no code implementations 29 Jun 2014 Aditya Gopalan, Shie Mannor

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards.

Concurrent bandits and cognitive radio networks

no code implementations 22 Apr 2014 Orly Avner, Shie Mannor

Even the number of users may be unknown and can vary as users join or leave the network.

Optimizing the CVaR via Sampling

no code implementations 15 Apr 2014 Aviv Tamar, Yonatan Glassner, Shie Mannor

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.

Oracle-Based Robust Optimization via Online Learning

no code implementations 25 Feb 2014 Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor

Robust optimization is a common framework in optimization under uncertainty when the problem parameters are not known, but are known to belong to some given uncertainty set.

Approachability in unknown games: Online learning meets multi-objective optimization

no code implementations 10 Feb 2014 Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

Localized epidemic detection in networks with overwhelming noise

no code implementations 6 Feb 2014 Eli A. Meirom, Chris Milling, Constantine Caramanis, Shie Mannor, Ariel Orda, Sanjay Shakkottai

Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available.

Learning Multiple Models via Regularized Weighting

no code implementations NeurIPS 2013 Daniel Vainsencher, Shie Mannor, Huan Xu

We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers.

Generalization Bounds

Online PCA for Contaminated Data

no code implementations NeurIPS 2013 Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

We consider the online Principal Component Analysis (PCA) for contaminated samples (containing outliers) which are revealed sequentially to the Principal Components (PCs) estimator.

Reinforcement Learning in Robust Markov Decision Processes

no code implementations NeurIPS 2013 Shiau Hong Lim, Huan Xu, Shie Mannor

An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system.

Thompson Sampling for Complex Bandit Problems

no code implementations 3 Nov 2013 Aditya Gopalan, Shie Mannor, Yishay Mansour

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.

Variance Adjusted Actor Critic Algorithms

no code implementations 14 Oct 2013 Aviv Tamar, Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.

Scaling Up Robust MDPs by Reinforcement Learning

no code implementations 26 Jun 2013 Aviv Tamar, Huan Xu, Shie Mannor

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.

A Primal Condition for Approachability with Partial Monitoring

no code implementations 23 May 2013 Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

Online Convex Optimization Against Adversaries with Memory and Application to Statistical Arbitrage

no code implementations 27 Feb 2013 Oren Anava, Elad Hazan, Shie Mannor

The framework of online learning with memory naturally captures learning problems with temporal constraints, and was previously studied for the experts setting.

The Perturbed Variation

no code implementations NeurIPS 2012 Maayan Harel, Shie Mannor

We introduce a new discrepancy score between two distributions that gives an indication of their similarity.

Two-sample testing

From Bandits to Experts: On the Value of Side-Observations

no code implementations NeurIPS 2011 Shie Mannor, Ohad Shamir

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game.

Multi-Armed Bandits

Committing Bandits

no code implementations NeurIPS 2011 Loc X. Bui, Ramesh Johari, Shie Mannor

In the second phase the decision maker has to commit to one of the arms and stick with it.

Distributionally Robust Markov Decision Processes

no code implementations NeurIPS 2010 Huan Xu, Shie Mannor

We consider Markov decision processes where the values of the parameters are uncertain.

Online Classification with Specificity Constraints

no code implementations NeurIPS 2010 Andrey Bernstein, Shie Mannor, Nahum Shimkin

To the best of our knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting.

Classification General Classification

Regularized Policy Iteration

no code implementations NeurIPS 2008 Amir M. Farahmand, Mohammad Ghavamzadeh, Shie Mannor, Csaba Szepesvári

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms.

L2 Regularization

Robust Regression and Lasso

no code implementations NeurIPS 2008 Huan Xu, Constantine Caramanis, Shie Mannor

We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems.
