Search Results for author: Aviv Tamar

Found 46 papers, 17 papers with code

Validate on Sim, Detect on Real -- Model Selection for Domain Randomization

no code implementations 1 Nov 2021 Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar

We show that our score, VSDR, can significantly improve the accuracy of policy ranking without requiring additional real-world data.

Model Selection Out-of-Distribution Detection +1

Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

no code implementations 24 Sep 2021 Aviv Tamar, Daniel Soudry, Ev Zisselman

In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought.

reinforcement-learning

Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies

1 code implementation NeurIPS 2021 Ron Dorfman, Idan Shenfeld, Aviv Tamar

Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution.

Meta Reinforcement Learning reinforcement-learning

Efficient Self-Supervised Data Collection for Offline Robot Learning

no code implementations 10 May 2021 Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik, Aviv Tamar

In this work, we propose that data collection policies should actively explore the environment to collect diverse data.

reinforcement-learning

Unsupervised Feature Learning for Manipulation with Contrastive Domain Randomization

1 code implementation ICLR Workshop SSL-RL 2021 Carmel Rabinovitz, Niko Grupen, Aviv Tamar

In this work, however, we show that a naive application of DR to unsupervised learning based on contrastive estimation does not promote invariance, as the loss function maximizes mutual information between the features and both the relevant and irrelevant visual properties.

Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder

1 code implementation CVPR 2021 Tal Daniel, Aviv Tamar

However, the original IntroVAE loss function relied on a particular hinge-loss formulation that is very hard to stabilize in practice, and its theoretical convergence analysis ignored important terms in the loss.

Image Generation Out-of-Distribution Detection

Online Safety Assurance for Deep Reinforcement Learning

no code implementations 7 Oct 2020 Noga H. Rotman, Michael Schapira, Aviv Tamar

We illustrate the usefulness of online safety assurance in the context of the proposed deep reinforcement learning (RL) approach to video streaming.

reinforcement-learning

Offline Meta Learning of Exploration

1 code implementation NeurIPS 2021 Ron Dorfman, Idan Shenfeld, Aviv Tamar

Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution.

Meta-Learning Meta Reinforcement Learning

Efficient MDP Analysis for Selfish-Mining in Blockchains

1 code implementation 10 Jul 2020 Roi Bar Zur, Ittay Eyal, Aviv Tamar

We call this Probabilistic Termination Optimization (PTO), and the technique applies to any MDP whose utility is a ratio function.

Cryptography and Security

Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning

no code implementations ICML 2020 Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar

Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal.

Motion Planning reinforcement-learning

Hallucinative Topological Memory for Zero-Shot Visual Planning

1 code implementation ICML 2020 Kara Liu, Thanard Kurutach, Christine Tung, Pieter Abbeel, Aviv Tamar

In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction.

Deep Residual Flow for Out of Distribution Detection

1 code implementation CVPR 2020 Ev Zisselman, Aviv Tamar

Specifically, we demonstrate the effectiveness of our method in ResNet and DenseNet architectures trained on various image datasets.

Out-of-Distribution Detection

Bayesian Relational Memory for Semantic Visual Navigation

1 code implementation ICCV 2019 Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.

Visual Navigation

Sub-Goal Trees -- a Framework for Goal-Directed Trajectory Prediction and Optimization

no code implementations 12 Jun 2019 Tom Jurgenson, Edward Groshev, Aviv Tamar

In such problems, the way we choose to represent a trajectory underlies algorithms for trajectory prediction and optimization.

Motion Planning reinforcement-learning +1

Harnessing Reinforcement Learning for Neural Motion Planning

1 code implementation 1 Jun 2019 Tom Jurgenson, Aviv Tamar

We then propose a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data.

Motion Planning reinforcement-learning

Learning Robotic Manipulation through Visual Planning and Acting

no code implementations 11 May 2019 Angelina Wang, Thanard Kurutach, Kara Liu, Pieter Abbeel, Aviv Tamar

We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.

Visual Tracking

Domain Randomization for Active Pose Estimation

no code implementations 10 Mar 2019 Xinyi Ren, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Abhishek Gupta, Aviv Tamar, Pieter Abbeel

In this work, we investigate how to improve the accuracy of domain randomization based pose estimation.

Pose Estimation

Multi-Agent Reinforcement Learning with Multi-Step Generative Models

no code implementations 29 Jan 2019 Orr Krupnik, Igor Mordatch, Aviv Tamar

We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace.

Continuous Control Decision Making +3

Learning and Planning with a Semantic Model

no code implementations ICLR 2019 Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI.

reinforcement-learning Visual Navigation

Safe Policy Learning from Observations

no code implementations 27 Sep 2018 Elad Sarafian, Aviv Tamar, Sarit Kraus

The primary advantages of our approach, termed Rerouted Behavior Improvement (RBI), over other safe learning methods are its stability in the presence of value estimation errors and the elimination of a policy search process.

Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

no code implementations 6 Aug 2018 Dror Freirich, Ron Meir, Aviv Tamar

In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value.

Learning Plannable Representations with Causal InfoGAN

1 code implementation NeurIPS 2018 Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel

Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations.

Representation Learning

Constrained Policy Improvement for Safe and Efficient Reinforcement Learning

1 code implementation 20 May 2018 Elad Sarafian, Aviv Tamar, Sarit Kraus

To minimize the improvement penalty, the RBI idea is to attenuate rapid policy changes of low probability actions which were less frequently sampled.

reinforcement-learning

Learning Robotic Assembly from CAD

no code implementations 20 Mar 2018 Garrett Thomas, Melissa Chien, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel

We propose to leverage this prior knowledge by guiding RL along a geometric motion plan, calculated using the CAD data.

Motion Planning

Model-Ensemble Trust-Region Policy Optimization

2 code implementations ICLR 2018 Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.

Continuous Control Model-based Reinforcement Learning +1

Safer Classification by Synthesis

no code implementations 22 Nov 2017 William Wang, Angelina Wang, Aviv Tamar, Xi Chen, Pieter Abbeel

We posit that a generative approach is the natural remedy for this problem, and propose a method for classification using generative models.

Classification General Classification

Situationally Aware Options

no code implementations 20 Nov 2017 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

Learning Generalized Reactive Policies using Deep Neural Networks

no code implementations 24 Aug 2017 Edward Groshev, Maxwell Goldstein, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel

We show that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances.

Decision Making

A Machine Learning Approach to Routing

no code implementations 10 Aug 2017 Asaf Valadarsky, Michael Schapira, Dafna Shahaf, Aviv Tamar

Can ideas and techniques from machine learning be leveraged to automatically generate "good" routing configurations?

reinforcement-learning

Constrained Policy Optimization

7 code implementations ICML 2017 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function.

reinforcement-learning

Shallow Updates for Deep Reinforcement Learning

no code implementations NeurIPS 2017 Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games Feature Engineering +1

Situational Awareness by Risk-Conscious Skills

no code implementations 10 Oct 2016 Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

Hierarchical Reinforcement Learning

Learning from the Hindsight Plan -- Episodic MPC Improvement

1 code implementation 28 Sep 2016 Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel

To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan.

Bayesian Reinforcement Learning: A Survey

no code implementations 14 Sep 2016 Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

Bayesian Inference reinforcement-learning

Value Iteration Networks

8 code implementations NeurIPS 2016 Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel

We introduce the value iteration network (VIN): a fully differentiable neural network with a "planning module" embedded within.

reinforcement-learning

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

no code implementations 17 Sep 2015 Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

Emphatic TD Bellman Operator is a Contraction

no code implementations 14 Aug 2015 Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

no code implementations NeurIPS 2015 Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.

Decision Making

Policy Gradient for Coherent Risk Measures

no code implementations NeurIPS 2015 Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

Policy Gradient Methods reinforcement-learning

Implicit Temporal Differences

no code implementations 21 Dec 2014 Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.

reinforcement-learning
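
For readers unfamiliar with the method named in the snippet above, the following is a minimal sketch of standard tabular TD($\lambda$) with accumulating eligibility traces. It is not the implicit variant the paper proposes. The function name `td_lambda`, the transition format, and the two-state example chain are assumptions made for illustration only.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) policy evaluation with accumulating traces.

    episodes: list of episodes; each episode is a list of
    (state, reward, next_state) transitions, with next_state set to
    None when the episode terminates. (Hypothetical input format.)
    """
    values = [0.0] * n_states
    for episode in episodes:
        traces = [0.0] * n_states  # eligibility traces reset per episode
        for state, reward, next_state in episode:
            next_value = 0.0 if next_state is None else values[next_state]
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0  # accumulate trace for the visited state
            for s in range(n_states):
                # Every state is updated in proportion to its trace,
                # then the trace decays by gamma * lambda.
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam
    return values


# Illustrative deterministic chain: state 0 -> state 1 -> terminal,
# with reward 0 on the first step and reward 1 on the second.
chain_episode = [(0, 0.0, 1), (1, 1.0, None)]
estimated_values = td_lambda([chain_episode] * 500, n_states=2)
```

Each update touches every state once per step, which is what makes the online form cheap in the tabular case; with function approximation the trace becomes a vector of the same size as the weights.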

Optimizing the CVaR via Sampling

1 code implementation 15 Apr 2014 Aviv Tamar, Yonatan Glassner, Shie Mannor

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.

reinforcement-learning
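
As a rough illustration of the risk measure named in the snippet above, the sketch below estimates CVaR from a sample of returns. It assumes one common convention, in which CVaR at level alpha is the mean of the worst alpha-fraction of outcomes; other conventions exist. This is a plain sample estimator, not the optimization method of the paper, and the name `empirical_cvar` and the sample values are hypothetical.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) returns (lower tail).

    Assumes higher returns are better, so the "worst" outcomes are the
    smallest values in the sample.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # ascending: worst outcomes first
    tail_size = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:tail_size]
    return sum(tail) / tail_size


# Made-up sample of eight returns, for illustration only.
sample_returns = [10.0, -5.0, 3.0, 7.0, -20.0, 1.0, 4.0, 6.0]
cvar_25 = empirical_cvar(sample_returns, alpha=0.25)  # mean of -20.0 and -5.0
```

With `alpha=1.0` the estimator reduces to the ordinary sample mean, which makes the link between the risk-sensitive and risk-neutral objectives easy to see.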

Variance Adjusted Actor Critic Algorithms

no code implementations 14 Oct 2013 Aviv Tamar, Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.

Scaling Up Robust MDPs by Reinforcement Learning

no code implementations 26 Jun 2013 Aviv Tamar, Huan Xu, Shie Mannor

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.

reinforcement-learning
