no code implementations • 24 Aug 2017 • Edward Groshev, Maxwell Goldstein, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel
We show that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances.
no code implementations • 20 Mar 2018 • Garrett Thomas, Melissa Chien, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel
We propose to leverage this prior knowledge by guiding RL along a geometric motion plan, calculated using the CAD data.
no code implementations • 22 Nov 2017 • William Wang, Angelina Wang, Aviv Tamar, Xi Chen, Pieter Abbeel
We posit that a generative approach is the natural remedy for this problem, and propose a method for classification using generative models.
no code implementations • 20 Nov 2017 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).
no code implementations • 10 Aug 2017 • Asaf Valadarsky, Michael Schapira, Dafna Shahaf, Aviv Tamar
Can ideas and techniques from machine learning be leveraged to automatically generate "good" routing configurations?
no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.
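The core LS-DQN step described here, periodically refitting the network's final linear layer in closed form on the learned features, can be sketched as a ridge-regularized least squares solve. This is a minimal illustration with synthetic data; the function name and shapes are assumptions, not the paper's implementation:

```python
import numpy as np

def ls_refit(features, targets, l2=1.0):
    """Refit a Q-network's last linear layer in closed form.

    features: (N, d) penultimate-layer activations for N transitions.
    targets:  (N,) Bellman targets, e.g. r + gamma * max_a Q(s', a).
    Solves the ridge least-squares problem (Phi^T Phi + l2*I) w = Phi^T y.
    """
    d = features.shape[1]
    A = features.T @ features + l2 * np.eye(d)
    b = features.T @ targets
    return np.linalg.solve(A, b)

# Synthetic check: recover known weights from noisy linear targets.
rng = np.random.default_rng(0)
phi = rng.normal(size=(500, 8))       # stand-in for deep features
w_true = rng.normal(size=8)
y = phi @ w_true + 0.01 * rng.normal(size=500)
w = ls_refit(phi, y, l2=1e-3)
```

The closed-form solve is what lends the method its stability: unlike gradient steps on the last layer, it lands on the batch optimum in one shot.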
no code implementations • 10 Oct 2016 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.
no code implementations • 14 Sep 2016 • Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar
The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
no code implementations • 17 Sep 2015 • Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor
We consider the off-policy evaluation problem in Markov decision processes with function approximation.
no code implementations • 14 Aug 2015 • Assaf Hallak, Aviv Tamar, Shie Mannor
Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.
no code implementations • NeurIPS 2015 • Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor
For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.
no code implementations • NeurIPS 2015 • Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone
Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.
no code implementations • 21 Dec 2014 • Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi
In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.
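The online TD(λ) implementation referred to here maintains eligibility traces so each TD error updates all recently visited states at once. A minimal tabular sketch with accumulating traces (illustrative, not the paper's variant):

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) policy evaluation with accumulating traces.

    episodes: list of trajectories, each a list of
    (state, reward, next_state, done) tuples.
    """
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)           # eligibility traces
        for s, r, s_next, done in episode:
            target = r + (0.0 if done else gamma * V[s_next])
            delta = target - V[s]        # TD error
            e[s] += 1.0                  # accumulate trace for visited state
            V += alpha * delta * e       # online update, weighted by traces
            e *= gamma * lam             # decay all traces
    return V

# Two-state chain: state 0 -> state 1 (reward 0) -> terminal (reward 1).
episode = [(0, 0.0, 1, False), (1, 1.0, 1, True)]
values = td_lambda([episode] * 200, n_states=2)
```

On this chain the estimates converge to V(1) ≈ 1 and V(0) ≈ γ·V(1) = 0.9, matching the discounted return.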
no code implementations • 14 Oct 2013 • Aviv Tamar, Shie Mannor
We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.
no code implementations • 26 Jun 2013 • Aviv Tamar, Huan Xu, Shie Mannor
We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.
no code implementations • 6 Aug 2018 • Dror Freirich, Ron Meir, Aviv Tamar
In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value.
no code implementations • ICLR 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI.
no code implementations • ICLR 2018 • Aviv Tamar, Khashayar Rohanimanesh, Yin-Lam Chow, Chris Vigorito, Ben Goodrich, Michael Kahane, Derik Pridmore
In this paper we present an LfD approach for learning multiple modes of behavior from visual data.
no code implementations • 29 Jan 2019 • Orr Krupnik, Igor Mordatch, Aviv Tamar
We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace.
no code implementations • 10 Mar 2019 • Xinyi Ren, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Abhishek Gupta, Aviv Tamar, Pieter Abbeel
In this work, we investigate how to improve the accuracy of domain randomization based pose estimation.
no code implementations • 11 May 2019 • Angelina Wang, Thanard Kurutach, Kara Liu, Pieter Abbeel, Aviv Tamar
We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.
no code implementations • 12 Jun 2019 • Tom Jurgenson, Edward Groshev, Aviv Tamar
In such problems, the way we choose to represent a trajectory underlies algorithms for trajectory prediction and optimization.
no code implementations • 12 Nov 2019 • Tal Daniel, Thanard Kurutach, Aviv Tamar
In this work, we propose two variational methods for training VAEs for SSAD.
no code implementations • ICML 2020 • Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar
Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal.
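The goal-augmentation trick mentioned here is literally a concatenation: the policy and value networks condition on the goal by treating it as extra state dimensions. A minimal illustration (not the paper's code):

```python
import numpy as np

def augment_with_goal(state, goal):
    """Turn a single-goal RL input into a multi-goal one by
    concatenating the goal onto the state vector."""
    return np.concatenate([np.asarray(state, dtype=float),
                           np.asarray(goal, dtype=float)])

# A 2-D state augmented with a 2-D goal yields a 4-D network input.
s = augment_with_goal([0.1, 0.2], [1.0, 1.0])
```

Any standard RL algorithm can then be run unchanged on the augmented state space, with one network shared across all goals.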
no code implementations • 7 Oct 2020 • Noga H. Rotman, Michael Schapira, Aviv Tamar
We illustrate the usefulness of online safety assurance in the context of the proposed deep reinforcement learning (RL) approach to video streaming.
no code implementations • 10 May 2021 • Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik, Aviv Tamar
In this work, we propose that data collection policies should actively explore the environment to collect diverse data.
no code implementations • 24 Sep 2021 • Aviv Tamar, Daniel Soudry, Ev Zisselman
In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought.
no code implementations • 1 Nov 2021 • Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar
We show that our score - VSDR - can significantly improve the accuracy of policy ranking without requiring additional real world data.
no code implementations • 27 Sep 2018 • Elad Sarafian, Aviv Tamar, Sarit Kraus
The primary advantages of our approach, termed Rerouted Behavior Improvement (RBI), over other safe learning methods are its stability in the presence of value estimation errors and the elimination of a policy search process.
no code implementations • 27 Jun 2012 • Dotan Di Castro, Aviv Tamar, Shie Mannor
In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria.
no code implementations • 3 Nov 2022 • Gal Leibovich, Guy Jacob, Or Avner, Gal Novik, Aviv Tamar
The key challenge is a distribution shift between the desired outputs and the outputs of an initial random guess, and we prove that iterative inversion can steer the learning correctly, under rather strict conditions on the function.
no code implementations • 3 Jan 2023 • Shie Mannor, Aviv Tamar
Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams.
no code implementations • 15 Feb 2023 • Khashayar Rohanimanesh, Jake Metzger, William Richards, Aviv Tamar
However, we find that an approximate solution based on sparse tree search yields near optimal performance at a fraction of the time.
no code implementations • 1 Mar 2023 • Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, Aviv Tamar
Routing is, arguably, the most fundamental task in computer networking, and the most extensively studied one.
no code implementations • 17 May 2023 • Tom Jurgenson, Aviv Tamar
Based on this idea, we propose Trajectory Iterative Learner (TraIL), an extension of GCSL that further exploits the information in a trajectory, and uses it for learning to predict both actions and sub-goals.
no code implementations • 6 Jul 2023 • Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives.
1 code implementation • 19 Oct 2023 • Orr Krupnik, Elisei Shafer, Tom Jurgenson, Aviv Tamar
Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions.
1 code implementation • 20 May 2018 • Elad Sarafian, Aviv Tamar, Sarit Kraus
To minimize the improvement penalty, the RBI idea is to attenuate rapid policy changes of low probability actions which were less frequently sampled.
1 code implementation • NeurIPS 2023 • Ev Zisselman, Itai Lavie, Daniel Soudry, Aviv Tamar
Our insight is that learning a policy that effectively explores the domain is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches.
1 code implementation • ICLR Workshop SSL-RL 2021 • Carmel Rabinovitz, Niko Grupen, Aviv Tamar
In this work, however, we show that a naive application of DR to unsupervised learning based on contrastive estimation does not promote invariance, as the loss function maximizes mutual information between the features and both the relevant and irrelevant visual properties.
1 code implementation • 4 Jun 2023 • Era Choshen, Aviv Tamar
In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy -- the optimal policy when facing an unknown task that is sampled from some known task distribution.
1 code implementation • 21 Jun 2022 • Zohar Rimon, Aviv Tamar, Gilad Adler
We show that our approach leads to bounds that depend on the dimension of the task distribution.
1 code implementation • 14 Mar 2024 • Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar
Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration.
1 code implementation • 1 Apr 2024 • Dan Haramati, Tal Daniel, Aviv Tamar
Manipulating objects is a hallmark of human intelligence, and an important task in domains such as robotics.
1 code implementation • 10 Jul 2020 • Roi Bar Zur, Ittay Eyal, Aviv Tamar
We call this Probabilistic Termination Optimization (PTO), and the technique applies to any MDP whose utility is a ratio function.
Cryptography and Security
1 code implementation • 9 Jun 2023 • Tal Daniel, Aviv Tamar
We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation.
1 code implementation • ICML 2020 • Kara Liu, Thanard Kurutach, Christine Tung, Pieter Abbeel, Aviv Tamar
In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction.
1 code implementation • 15 Apr 2014 • Aviv Tamar, Yonatan Glassner, Shie Mannor
Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.
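CVaR at level α is the expected loss in the worst α-fraction of outcomes, i.e., the conditional mean beyond the Value at Risk quantile. A minimal empirical estimator (a sketch, not the paper's gradient-based method):

```python
import numpy as np

def cvar(losses, alpha=0.05):
    """Empirical CVaR: mean of the worst alpha-fraction of losses.

    VaR is the (1 - alpha)-quantile of the loss distribution; CVaR is
    the average of losses at or beyond that quantile, so it captures
    tail severity rather than just a cutoff.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, int(np.ceil(alpha * len(losses))))  # tail sample count
    return losses[-k:].mean()

samples = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
tail_mean = cvar(samples, alpha=0.4)  # mean of the two worst losses
```

Note how the single outlier dominates the estimate: unlike the mean (22.0), CVaR at α = 0.4 reports 52.0, which is precisely why it is favored as a risk measure.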
1 code implementation • 1 Jun 2019 • Tom Jurgenson, Aviv Tamar
We then propose a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data.
1 code implementation • 31 May 2022 • Tal Daniel, Aviv Tamar
We propose a new representation of visual data that disentangles object position from appearance.
Ranked #1 on Unsupervised Facial Landmark Detection on MAFL
1 code implementation • NeurIPS 2021 • Ron Dorfman, Idan Shenfeld, Aviv Tamar
Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution.
1 code implementation • CVPR 2020 • Ev Zisselman, Aviv Tamar
Specifically, we demonstrate the effectiveness of our method in ResNet and DenseNet architectures trained on various image datasets.
1 code implementation • ICCV 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.
1 code implementation • NeurIPS 2018 • Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel
Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations.
2 code implementations • ICLR 2018 • Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel
In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
2 code implementations • CVPR 2021 • Tal Daniel, Aviv Tamar
However, the original IntroVAE loss function relied on a particular hinge-loss formulation that is very hard to stabilize in practice, and its theoretical convergence analysis ignored important terms in the loss.
1 code implementation • 28 Sep 2016 • Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel
To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan.
9 code implementations • ICML 2017 • Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel
For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function.
8 code implementations • NeurIPS 2016 • Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel
We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within.
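The recurrence that VIN's planning module unrolls (as convolution plus channel-wise max) is ordinary value iteration on a grid. A minimal NumPy sketch of that recurrence on a deterministic gridworld, purely illustrative and without the learned convolutional reward/transition maps:

```python
import numpy as np

def value_iteration(reward, n_iters=400, gamma=0.95):
    """Grid value iteration: V <- r + gamma * max over move directions.

    reward: (H, W) reward map. Moves are up/down/left/right with
    deterministic transitions; border cells clamp (staying is possible
    at edges), mimicking walls.
    """
    V = np.zeros_like(reward)
    for _ in range(n_iters):
        # Value of the cell reached by each move (edge-padded).
        up    = np.vstack([V[:1], V[:-1]])
        down  = np.vstack([V[1:], V[-1:]])
        left  = np.hstack([V[:, :1], V[:, :-1]])
        right = np.hstack([V[:, 1:], V[:, -1:]])
        V = reward + gamma * np.max(np.stack([up, down, left, right]), axis=0)
    return V

reward = np.zeros((5, 5))
reward[4, 4] = 1.0            # goal reward in the corner
V = value_iteration(reward)   # values decay geometrically with distance
```

In the VIN itself the four hand-coded shifts are replaced by learned convolution kernels, which is what makes the whole planner differentiable end to end.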
84 code implementations • NeurIPS 2017 • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
We explore deep reinforcement learning methods for multi-agent domains.
Ranked #1 on SMAC+ on Def_Infantry_sequential