no code implementations • 1 Nov 2021 • Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar

We show that our score, VSDR, can significantly improve the accuracy of policy ranking without requiring additional real-world data.

no code implementations • 24 Sep 2021 • Aviv Tamar, Daniel Soudry, Ev Zisselman

In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought.

1 code implementation • NeurIPS 2021 • Ron Dorfman, Idan Shenfeld, Aviv Tamar

Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution.

no code implementations • 10 May 2021 • Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik, Aviv Tamar

In this work, we propose that data collection policies should actively explore the environment to collect diverse data.

1 code implementation • ICLR Workshop SSL-RL 2021 • Carmel Rabinovitz, Niko Grupen, Aviv Tamar

In this work, however, we show that a naive application of DR to unsupervised learning based on contrastive estimation does not promote invariance, as the loss function maximizes mutual information between the features and both the relevant and irrelevant visual properties.

1 code implementation • CVPR 2021 • Tal Daniel, Aviv Tamar

However, the original IntroVAE loss function relied on a particular hinge-loss formulation that is very hard to stabilize in practice, and its theoretical convergence analysis ignored important terms in the loss.

no code implementations • 7 Oct 2020 • Noga H. Rotman, Michael Schapira, Aviv Tamar

We illustrate the usefulness of online safety assurance in the context of the proposed deep reinforcement learning (RL) approach to video streaming.

1 code implementation • 10 Jul 2020 • Roi Bar Zur, Ittay Eyal, Aviv Tamar

We call this Probabilistic Termination Optimization (PTO), and the technique applies to any MDP whose utility is a ratio function.

Cryptography and Security

no code implementations • ICML 2020 • Tom Jurgenson, Or Avner, Edward Groshev, Aviv Tamar

Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal.

1 code implementation • ICML 2020 • Kara Liu, Thanard Kurutach, Christine Tung, Pieter Abbeel, Aviv Tamar

In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction.

1 code implementation • CVPR 2020 • Ev Zisselman, Aviv Tamar

Specifically, we demonstrate the effectiveness of our method in ResNet and DenseNet architectures trained on various image datasets.

no code implementations • 12 Nov 2019 • Tal Daniel, Thanard Kurutach, Aviv Tamar

In this work, we propose two variational methods for training VAEs for SSAD.

1 code implementation • ICCV 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.

no code implementations • 12 Jun 2019 • Tom Jurgenson, Edward Groshev, Aviv Tamar

In such problems, the way we choose to represent a trajectory underlies algorithms for trajectory prediction and optimization.

1 code implementation • 1 Jun 2019 • Tom Jurgenson, Aviv Tamar

We then propose a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data.

no code implementations • 11 May 2019 • Angelina Wang, Thanard Kurutach, Kara Liu, Pieter Abbeel, Aviv Tamar

We further demonstrate our approach on learning to imagine and execute in 3 environments, the last of which is deformable rope manipulation on a PR2 robot.

no code implementations • 10 Mar 2019 • Xinyi Ren, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Abhishek Gupta, Aviv Tamar, Pieter Abbeel

In this work, we investigate how to improve the accuracy of domain randomization based pose estimation.

no code implementations • 29 Jan 2019 • Orr Krupnik, Igor Mordatch, Aviv Tamar

We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace.

no code implementations • ICLR 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI.

no code implementations • 27 Sep 2018 • Elad Sarafian, Aviv Tamar, Sarit Kraus

The primary advantages of our approach, termed Rerouted Behavior Improvement (RBI), over other safe learning methods are its stability in the presence of value estimation errors and the elimination of a policy search process.

no code implementations • 6 Aug 2018 • Dror Freirich, Ron Meir, Aviv Tamar

In this formulation, DiRL can be seen as learning a deep generative model of the value distribution, driven by the discrepancy between the distribution of the current value, and the distribution of the sum of current reward and next value.

1 code implementation • NeurIPS 2018 • Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel

Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations.

1 code implementation • 20 May 2018 • Elad Sarafian, Aviv Tamar, Sarit Kraus

To minimize the improvement penalty, the RBI idea is to attenuate rapid policy changes of low probability actions which were less frequently sampled.

no code implementations • 20 Mar 2018 • Garrett Thomas, Melissa Chien, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel

We propose to leverage this prior knowledge by guiding RL along a geometric motion plan, calculated using the CAD data.

2 code implementations • ICLR 2018 • Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.

no code implementations • ICLR 2018 • Aviv Tamar, Khashayar Rohanimanesh, Yin-Lam Chow, Chris Vigorito, Ben Goodrich, Michael Kahane, Derik Pridmore

In this paper we present an LfD approach for learning multiple modes of behavior from visual data.

no code implementations • 22 Nov 2017 • William Wang, Angelina Wang, Aviv Tamar, Xi Chen, Pieter Abbeel

We posit that a generative approach is the natural remedy for this problem, and propose a method for classification using generative models.

no code implementations • 20 Nov 2017 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

We learn reusable options in different scenarios in a RoboCup soccer domain (i.e., winning/losing).

no code implementations • 24 Aug 2017 • Edward Groshev, Maxwell Goldstein, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel

We show that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances.

no code implementations • 10 Aug 2017 • Asaf Valadarsky, Michael Schapira, Dafna Shahaf, Aviv Tamar

Can ideas and techniques from machine learning be leveraged to automatically generate "good" routing configurations?

72 code implementations • NeurIPS 2017 • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

We explore deep reinforcement learning methods for multi-agent domains.

7 code implementations • ICML 2017 • Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function.

no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

no code implementations • 10 Oct 2016 • Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In addition, the learned risk aware skills are able to mitigate reward-based model misspecification.

1 code implementation • 28 Sep 2016 • Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel

To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan.

no code implementations • 14 Sep 2016 • Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar

The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.

8 code implementations • NeurIPS 2016 • Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within.

no code implementations • 17 Sep 2015 • Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

We consider the off-policy evaluation problem in Markov decision processes with function approximation.

no code implementations • 14 Aug 2015 • Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes.

no code implementations • NeurIPS 2015 • Yin-Lam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget.

no code implementations • NeurIPS 2015 • Aviv Tamar, Yin-Lam Chow, Mohammad Ghavamzadeh, Shie Mannor

For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming.

no code implementations • 21 Dec 2014 • Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems.

1 code implementation • 15 Apr 2014 • Aviv Tamar, Yonatan Glassner, Shie Mannor

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains.

no code implementations • 14 Oct 2013 • Aviv Tamar, Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return.

no code implementations • 26 Jun 2013 • Aviv Tamar, Huan Xu, Shie Mannor

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm.
