Search Results for author: Tom Zahavy

Found 43 papers, 7 papers with code

APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT

no code implementations 24 Aug 2023 Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen

This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory.

Diversifying AI: Towards Creative Chess with AlphaZero

no code implementations 17 Aug 2023 Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.

Decision Making, Game of Chess

Acceleration in Policy Optimization

no code implementations 18 Jun 2023 Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.

Meta-Learning, Policy Gradient Methods +1

Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

1 code implementation 8 Apr 2023 Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution.

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations 30 Dec 2022 Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning, Reinforcement Learning (RL)

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

no code implementations 19 Oct 2022 Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.

reinforcement-learning, Reinforcement Learning (RL) +2

Meta-Gradients in Non-Stationary Environments

no code implementations 13 Sep 2022 Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

no code implementations 26 May 2022 Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.

Bootstrapped Meta-Learning

1 code implementation ICLR 2022 Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.

Efficient Exploration, Few-Shot Learning +1

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations ICML Workshop URL 2021 Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.

Reward is enough for convex MDPs

no code implementations NeurIPS 2021 Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).

Reinforcement Learning (RL)

Online Apprenticeship Learning

no code implementations 13 Feb 2021 Lior Shani, Tom Zahavy, Shie Mannor

Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL (Ho & Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.

Discovering a set of policies for the worst case reward

no code implementations ICLR 2021 Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

3 code implementations 7 Feb 2021 Ofir Nabati, Tom Zahavy, Shie Mannor

To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.

Efficient Exploration, Multi-Armed Bandits +1

Online Limited Memory Neural-Linear Bandits

no code implementations 1 Jan 2021 Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration, Multi-Armed Bandits +2

Balancing Constraints and Rewards with Meta-Gradient D4PG

no code implementations ICLR 2021 Dan A. Calian, Daniel J. Mankowitz, Tom Zahavy, Zhongwen Xu, Junhyuk Oh, Nir Levine, Timothy Mann

Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints.

Reinforcement Learning (RL)

A Self-Tuning Actor-Critic Algorithm

no code implementations NeurIPS 2020 Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.

Atari Games, reinforcement-learning +1

Deep Randomized Least Squares Value Iteration

no code implementations ICLR 2020 Guy Adam, Tom Zahavy, Oron Anschel, Nahum Shimkin

Rather than using a hand-designed state representation, we use a state representation that is learned directly from the data by a DQN agent.

reinforcement-learning, Reinforcement Learning (RL)

Deep learning reconstruction of ultrashort pulses from 2D spatial intensity patterns recorded by an all-in-line system in a single-shot

no code implementations 23 Nov 2019 Ron Ziv, Alex Dikopoltsev, Tom Zahavy, Ittai Rubinstein, Pavel Sidorenko, Oren Cohen, Mordechai Segev

We propose a simple all-in-line single-shot scheme for diagnostics of ultrashort laser pulses, consisting of a multi-mode fiber, a nonlinear crystal and a CCD camera.

Apprenticeship Learning via Frank-Wolfe

no code implementations 5 Nov 2019 Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

Specifically, we show that a variation of the FW method that is based on taking "away steps" achieves a linear rate of convergence when applied to AL and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations.

Contextual Inverse Reinforcement Learning

no code implementations 25 Sep 2019 Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor

In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.

reinforcement-learning, Reinforcement Learning (RL)

Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

1 code implementation 25 Sep 2019 Tom Zahavy, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration, Multi-Armed Bandits +3

Unknown mixing times in apprenticeship and reinforcement learning

no code implementations 23 May 2019 Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria.

reinforcement-learning, Reinforcement Learning (RL)

Inverse Reinforcement Learning in Contextual MDPs

2 code implementations 23 May 2019 Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).

Autonomous Driving, reinforcement-learning +1

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

no code implementations 23 May 2019 Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

Imitation Learning, text-based games +1

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

no code implementations 24 Jan 2019 Tom Zahavy, Shie Mannor

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.

Decision Making, Efficient Exploration +4

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations NeurIPS 2018 Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

reinforcement-learning, Reinforcement Learning (RL) +1

Deep Learning Reconstruction of Ultra-Short Pulses

no code implementations 15 Mar 2018 Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.

Train on Validation: Squeezing the Data Lemon

no code implementations 16 Feb 2018 Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define the notion of on-average-validation-stable algorithms as one in which using small portions of validation data for training does not overfit the model selection process.

Model Selection

Shallow Updates for Deep Reinforcement Learning

no code implementations NeurIPS 2017 Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games, Feature Engineering +2

Visualizing Dynamics: from t-SNE to SEMI-MDPs

no code implementations 22 Jun 2016 Nir Ben Zrihem, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.

Deep Reinforcement Learning Discovers Internal Models

no code implementations 16 Jun 2016 Nir Baram, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.

reinforcement-learning, Reinforcement Learning (RL)

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

no code implementations 25 Apr 2016 Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.

Graying the black box: Understanding DQNs

no code implementations 8 Feb 2016 Tom Zahavy, Nir Ben Zrihem, Shie Mannor

In recent years there has been growing interest in using deep representations for reinforcement learning.

reinforcement-learning, Reinforcement Learning (RL)

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms

no code implementations ICLR 2018 Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.
