Search Results for author: Tom Zahavy

Found 43 papers, 7 papers with code

APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT

no code implementations • 24 Aug 2023 • Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen

This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory.

Diversifying AI: Towards Creative Chess with AlphaZero

no code implementations • 17 Aug 2023 • Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.

Decision Making, Game of Chess

Acceleration in Policy Optimization

no code implementations • 18 Jun 2023 • Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.

Meta-Learning, Policy Gradient Methods, +1

Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

1 code implementation • 8 Apr 2023 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution.

ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

no code implementations • 2 Feb 2023 • Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy

Such applications often require placing constraints on the agent's behavior.

Continuous Control, reinforcement-learning, +1

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations • 30 Dec 2022 • Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning, Reinforcement Learning (RL)

Discovering Evolution Strategies via Meta-Black-Box Optimization

1 code implementation • 21 Nov 2022 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag

Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies.

Continuous Control, Meta-Learning

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

no code implementations • 19 Oct 2022 • Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.

reinforcement-learning, Reinforcement Learning (RL), +2

Meta-Gradients in Non-Stationary Environments

no code implementations • 13 Sep 2022 • Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

no code implementations • 26 May 2022 • Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.

Bootstrapped Meta-Learning

1 code implementation • ICLR 2022 • Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.

Efficient Exploration, Few-Shot Learning, +1

Emphatic Algorithms for Deep Reinforcement Learning

no code implementations • 21 Jun 2021 • Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

In this paper, we extend the use of emphatic methods to deep reinforcement learning agents.

Atari Games, reinforcement-learning, +1

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.

Reward is enough for convex MDPs

no code implementations • NeurIPS 2021 • Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).

Reinforcement Learning (RL)

Online Apprenticeship Learning

no code implementations • 13 Feb 2021 • Lior Shani, Tom Zahavy, Shie Mannor

Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL (Ho & Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.

Discovery of Options via Meta-Learned Subgoals

no code implementations • NeurIPS 2021 • Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.

Reinforcement Learning (RL)

Discovering a set of policies for the worst case reward

no code implementations • ICLR 2021 • Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.

Online Limited Memory Neural-Linear Bandits with Likelihood Matching

3 code implementations • 7 Feb 2021 • Ofir Nabati, Tom Zahavy, Shie Mannor

To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.

Efficient Exploration, Multi-Armed Bandits, +1

Online Limited Memory Neural-Linear Bandits

no code implementations • 1 Jan 2021 • Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration, Multi-Armed Bandits, +2

Balancing Constraints and Rewards with Meta-Gradient D4PG

no code implementations • ICLR 2021 • Dan A. Calian, Daniel J. Mankowitz, Tom Zahavy, Zhongwen Xu, Junhyuk Oh, Nir Levine, Timothy Mann

Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints.

Reinforcement Learning (RL)

Learning to Ask Medical Questions using Reinforcement Learning

1 code implementation • 31 Mar 2020 • Uri Shaham, Tom Zahavy, Cesar Caraballo, Shiwani Mahajan, Daisy Massey, Harlan Krumholz

We propose a novel reinforcement learning-based approach for adaptive and iterative feature selection.

feature selection, reinforcement-learning, +1

A Self-Tuning Actor-Critic Algorithm

no code implementations • NeurIPS 2020 • Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.

Atari Games, reinforcement-learning, +1

Deep Randomized Least Squares Value Iteration

no code implementations • ICLR 2020 • Guy Adam, Tom Zahavy, Oron Anschel, Nahum Shimkin

Rather than using a hand-designed state representation, we use a state representation that is learned directly from the data by a DQN agent.

reinforcement-learning, Reinforcement Learning (RL)

Deep learning reconstruction of ultrashort pulses from 2D spatial intensity patterns recorded by an all-in-line system in a single-shot

no code implementations • 23 Nov 2019 • Ron Ziv, Alex Dikopoltsev, Tom Zahavy, Ittai Rubinstein, Pavel Sidorenko, Oren Cohen, Mordechai Segev

We propose a simple all-in-line single-shot scheme for diagnostics of ultrashort laser pulses, consisting of a multi-mode fiber, a nonlinear crystal and a CCD camera.

Apprenticeship Learning via Frank-Wolfe

no code implementations • 5 Nov 2019 • Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

Specifically, we show that a variation of the FW method that is based on taking "away steps" achieves a linear rate of convergence when applied to AL and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations.

Contextual Inverse Reinforcement Learning

no code implementations • 25 Sep 2019 • Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor

In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.

reinforcement-learning, Reinforcement Learning (RL)

Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

1 code implementation • 25 Sep 2019 • Tom Zahavy, Shie Mannor

We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.

Efficient Exploration, Multi-Armed Bandits, +3

Unknown mixing times in apprenticeship and reinforcement learning

no code implementations • 23 May 2019 • Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria.

reinforcement-learning, Reinforcement Learning (RL)

Inverse Reinforcement Learning in Contextual MDPs

2 code implementations • 23 May 2019 • Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).

Autonomous Driving, reinforcement-learning, +1

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

no code implementations • 23 May 2019 • Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.

Imitation Learning, text-based games, +1

Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

no code implementations • 26 Feb 2019 • Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour

We consider a setting of hierarchical reinforcement learning, in which the reward is a sum of components.

Hierarchical Reinforcement Learning, reinforcement-learning, +2

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

no code implementations • 24 Jan 2019 • Tom Zahavy, Shie Mannor

We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.

Decision Making, Efficient Exploration, +4

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

no code implementations • NeurIPS 2018 • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.

reinforcement-learning, Reinforcement Learning (RL), +1

Deep Learning Reconstruction of Ultra-Short Pulses

no code implementations • 15 Mar 2018 • Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.

Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

no code implementations • 13 Mar 2018 • Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour

In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs.

Hierarchical Reinforcement Learning, reinforcement-learning, +2

Train on Validation: Squeezing the Data Lemon

no code implementations • 16 Feb 2018 • Guy Tennenholtz, Tom Zahavy, Shie Mannor

We define an on-average-validation-stable algorithm as one in which using small portions of validation data for training does not overfit the model selection process.

Model Selection

Shallow Updates for Deep Reinforcement Learning

no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.

Atari Games, Feature Engineering, +2

Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

no code implementations • 29 Nov 2016 • Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor

Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce.

General Classification

Visualizing Dynamics: from t-SNE to SEMI-MDPs

no code implementations • 22 Jun 2016 • Nir Ben Zrihem, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.

Deep Reinforcement Learning Discovers Internal Models

no code implementations • 16 Jun 2016 • Nir Baram, Tom Zahavy, Shie Mannor

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.

reinforcement-learning, Reinforcement Learning (RL)

A Deep Hierarchical Approach to Lifelong Learning in Minecraft

no code implementations • 25 Apr 2016 • Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.

Graying the black box: Understanding DQNs

no code implementations • 8 Feb 2016 • Tom Zahavy, Nir Ben Zrihem, Shie Mannor

In recent years there has been growing interest in using deep representations for reinforcement learning.

reinforcement-learning, Reinforcement Learning (RL)

Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms

no code implementations • ICLR 2018 • Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.
