no code implementations • 2 Dec 2024 • John Schultz, Jakub Adamek, Matej Jusup, Marc Lanctot, Michael Kaisers, Sarah Perrin, Daniel Hennes, Jeremy Shar, Cannada Lewis, Anian Ruoss, Tom Zahavy, Petar Veličković, Laurel Prince, Satinder Singh, Eric Malmi, Nenad Tomašev
While large language models perform well on a range of complex tasks (e.g., text generation, question answering, summarization), robust multi-step planning and reasoning remain a considerable challenge for them.
no code implementations • 24 Aug 2023 • Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen
This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory.
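As a rough illustration of this discriminator-driven objective, here is a minimal sketch in the style of DIAYN-like skill discovery; the `discriminator` module and the per-state (rather than per-trajectory) prediction are simplifying assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def intrinsic_reward(discriminator, state, skill_id):
    """Reward the agent for visiting states from which the
    discriminator can infer the active skill (DIAYN-style sketch)."""
    logits = discriminator(state)                    # (num_skills,)
    log_q = F.log_softmax(logits, dim=-1)[skill_id]  # log q(z | s)
    return log_q.detach()                            # treat as a scalar reward

def discriminator_loss(discriminator, states, skill_ids):
    """Train the discriminator to predict which skill produced each state."""
    logits = discriminator(states)                   # (batch, num_skills)
    return F.cross_entropy(logits, skill_ids)
```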
no code implementations • 17 Aug 2023 • Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomašev, Lisa Schut, Demis Hassabis, Satinder Singh
In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.
no code implementations • 18 Jun 2023 • Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
1 code implementation • 8 Apr 2023 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag
Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution.
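For readers unfamiliar with the family, a minimal truncation-selection genetic algorithm is sketched below; the population size, mutation scale, and elitism rule are illustrative defaults, not the strategies discovered in the paper.

```python
import numpy as np

def genetic_algorithm(fitness, dim, pop_size=64, elite_frac=0.25,
                      sigma=0.1, generations=100, seed=0):
    """Minimal GA: keep the fittest elites, mutate them to refill the population."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        scores = np.array([fitness(x) for x in pop])
        elites = pop[np.argsort(scores)[-n_elite:]]         # highest fitness
        parents = elites[rng.integers(n_elite, size=pop_size)]
        pop = parents + sigma * rng.normal(size=pop.shape)  # Gaussian mutation
        pop[:n_elite] = elites                              # elitism
    return elites[-1]

best = genetic_algorithm(lambda x: -np.sum(x**2), dim=10)  # maximize -||x||^2
```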
no code implementations • 2 Feb 2023 • Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy
Such applications often require placing constraints on the agent's behavior.
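A common way to encode such behavioral constraints is a Lagrangian relaxation of the constrained objective; the sketch below (with hypothetical helper names) shows that standard baseline formulation, not the specific method developed in this paper.

```python
# Lagrangian relaxation of constrained RL (hypothetical helper names):
#   max_pi E[reward]  subject to  E[cost] <= threshold
def lagrangian_objective(reward_return, cost_return, lam, threshold):
    """Scalarized objective the policy maximizes for a fixed multiplier."""
    return reward_return - lam * (cost_return - threshold)

def update_multiplier(lam, cost_return, threshold, lr=0.01):
    """Dual ascent: increase lambda while the constraint is violated."""
    return max(0.0, lam + lr * (cost_return - threshold))
```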
no code implementations • 30 Dec 2022 • Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.
1 code implementation • 21 Nov 2022 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag
Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies.
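A minimal sketch of the canonical evolution-strategies estimator this line of work builds on; the hyperparameters here are illustrative, and the meta-learned strategies in the paper differ.

```python
import numpy as np

def evolution_strategies(f, theta, sigma=0.1, lr=0.02, pop=50, iters=200, seed=0):
    """OpenAI-ES-style search: estimate a gradient of E[f(theta + sigma*eps)]
    from fitness-weighted noise, without any analytic gradients."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        eps = rng.normal(size=(pop, theta.size))
        returns = np.array([f(theta + sigma * e) for e in eps])
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta = theta + lr / (pop * sigma) * eps.T @ returns  # ascend estimate
    return theta
```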
no code implementations • 19 Oct 2022 • Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh
In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.
no code implementations • 13 Sep 2022 • Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh
We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.
no code implementations • 26 May 2022 • Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh
Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.
1 code implementation • ICLR 2022 • Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh
We achieve a new state of the art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.
no code implementations • 21 Jun 2021 • Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
In this paper, we extend the use of emphatic methods to deep reinforcement learning agents.
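As background, here is a linear-function-approximation sketch of emphatic TD(λ), the classical method being scaled up; the paper's deep-RL variant replaces the linear value with a network, and the trace handling there differs in its details.

```python
import numpy as np

def emphatic_td(phi, rewards, rhos, gamma=0.99, lam=0.9, alpha=0.01, interest=1.0):
    """Emphatic TD(lambda) for off-policy evaluation with linear features.
    phi: (T+1, d) feature matrix; rhos: (T,) importance-sampling ratios."""
    w = np.zeros(phi.shape[1])
    e = np.zeros(phi.shape[1])   # eligibility trace
    F = 0.0                      # followon trace
    for t in range(len(rewards)):
        F = gamma * (rhos[t - 1] if t > 0 else 1.0) * F + interest
        M = lam * interest + (1 - lam) * F          # emphasis
        delta = rewards[t] + gamma * w @ phi[t + 1] - w @ phi[t]
        e = rhos[t] * (gamma * lam * e + M * phi[t])
        w = w + alpha * delta * e
    return w
```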
no code implementations • NeurIPS 2021 • Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).
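In symbols, such a reward objective is linear in the policy's occupancy measure, which is what makes it "sufficient" for many goals; the convex generalization on the second line is our paraphrase of the setting the paper studies.

```latex
% Standard RL: a stationary Markov reward is linear in the occupancy measure
J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\Big]
       \;=\; \langle r,\, d_{\pi}\rangle,
\qquad
d_{\pi}(s,a) \;=\; \sum_{t=0}^{\infty}\gamma^{t}\,\Pr_{\pi}(s_t{=}s,\, a_t{=}a).

% Goals such as exploration or imitation are instead nonlinear functions of d_pi
\max_{\pi}\; f(d_{\pi}), \qquad f(d) = \langle r, d\rangle \text{ recovers the standard case}.
```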
no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.
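For reference, successor features are the expected discounted accumulation of state-action features under a policy; this is the space in which the method measures diversity, and values under any linear reward are linear in them.

```latex
\psi^{\pi} \;=\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,\phi(s_t,a_t)\Big],
\qquad
v^{\pi}_{w} \;=\; \langle \psi^{\pi},\, w\rangle
\quad \text{for any reward of the form } r_w = \langle \phi, w\rangle.
```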
no code implementations • 13 Feb 2021 • Lior Shani, Tom Zahavy, Shie Mannor
Finally, we implement a deep variant of our algorithm which shares some similarities to GAIL (Ho & Ermon, 2016), but where the discriminator is replaced with the costs learned by the OAL problem.
no code implementations • NeurIPS 2021 • Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
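In the standard options formalism (Sutton, Precup & Singh, 1999), an option bundles an initiation set, an intra-option policy, and a termination condition:

```latex
o \;=\; (I_o,\ \pi_o,\ \beta_o),
\qquad I_o \subseteq \mathcal{S},\quad
\pi_o : \mathcal{S} \to \Delta(\mathcal{A}),\quad
\beta_o : \mathcal{S} \to [0,1].
```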
no code implementations • ICLR 2021 • Tom Zahavy, Andre Barreto, Daniel J. Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh
Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.
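As a rough, hypothetical illustration of the set-max idea only (the paper's policy-iteration algorithm constructs new policies rather than selecting from fixed candidates), one can greedily grow a set to maximize worst-case performance across tasks:

```python
import numpy as np

def smp_value(values):
    """values[i][j]: value of policy i on task j.
    The set-max policy (SMP) runs the best policy in the set per task."""
    return np.max(values, axis=0)                    # (num_tasks,)

def greedy_smp_selection(candidate_values, set_size):
    """Greedily add the candidate that most improves worst-case SMP value."""
    chosen = []
    for _ in range(set_size):
        best_i, best_worst = None, -np.inf
        for i in range(len(candidate_values)):
            vals = smp_value(np.array([candidate_values[j] for j in chosen + [i]]))
            if vals.min() > best_worst:              # worst case across tasks
                best_i, best_worst = i, vals.min()
        chosen.append(best_i)
    return chosen
```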
3 code implementations • 7 Feb 2021 • Ofir Nabati, Tom Zahavy, Shie Mannor
To alleviate this, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
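The neural-linear construction underlying this method can be sketched as Thompson sampling over a Bayesian linear model on learned network features; the likelihood-matching step that protects the posterior when the representation changes is the paper's contribution and is omitted here.

```python
import numpy as np

class NeuralLinearTS:
    """Thompson sampling on a Bayesian linear model over the last-layer
    features of a neural network (the 'neural-linear' bandit)."""
    def __init__(self, feature_fn, dim, noise_var=0.1, prior_var=1.0):
        self.phi = feature_fn                 # maps context -> d-dim features
        self.A = np.eye(dim) / prior_var      # posterior precision
        self.b = np.zeros(dim)
        self.noise_var = noise_var

    def choose(self, contexts, rng):
        cov = np.linalg.inv(self.A)
        mu = cov @ self.b
        w = rng.multivariate_normal(mu, self.noise_var * cov)  # posterior sample
        return int(np.argmax([self.phi(c) @ w for c in contexts]))

    def update(self, context, reward):
        x = self.phi(context)
        self.A += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var
```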
no code implementations • 1 Jan 2021 • Tom Zahavy, Ofir Nabati, Leor Cohen, Shie Mannor
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
no code implementations • ICLR 2021 • Dan A. Calian, Daniel J. Mankowitz, Tom Zahavy, Zhongwen Xu, Junhyuk Oh, Nir Levine, Timothy Mann
Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints.
1 code implementation • 31 Mar 2020 • Uri Shaham, Tom Zahavy, Cesar Caraballo, Shiwani Mahajan, Daisy Massey, Harlan Krumholz
We propose a novel reinforcement learning-based approach for adaptive and iterative feature selection.
no code implementations • NeurIPS 2020 • Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.
no code implementations • ICLR 2020 • Guy Adam, Tom Zahavy, Oron Anschel, Nahum Shimkin
Rather than using a hand-designed state representation, we use a representation learned directly from the data by a DQN agent.
no code implementations • 23 Nov 2019 • Ron Ziv, Alex Dikopoltsev, Tom Zahavy, Ittai Rubinstein, Pavel Sidorenko, Oren Cohen, Mordechai Segev
We propose a simple all-in-line single-shot scheme for diagnostics of ultrashort laser pulses, consisting of a multi-mode fiber, a nonlinear crystal and a CCD camera.
no code implementations • 5 Nov 2019 • Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour
Specifically, we show that a variation of the FW method based on taking "away steps" achieves a linear rate of convergence when applied to AL, and that a stochastic version of the FW algorithm can be used to avoid precise estimation of feature expectations.
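A compact sketch of Frank-Wolfe with away steps over a finite set of atoms follows; the fixed 2/(t+2) step rule is a simplification (a line search is what yields the linear rate in the analysis), and the mapping of atoms to policies or feature expectations is left abstract here.

```python
import numpy as np

def away_step_frank_wolfe(grad, atoms, steps=200):
    """Frank-Wolfe with away steps over conv(atoms) (rows of `atoms`).
    Away steps remove weight from bad atoms; with line search this gives
    linear convergence for smooth, strongly convex objectives."""
    n = len(atoms)
    w = np.zeros(n); w[0] = 1.0               # convex-combination weights
    x = atoms[0].astype(float).copy()
    for t in range(steps):
        g = grad(x)
        s = int(np.argmin(atoms @ g))                          # best FW atom
        active = np.where(w > 0)[0]
        a = active[int(np.argmax(atoms[active] @ g))]          # worst active atom
        if g @ (atoms[s] - x) <= g @ (x - atoms[a]):           # FW step
            d, gmax = atoms[s] - x, 1.0
            gamma = min(gmax, 2.0 / (t + 2))
            w *= (1 - gamma); w[s] += gamma
        else:                                                  # away step
            d, gmax = x - atoms[a], w[a] / (1.0 - w[a])
            gamma = min(gmax, 2.0 / (t + 2))
            w *= (1 + gamma); w[a] -= gamma
        x = x + gamma * d
    return x, w
```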
no code implementations • 25 Sep 2019 • Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor
In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context.
1 code implementation • 25 Sep 2019 • Tom Zahavy, Shie Mannor
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
no code implementations • 23 May 2019 • Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour
We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria.
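The average-reward criterion replaces discounting with the long-run reward rate:

```latex
\rho^{\pi} \;=\; \lim_{T\to\infty}\ \frac{1}{T}\ \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} r(s_t,a_t)\Big].
```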
no code implementations • 23 May 2019 • Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor
We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces.
2 code implementations • 23 May 2019 • Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy
Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts).
no code implementations • 26 Feb 2019 • Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour
We consider a setting of hierarchical reinforcement learning in which the reward is a sum of components.
Tasks: Hierarchical Reinforcement Learning, reinforcement-learning
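The decomposition assumption above can be written as follows; note that for any fixed policy, the action-value also splits additively across components:

```latex
r(s,a) \;=\; \sum_{i=1}^{k} r_i(s,a)
\qquad\Longrightarrow\qquad
q^{\pi}(s,a) \;=\; \sum_{i=1}^{k} q_i^{\pi}(s,a)\ \ \text{for any fixed } \pi.
```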
no code implementations • 24 Jan 2019 • Tom Zahavy, Shie Mannor
We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information.
no code implementations • NeurIPS 2018 • Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor
Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant.
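One way to operationalize this, in the spirit of action-elimination agents, is to mask the greedy action choice with a learned validity predictor; the threshold and fallback below are illustrative, and the paper's elimination module (which uses confidence bounds) is more involved.

```python
import numpy as np

def valid_action_argmax(q_values, elim_probs, threshold=0.5):
    """Act greedily over actions the elimination network does not rule out.
    elim_probs[a]: predicted probability that action a is invalid/redundant."""
    mask = elim_probs < threshold          # keep plausible actions
    if not mask.any():                     # fall back to the full action set
        mask[:] = True
    return int(np.argmax(np.where(mask, q_values, -np.inf)))
```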
no code implementations • 15 Mar 2018 • Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev
Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create.
no code implementations • 13 Mar 2018 • Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour
In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs.
Tasks: Hierarchical Reinforcement Learning, reinforcement-learning
no code implementations • 16 Feb 2018 • Guy Tennenholtz, Tom Zahavy, Shie Mannor
We define the notion of on-average-validation-stable algorithms as one in which using small portions of validation data for training does not overfit the model selection process.
no code implementations • NeurIPS 2017 • Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method.
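A simplified sketch of the hybrid step: periodically re-fit the network's last layer by regularized least squares on replay data while freezing the learned features. The plain ridge penalty here stands in for the paper's Bayesian regularization toward the current DQN weights.

```python
import numpy as np

def ls_update_last_layer(features, actions, targets, num_actions, reg=1.0):
    """LS-DQN-style step: re-solve the DQN's last layer with regularized
    least squares, keeping the learned features fixed.
    features: (N, d) last-hidden-layer activations; targets: (N,) TD targets."""
    d = features.shape[1]
    W = np.zeros((num_actions, d))
    for a in range(num_actions):
        idx = actions == a
        X, y = features[idx], targets[idx]
        W[a] = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)  # ridge
    return W
```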
no code implementations • 29 Nov 2016 • Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor
Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce.
no code implementations • 22 Jun 2016 • Nir Ben Zrihem, Tom Zahavy, Shie Mannor
Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots.
no code implementations • 16 Jun 2016 • Nir Baram, Tom Zahavy, Shie Mannor
Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots.
no code implementations • 25 Apr 2016 • Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor
Skill distillation enables the HDRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network.
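The distillation step can be sketched with the standard policy-distillation loss, matching the student's action distribution to a temperature-sharpened teacher; treat this as the generic loss, not the exact H-DRLN recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=0.01):
    """Policy distillation: KL divergence from a sharpened teacher
    distribution to the student's action distribution."""
    teacher = F.softmax(teacher_logits / tau, dim=-1)    # sharpened teacher
    log_student = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_student, teacher, reduction="batchmean")
```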
no code implementations • 8 Feb 2016 • Tom Zahavy, Nir Ben Zrihem, Shie Mannor
In recent years there is a growing interest in using deep representations for reinforcement learning.
no code implementations • ICLR 2018 • Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor
As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.