1 code implementation • 8 Aug 2024 • Marc Pickett, Aakash Kumar Nain, Joseph Modayil, Llion Jones
This paper examines a simplified version of the general problem, where an unsupervised learner is presented with a sequence of images for the characters in a text corpus, and this learner is later evaluated on its ability to recognize specific (possibly rare) sequential patterns.
no code implementations • 3 Nov 2023 • Joseph Modayil, Zaheer Abbas
RL practitioners lack a systematic way to study how well a single RL algorithm performs when instantiated across a range of problem scales, and they lack function approximation techniques that scale well with unstructured observations.
no code implementations • 13 Mar 2023 • Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, Marlos C. Machado
The ability to learn continually is essential in a complex and changing world.
no code implementations • 17 Mar 2022 • Patrick M. Pilarski, Andrew Butcher, Elnaz Davoodi, Michael Bradley Johanson, Dylan J. A. Brenneis, Adam S. R. Parker, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White
Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently.
no code implementations • 11 Jan 2022 • Andrew Butcher, Michael Bradley Johanson, Elnaz Davoodi, Dylan J. A. Brenneis, Leslie Acker, Adam S. R. Parker, Adam White, Joseph Modayil, Patrick M. Pilarski
We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals.
no code implementations • 14 Dec 2021 • Dylan J. A. Brenneis, Adam S. Parker, Michael Bradley Johanson, Andrew Butcher, Elnaz Davoodi, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White, Patrick M. Pilarski
Additionally, we compare two different agent architectures to assess how representational choices in agent design affect the human-agent interaction.
no code implementations • 17 Jun 2021 • John D. Martin, Joseph Modayil
However, prevailing optimization techniques are not designed for strictly-incremental online updates.
no code implementations • 5 Jul 2019 • Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver
These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters.
no code implementations • 25 Apr 2019 • Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu
Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms.
no code implementations • 6 Dec 2018 • Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil
In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models (deep Q-networks trained with experience replay), analysing how the components of this system contribute to the emergence of the deadly triad and to the agent's performance.
no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
no code implementations • NeurIPS 2017 • Zhongwen Xu, Joseph Modayil, Hado P. Van Hasselt, Andre Barreto, David Silver, Tom Schaul
Neural networks have a smooth initial inductive bias, such that small changes in input do not lead to large changes in output.
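The smoothness claim can be illustrated with a minimal sketch: a hypothetical randomly initialised two-layer tanh network (the sizes, weight scaling, and perturbation below are illustrative assumptions, not the paper's setup) maps a small input perturbation to an output change of comparable magnitude.

```python
import numpy as np

# Hypothetical two-layer tanh MLP at random initialisation; sizes and
# weight scaling are illustrative assumptions, not the paper's networks.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 16, 64, 4
W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_hidden, n_in))
W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_out, n_hidden))

def mlp(x):
    return W2 @ np.tanh(W1 @ x)

x = rng.normal(size=n_in)
eps = 1e-3 * rng.normal(size=n_in)   # small input perturbation

# Ratio of output change to input change: modest, because tanh is
# 1-Lipschitz and the random weight matrices have small operator norms.
ratio = np.linalg.norm(mlp(x + eps) - mlp(x)) / np.linalg.norm(eps)
```

At this initialisation the ratio stays of order one, i.e. small input changes do not produce large output changes.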
32 code implementations • 6 Oct 2017 • Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
The deep reinforcement learning community has made several independent improvements to the DQN algorithm.
no code implementations • NeurIPS 2014 • Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
We prove that the UOM of an option can construct a traditional option model given a reward function, and that the option-conditional return can be computed directly as a single dot product of the UOM with the reward function.
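The dot-product claim can be sketched in a tiny tabular setting, under assumed definitions (not the paper's notation): take the UOM to be the option's discounted expected state-occupancy matrix, built from a hypothetical 3-state transition matrix `P` under the option's policy and per-state termination probabilities `beta`; the option-conditional return for every start state is then one matrix-vector product with the reward vector.

```python
import numpy as np

# Hypothetical 3-state chain: state 0 -> 1 -> 2, with state 2 absorbing.
# P is the transition matrix under the option's policy; beta gives the
# probability the option terminates on arriving in each state.
gamma = 0.9
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 1.]])
beta = np.array([0., 0., 1.])    # the option terminates in state 2
r = np.array([0., 1., 2.])       # reward for occupying each state

# Continuation dynamics: transitions into a state are discounted by the
# probability the option keeps running there.
P_cont = P * (1.0 - beta)[None, :]

# Discounted expected occupancy (a tabular stand-in for the UOM):
# U = I + gamma*P_cont + gamma^2*P_cont^2 + ... = (I - gamma*P_cont)^-1
U = np.linalg.inv(np.eye(3) - gamma * P_cont)

# Option-conditional return from every start state: one dot product
# of the occupancy model with the reward vector.
returns = U @ r
```

Here `returns` is `[0.9, 1.0, 2.0]`: from state 0 the option occupies state 1 once (discounted by 0.9) before terminating, so its return is `0.9 * r[1]`.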
no code implementations • 6 Dec 2011 • Joseph Modayil, Adam White, Richard S. Sutton
The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense.
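A hedged sketch of this multi-timescale prediction idea: a bank of TD(0) learners, one per discount rate, each predicting the discounted future sum of the same scalar signal. The one-hot features, synthetic signal, and step size below are stand-ins, not the paper's robot data or algorithmic details.

```python
import numpy as np

# Bank of TD(0) predictors at several timescales ("nexting"-style):
# each row of w predicts the discounted future sum of the same signal
# at its own discount rate. Features and signal are synthetic stand-ins.
rng = np.random.default_rng(0)
gammas = np.array([0.0, 0.9, 0.99])      # near-term to longer-range
n_features = 8
w = np.zeros((len(gammas), n_features))  # one weight vector per timescale
alpha = 0.1

x = np.zeros(n_features)
x[0] = 1.0
for step in range(2000):
    x_next = np.zeros(n_features)
    x_next[rng.integers(n_features)] = 1.0  # random one-hot feature vector
    signal = x_next[0]                      # predict future visits to feature 0
    for i, g in enumerate(gammas):
        delta = signal + g * (w[i] @ x_next) - w[i] @ x   # TD error
        w[i] += alpha * delta * x                         # TD(0) update
    x = x_next

# Longer-horizon predictors accumulate larger predictions of the signal.
means = w.mean(axis=1)
```

After training, the mean prediction grows with the discount rate, reflecting that each learner anticipates the same signal over a longer horizon.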