no code implementations • 15 Aug 2024 • Ali Pourranjbar, Georges Kaddoum, Verdier Assoume Mba, Sahil Garg, Satinder Singh
Unlike previous works that assume an ideal environment with precise knowledge of subcarrier count and cyclic prefix location, we consider blind modulation detection while accounting for realistic environmental parameters and imperfections.
no code implementations • 23 Feb 2024 • Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos.
no code implementations • 17 Aug 2023 • Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh
In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.
no code implementations • NeurIPS 2023 • David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents.
no code implementations • 20 Jul 2023 • David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh
Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.
2 code implementations • NeurIPS 2023 • Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani
We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.
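The mechanism is easiest to see in a sequential form: a linear state-space recurrence whose hidden state is zeroed at episode boundaries, so one long sequence can span many episodes. Below is a minimal sketch assuming a diagonal transition and a binary done mask; the names, shapes, and the sequential loop are illustrative, since the paper's point is performing the same reset inside a parallelised scan.

```python
import numpy as np

def resettable_scan(A, B, C, x, done):
    """Sequential sketch of a diagonal linear SSM whose hidden state is
    zeroed at episode boundaries (done[t] == True). The paper performs the
    equivalent computation with a parallel scan; this loop only shows how
    the recurrence is modified."""
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        if done[t]:               # reset before consuming the new episode's input
            h = np.zeros_like(h)
        h = A * h + B @ x[t]      # elementwise diagonal transition + input projection
        ys.append(C @ h)          # readout
    return np.stack(ys)

# Toy usage: random parameters, one episode boundary halfway through.
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 0.99, size=8)
B, C = rng.normal(size=(8, 3)), rng.normal(size=(2, 8))
x = rng.normal(size=(10, 3))
done = np.zeros(10, dtype=bool); done[5] = True
print(resettable_scan(A, B, C, x, done).shape)  # (10, 2)
```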
no code implementations • 28 Feb 2023 • Bernardo Avila Pires, Feryal Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh
Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse.
no code implementations • 2 Feb 2023 • Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy
Such applications often require putting constraints on the agent's behavior.
1 code implementation • 28 Jan 2023 • Wilka Carvalho, Angelos Filos, Richard L. Lewis, Honglak Lee, Satinder Singh
Recently, the Successor Features and Generalized Policy Improvement (SF&GPI) framework has been proposed as a method for learning, composing, and transferring predictive knowledge and behavior.
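As a reminder of the framework this entry builds on (standard SF&GPI definitions from prior work, notation illustrative): given features $\phi$ and successor features $\psi^{\pi}(s,a) = \mathbb{E}^{\pi}\big[\sum_{t \ge 0} \gamma^{t} \phi(s_t, a_t) \mid s_0 = s,\, a_0 = a\big]$, any task whose reward factors as $r(s,a) = \phi(s,a)^{\top}\mathbf{w}$ has action values $Q^{\pi}_{\mathbf{w}}(s,a) = \psi^{\pi}(s,a)^{\top}\mathbf{w}$, and generalized policy improvement acts greedily over a library of policies $\{\pi_i\}$:

$$\pi_{\mathrm{GPI}}(s) \;\in\; \arg\max_{a}\, \max_{i}\; \psi^{\pi_i}(s,a)^{\top}\mathbf{w}.$$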
no code implementations • 30 Dec 2022 • Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.
1 code implementation • 21 Nov 2022 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dallibard, Chris Lu, Satinder Singh, Sebastian Flennerhag
Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies.
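To make the black-box setting concrete, here is a generic evolution-strategies loop with an antithetic search-gradient estimator; it is a textbook illustration of optimizing without gradients, not the meta-learned strategy this paper discovers.

```python
import numpy as np

def es_step(theta, f, sigma=0.1, lr=0.02, pop=32, rng=None):
    """One generic ES update: estimate a search gradient of E[f(theta + sigma*eps)]
    from antithetic perturbations and take an ascent step."""
    rng = rng or np.random.default_rng()
    eps = rng.normal(size=(pop, theta.size))
    scores = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    grad = (scores[:, None] * eps).mean(axis=0) / (2.0 * sigma)
    return theta + lr * grad

# Toy usage: maximise -||x - 3||^2 starting from the origin.
theta = np.zeros(5)
for _ in range(200):
    theta = es_step(theta, lambda x: -np.sum((x - 3.0) ** 2))
print(np.round(theta, 2))  # approaches [3. 3. 3. 3. 3.]
```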
no code implementations • 30 Oct 2022 • Dilip Arumugam, Satinder Singh
The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning.
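For readers new to the formalism: the BAMDP folds the agent's epistemic uncertainty into the state, so the Bayes-optimal value function is defined over hyperstates $(s, b)$, where $b$ is a posterior over the unknown dynamics $P$ (standard formulation, notation illustrative):

$$V^{*}(s, b) \;=\; \max_{a} \Big[\, r(s,a) \;+\; \gamma \sum_{s'} \Big( \textstyle\int P(s' \mid s, a)\, b(\mathrm{d}P) \Big)\, V^{*}(s', b') \,\Big],$$

where $b'$ is the Bayesian posterior after observing $(s, a, s')$; exploration is valuable exactly to the extent that it sharpens $b$.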
1 code implementation • 25 Oct 2022 • Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.
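The data-construction step is the easiest part to sketch: concatenate whole learning histories (many episodes from a single RL run, kept in training order) into long across-episode sequences, and train any causal sequence model to predict the next action. A minimal sketch with hypothetical field names; tokenisation, context length, and the model itself are choices the paper makes differently.

```python
import numpy as np

def build_history_sequence(episodes):
    """Flatten an ordered list of episodes from one RL training run into a single
    across-episode sequence of (observation, previous_action, previous_reward)
    inputs and next-action targets. `episodes` is a list of dicts with
    'obs', 'actions', 'rewards'; the episode ordering encodes learning progress."""
    inputs, targets = [], []
    prev_a, prev_r = 0, 0.0                      # placeholders for the very first step
    for ep in episodes:                          # keep episodes in the order they were generated
        for t in range(len(ep["actions"])):
            inputs.append((ep["obs"][t], prev_a, prev_r))
            targets.append(ep["actions"][t])
            prev_a, prev_r = ep["actions"][t], ep["rewards"][t]
    return inputs, targets   # a causal model is then trained to predict targets[t] from inputs[:t+1]

# Toy usage: two short episodes from an imagined 3-action task.
rng = np.random.default_rng(0)
episodes = [
    {"obs": rng.normal(size=(3, 4)), "actions": [1, 0, 2], "rewards": [0.0, 1.0, 0.0]},
    {"obs": rng.normal(size=(2, 4)), "actions": [2, 2], "rewards": [1.0, 1.0]},
]
X, y = build_history_sequence(episodes)
print(len(X), y)  # 5 [1, 0, 2, 2, 2]
```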
no code implementations • 19 Oct 2022 • Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh
In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.
no code implementations • 13 Sep 2022 • Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh
We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.
1 code implementation • 30 Jun 2022 • Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes).
no code implementations • 26 May 2022 • Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh
Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.
no code implementations • 8 Feb 2022 • Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh
Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.
no code implementations • NeurIPS 2021 • David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh
We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.
no code implementations • EMNLP (NLP4ConvAI) 2021 • Janarthanan Rajendran, Jonathan K. Kummerfeld, Satinder Singh
For each goal-oriented dialog task of interest, large amounts of data need to be collected for end-to-end learning of a neural dialog system.
1 code implementation • ICLR 2022 • Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh
We achieve a new state of the art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.
1 code implementation • NeurIPS 2021 • Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh
The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning.
no code implementations • NeurIPS 2021 • Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).
no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.
no code implementations • 25 Feb 2021 • Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh
Learning to flexibly follow task instructions in dynamic environments poses interesting challenges for reinforcement learning agents.
no code implementations • NeurIPS 2021 • Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
1 code implementation • NeurIPS 2021 • Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh
Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in which features of the observations they predict and in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems.
no code implementations • 9 Feb 2021 • Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh
In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, as well as the time interval between the two.
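Concretely, the usual geometrically discounted return is the special case $w(s, s', k) = \gamma^{k}$ of the pairwise-weighted returns studied here (notation illustrative):

$$G_t \;=\; \sum_{k \ge 0} w\big(s_t,\; s_{t+k+1},\; k\big)\, r_{t+k+1},$$

where the weight may depend on the state in which the action was taken, the state at the time of the reward, and the elapsed time between them.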
no code implementations • ICLR 2021 • Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh
Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting SMP on the set of tasks.
no code implementations • 14 Dec 2020 • Qi Zhang, Edmund H. Durfee, Satinder Singh
Multiagent systems can use commitments as the core of a general coordination infrastructure, supporting both cooperative and non-cooperative interactions.
no code implementations • NeurIPS 2020 • Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh
Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, both in terms of statistical as well as computational efficiency.
no code implementations • NeurIPS 2020 • Christopher Grimm, André Barreto, Satinder Singh, David Silver
As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates.
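In symbols, following the paper's definition with lightly adapted notation: writing $(\mathcal{T}^{m}_{\pi} v)(s) = \mathbb{E}_{a \sim \pi,\, s' \sim m}\big[ r(s,a) + \gamma\, v(s') \big]$ for the Bellman operator induced by model $m$, two models $m$ and $\tilde{m}$ are value equivalent with respect to a set of policies $\Pi$ and a set of functions $\mathcal{V}$ if

$$\mathcal{T}^{m}_{\pi} v \;=\; \mathcal{T}^{\tilde{m}}_{\pi} v \qquad \text{for all } \pi \in \Pi,\; v \in \mathcal{V}.$$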
no code implementations • 28 Oct 2020 • Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee, Richard L. Lewis, Satinder Singh
In this work, we show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task during task learning with an object-centric relational RL agent.
2 code implementations • NeurIPS 2020 • Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver
Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments.
no code implementations • NeurIPS 2020 • Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver
In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment.
2 code implementations • NeurIPS 2020 • Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach
It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms.
no code implementations • NeurIPS 2020 • Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.
no code implementations • 15 Dec 2019 • Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh
We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available.
no code implementations • ICML 2020 • Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh
Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.
1 code implementation • NeurIPS 2019 • Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
We consider the problem of efficient credit assignment in reinforcement learning.
no code implementations • NeurIPS 2019 • Philip Paquette, Yuchen Lu, Seton Steven Bocco, Max Smith, Satya O.-G., Jonathan K. Kummerfeld, Joelle Pineau, Satinder Singh, Aaron C. Courville
Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.
no code implementations • 25 Nov 2019 • John Holler, Risto Vuorio, Zhiwei Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye
Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace.
no code implementations • 25 Nov 2019 • Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh
This is in contrast to the state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer.
no code implementations • 31 Oct 2019 • Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick
We introduce agents that use object-oriented reasoning to consider alternate states of the world in order to more quickly find solutions to problems.
no code implementations • 23 Oct 2019 • Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh
As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set.
no code implementations • 25 Sep 2019 • Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel
We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.
no code implementations • NeurIPS 2019 • Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh
Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions.
1 code implementation • 4 Sep 2019 • Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville
Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.
3 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt
bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.
no code implementations • 24 Jan 2019 • Christopher Grimm, Satinder Singh
We present a novel method for learning a set of disentangled reward functions that sum to the original environment reward and are constrained to be independently obtainable.
no code implementations • ICLR 2019 • Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via a generative adversarial imitation learning framework.
no code implementations • NeurIPS 2018 • Nan Jiang, Alex Kulesza, Satinder Singh
A central problem in dynamical system modeling is state discovery—that is, finding a compact summary of the past that captures the information needed to predict the future.
1 code implementation • EMNLP 2018 • Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, Lazaros Polymenakos
We also propose a new and more effective testbed, permuted-bAbI dialog tasks, by introducing multiple valid next utterances to the original bAbI dialog tasks, which allows evaluation of goal-oriented dialog systems in a more realistic setting.
no code implementations • 22 Jun 2018 • Vivek Veeriah, Junhyuk Oh, Satinder Singh
Second, we explore whether many-goals updating can be used to pre-train a network to subsequently learn faster and better on a single main task of interest.
4 code implementations • ICML 2018 • Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions.
Ranked #3 for Atari Games on the Atari 2600 Atlantis benchmark
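The loss is simple enough to sketch directly: the agent imitates only those stored transitions whose observed return exceeds its current value estimate. A minimal sketch following the losses described in the paper, assuming arrays of stored returns, current value estimates, and log-probabilities of the stored actions; the prioritised replay, the full actor-critic objective, and the loss coefficient here are illustrative.

```python
import numpy as np

def sil_losses(returns, values, log_probs, value_coef=0.01):
    """Self-Imitation Learning losses on a sampled replay batch: only transitions
    whose stored return R exceeds the current value estimate V (positive clipped
    advantage) contribute."""
    adv = np.maximum(returns - values, 0.0)        # (R - V)_+
    policy_loss = -(log_probs * adv).mean()        # reproduce past good actions
    value_loss = 0.5 * (adv ** 2).mean()           # pull V up toward the good returns
    return policy_loss + value_coef * value_loss

# Toy usage with made-up numbers.
print(sil_losses(returns=np.array([1.0, 0.2, 3.0]),
                 values=np.array([0.5, 0.8, 1.0]),
                 log_probs=np.array([-0.7, -1.2, -0.3])))
```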
1 code implementation • RANLP 2019 • Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos
Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources.
1 code implementation • NeurIPS 2018 • Zeyu Zheng, Junhyuk Oh, Satinder Singh
In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.
1 code implementation • 8 Mar 2018 • Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, Jenna Wiens
During the 2017 NBA playoffs, Celtics coach Brad Stevens was faced with a difficult decision when defending against the Cavaliers: "Do you double and risk giving up easy shots, or stay at home and do the best you can?"
no code implementations • ICLR 2018 • Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh
Many goal-oriented dialog tasks, especially ones in which the dialog system has to interact with external knowledge sources such as databases, have to handle a large number of Named Entities (NEs).
no code implementations • 15 Nov 2017 • Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari
Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs.
2 code implementations • NeurIPS 2017 • Junhyuk Oh, Satinder Singh, Honglak Lee
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network.
Ranked #9 for Atari Games on the Atari 2600 Krull benchmark
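The planning side can be sketched abstractly: the network learns an encoding, an action-conditional transition over abstract states, a reward model, and a value model, and plans by rolling the learned modules forward. A toy depth-1 lookahead with stubbed linear modules; the names and linear forms are illustrative, not the paper's convolutional architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, A, OBS = 8, 3, 5                        # abstract-state dim, num actions, obs dim
W_enc  = rng.normal(size=(D, OBS))         # encoding module
W_tran = rng.normal(size=(A, D, D))        # per-action transition over abstract states
w_rew  = rng.normal(size=(A, D))           # per-action reward model
w_val  = rng.normal(size=D)                # value model on abstract states

def plan_q_values(obs, gamma=0.99):
    """Encode the observation, then for each action predict reward and next
    abstract state with the learned modules and back up the predicted value
    (depth-1 lookahead for brevity; deeper plans expand s_next recursively)."""
    s = np.tanh(W_enc @ obs)
    q = np.empty(A)
    for a in range(A):
        r_hat = w_rew[a] @ s                     # predicted immediate reward
        s_next = np.tanh(W_tran[a] @ s)          # predicted next abstract state
        q[a] = r_hat + gamma * (w_val @ s_next)  # backup through the learned model
    return q

print(plan_q_values(rng.normal(size=OBS)))       # one planned Q-value per action
```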
no code implementations • ACL 2017 • Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An
Counselor empathy is associated with better outcomes in psychology and behavioral counseling.
1 code implementation • ICML 2017 • Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli
As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks.
no code implementations • NeurIPS 2017 • Kareem Amin, Nan Jiang, Satinder Singh
We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted.
no code implementations • EACL 2017 • Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An, Kathy J. Goggin, Delwyn Catley
As the number of people receiving psycho-therapeutic treatment increases, the automatic evaluation of counseling practice arises as an important challenge in the clinical domain.
no code implementations • 14 Mar 2017 • Qi Zhang, Satinder Singh, Edmund Durfee
In cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent's detailed behavior into account.
no code implementations • 30 May 2016 • Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee
In this paper, we introduce a new set of reinforcement learning (RL) tasks in Minecraft (a flexible 3D world).
no code implementations • 24 Apr 2016 • Xiaoxiao Guo, Satinder Singh, Richard Lewis, Honglak Lee
We present an adaptation of PGRD (policy-gradient for reward-design) for learning a reward-bonus function to improve UCT (a MCTS algorithm).
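For reference, the UCB-style selection rule inside UCT that such a bonus feeds into (standard formula; treating the learned bonus $b_{\theta}$ as an additive term in the rewards backed up by the search is a simplification, not necessarily the paper's exact placement):

$$a^{*} \;=\; \arg\max_{a} \Big[\, \hat{Q}(s,a) \;+\; c \sqrt{\tfrac{\ln N(s)}{N(s,a)}} \,\Big], \qquad \hat{Q}(s,a) \text{ estimated from simulated returns of } r + b_{\theta}.$$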
no code implementations • 25 Jan 2016 • Kareem Amin, Satinder Singh
We first demonstrate that if the learner can experiment with any transition dynamics on some fixed set of states and actions, then there exists an algorithm that reconstructs the agent's reward function to the fullest extent theoretically possible, and that requires only a small (logarithmic) number of experiments.
1 code implementation • NeurIPS 2015 • Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, Satinder Singh
Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames are dependent on control variables or actions as well as previous frames.
no code implementations • NeurIPS 2014 • Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang
The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy-selection.
no code implementations • 16 Jan 2014 • Erik Talvitie, Satinder Singh
We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.
no code implementations • NeurIPS 2013 • Xiaoxiao Guo, Satinder Singh, Richard L. Lewis
We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.
no code implementations • 10 Jan 2013 • Michael Kearns, Michael L. Littman, Satinder Singh
The interpretation is that the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph, and thus the payoff matrix to player i is indexed only by these players.
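A small illustration of the compactness this buys (hypothetical structure, binary actions): each player's payoff table is indexed only by its own action and its neighbors' actions, so a graph of maximum degree $d$ needs tables of size $2^{d+1}$ per player instead of the $2^{n}$ entries of a full normal-form representation.

```python
import itertools

# A 4-player graphical game on the line graph 0 - 1 - 2 - 3 with binary actions.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Each player's payoff table is indexed ONLY by (own action, neighbors' actions).
payoff = {
    i: {acts: (i + 1) * sum(acts) % 3        # arbitrary toy payoffs
        for acts in itertools.product([0, 1], repeat=1 + len(nbrs))}
    for i, nbrs in neighbors.items()
}

def player_payoff(i, joint_action):
    """Payoff to player i under a full joint action, read from its local table."""
    local = (joint_action[i],) + tuple(joint_action[j] for j in neighbors[i])
    return payoff[i][local]

print(player_payoff(1, (0, 1, 1, 0)))  # depends only on the actions of players 0, 1, 2
```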
1 code implementation • Artificial Intelligence 1999 • Richard S. Sutton, Doina Precup, Satinder Singh
In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
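The interchangeability is easiest to see in the learning update: if option $o$ is initiated in state $s$, runs for $k$ steps accruing discounted reward $r$, and terminates in $s'$, the SMDP Q-learning update has the same shape as the one-step update for primitive actions, which are simply options of duration one:

$$Q(s, o) \;\leftarrow\; Q(s, o) \;+\; \alpha \Big[\, r \;+\; \gamma^{k} \max_{o'} Q(s', o') \;-\; Q(s, o) \,\Big].$$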