no code implementations • 30 Mar 2023 • Tyler Malloy, Miao Liu, Matthew D. Riemer, Tim Klinger, Gerald Tesauro, Chris R. Sims
This raises the question of how humans learn to efficiently represent visual information in a manner useful for learning tasks.
no code implementations • 28 Oct 2022 • Dong-Ki Kim, Matthew Riemer, Miao Liu, Jakob N. Foerster, Gerald Tesauro, Jonathan P. How
By directly comparing active equilibria to Nash equilibria in these examples, we find that active equilibria yield more effective solutions than Nash equilibria, and conclude that the active equilibrium is the desired solution concept for multiagent learning settings.
1 code implementation • 7 Mar 2022 • Dong-Ki Kim, Matthew Riemer, Miao Liu, Jakob N. Foerster, Michael Everett, Chuangchuang Sun, Gerald Tesauro, Jonathan P. How
An effective approach that has recently emerged for addressing this non-stationarity is for each agent to anticipate the learning of other agents and influence the evolution of future policies towards desirable behavior for its own benefit.
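A minimal sketch of one-step anticipation in this spirit, in the style of opponent-learning awareness; this is only an illustration of the general idea, not this paper's algorithm. V1 and V2 are assumed differentiable value functions of both agents' parameters.

    import torch

    def anticipatory_update(theta1, theta2, V1, V2, alpha=0.1, eta=0.1):
        # Anticipate the opponent's next learning step (kept differentiable).
        grad2 = torch.autograd.grad(V2(theta1, theta2), theta2, create_graph=True)[0]
        theta2_ahead = theta2 + eta * grad2
        # Ascend own value at the opponent's anticipated parameters, so the
        # gradient accounts for how our policy shapes their future policy.
        grad1 = torch.autograd.grad(V1(theta1, theta2_ahead), theta1)[0]
        return (theta1 + alpha * grad1).detach().requires_grad_()

    # Example on a toy differentiable game (payoffs are illustrative only):
    theta1 = torch.zeros(1, requires_grad=True)
    theta2 = torch.zeros(1, requires_grad=True)
    V1 = lambda a, b: torch.tanh(a) * torch.tanh(b)
    V2 = lambda a, b: -torch.tanh(a) * torch.tanh(b)
    theta1 = anticipatory_update(theta1, theta2, V1, V2)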
1 code implementation • 20 Sep 2021 • Marwa Abdulhai, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro, Jonathan P. How
Hierarchical reinforcement learning has focused on discovering temporally extended actions, such as options, that can provide benefits in problems requiring extensive exploration.
no code implementations • NeurIPS 2020 • Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun
The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability.
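A minimal sketch of the two ingredients, TD(0) with linear function approximation plus a consensus-style mixing step over the communication graph; a generic illustration, not necessarily the paper's exact updates.

    import numpy as np

    def td0_update(w, phi_s, r, phi_s_next, alpha=0.05, gamma=0.99):
        # V(s) is approximated by the inner product w . phi(s).
        td_error = r + gamma * phi_s_next @ w - phi_s @ w
        return w + alpha * td_error * phi_s

    def consensus_step(W, A):
        # W: (num_agents, d) stacked local weights; A: row-stochastic mixing
        # matrix over the communication graph. Agents average with neighbors.
        return A @ W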
no code implementations • 23 Nov 2020 • Tyler Malloy, Tim Klinger, Miao Liu, Matthew Riemer, Gerald Tesauro, Chris R. Sims
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm.
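A generic sketch of such a penalty added to an actor loss, estimating the policy complexity I(S;A) via a KL term against a learned marginal policy. This is a stochastic-policy illustration; how the paper integrates the constraint into MADDPG may differ.

    def info_regularized_actor_loss(log_probs, log_marginal, q_values, beta=0.1):
        # log_probs: log pi(a|s) at sampled actions; log_marginal: log pi_bar(a)
        # under a learned marginal over actions. Their gap estimates
        # KL(pi(.|s) || pi_bar), an upper bound surrogate for I(S;A).
        kl_estimate = (log_probs - log_marginal).mean()
        pg_loss = -(q_values.detach() * log_probs).mean()
        return pg_loss + beta * kl_estimate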
1 code implementation • 31 Oct 2020 • Dong-Ki Kim, Miao Liu, Matthew Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan P. How
A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents.
no code implementations • 9 Oct 2020 • Tyler Malloy, Chris R. Sims, Tim Klinger, Miao Liu, Matthew Riemer, Gerald Tesauro
We focus on the model-free reinforcement learning (RL) setting and formalize our approach in terms of an information-theoretic constraint on the complexity of learned policies.
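One standard way to write such a constrained objective, with a Lagrange multiplier beta trading reward against policy complexity:

    \max_{\pi}\; \mathbb{E}_{\pi}\Big[\textstyle\sum_t \gamma^t r_t\Big] - \beta\, I(S;A),
    \quad\text{where}\quad
    I(S;A) = \mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\!\big(\pi(\cdot\mid s)\,\big\|\,\bar{\pi}(\cdot)\big) \right],
    \qquad \bar{\pi}(a) = \mathbb{E}_{s}\,[\pi(a\mid s)].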
2 code implementations • 8 Oct 2020 • Keerthiram Murugesan, Mattia Atzeni, Pavan Kapanipathi, Pushkar Shukla, Sadhana Kumaravel, Gerald Tesauro, Kartik Talamadupula, Mrinmaya Sachan, Murray Campbell
Text-based games have emerged as an important test-bed for Reinforcement Learning (RL) research, requiring RL agents to combine grounded language understanding with sequential decision making.
Ranked #1 on Commonsense Reasoning for RL on commonsense-rl
no code implementations • 12 Jul 2020 • Keerthiram Murugesan, Mattia Atzeni, Pavan Kapanipathi, Pushkar Shukla, Sadhana Kumaravel, Gerald Tesauro, Kartik Talamadupula, Mrinmaya Sachan, Murray Campbell
We introduce a number of RL agents that combine the sequential context with a dynamic graph representation of their beliefs of the world and commonsense knowledge from ConceptNet in different ways.
2 code implementations • 28 Apr 2020 • Cameron Allen, Michael Katz, Tim Klinger, George Konidaris, Matthew Riemer, Gerald Tesauro
Focused macros dramatically improve black-box planning efficiency across a wide range of planning domains, sometimes beating even state-of-the-art planners with access to a full domain model.
no code implementations • 31 Dec 2019 • Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro
In this work we note that while this key assumption of the option-critic policy gradient theorems holds in the tabular case, it is always violated in practice in the deep function approximation setting.
no code implementations • 25 Sep 2019 • Tyler James Malloy, Matthew Riemer, Miao Liu, Tim Klinger, Gerald Tesauro, Chris R. Sims
We formalize this type of bounded rationality in terms of an information-theoretic constraint on the complexity of policies that agents seek to learn.
1 code implementation • 11 Mar 2019 • Xiaoxiao Guo, Shiyu Chang, Mo Yu, Gerald Tesauro, Murray Campbell
The empirical results show that (1) the agents are able to leverage state expert sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
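A minimal sketch of the action-inference idea, with a plain MLP standing in for the paper's tensor-based model: an inverse-dynamics network is trained on the agent's own (s, a, s') transitions, then applied to expert state pairs to produce pseudo-actions for imitation.

    import torch
    import torch.nn as nn

    class ActionInference(nn.Module):
        # Predicts the (unobserved) action that explains a state transition.
        def __init__(self, state_dim, num_actions, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, num_actions))

        def forward(self, s, s_next):
            return self.net(torch.cat([s, s_next], dim=-1))

    # Trained on the agent's own transitions, then run on consecutive expert
    # states to label them with pseudo-actions for the imitation term.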
no code implementations • 7 Mar 2019 • Dong-Ki Kim, Miao Liu, Shayegan Omidshafiei, Sebastian Lopez-Cot, Matthew Riemer, Golnaz Habibi, Gerald Tesauro, Sami Mourad, Murray Campbell, Jonathan P. How
Collective learning can be greatly enhanced when agents effectively exchange knowledge with their peers.
3 code implementations • ICLR 2019 • Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, Gerald Tesauro
In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples.
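A minimal sketch of the Reptile-style meta-update at the core of this approach (simplified; the full method interleaves current examples with reservoir-sampled experience replay). The interpolation back toward the starting weights favors weight changes whose per-example gradients align (transfer) rather than conflict (interference).

    import copy
    import torch

    def meta_replay_step(model, batches, loss_fn, inner_lr=0.01, meta_lr=0.3):
        start = copy.deepcopy(model.state_dict())
        opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
        for x, y in batches:            # sequential inner SGD steps
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        with torch.no_grad():           # theta <- theta0 + meta_lr*(theta - theta0)
            for name, p in model.named_parameters():
                p.copy_(start[name] + meta_lr * (p - start[name]))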
no code implementations • NeurIPS 2018 • Matthew Riemer, Miao Liu, Gerald Tesauro
Building systems that autonomously create temporal abstractions from data is a key challenge in scaling learning and planning in reinforcement learning.
no code implementations • 20 May 2018 • Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How
The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent their application to general multiagent problems, or require domain expertise for the problems to which they do apply.
2 code implementations • NAACL 2018 • Mo Yu, Xiaoxiao Guo, Jin-Feng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, Gerald Tesauro, Haoyu Wang, Bo-Wen Zhou
We study few-shot learning in natural language domains.
1 code implementation • NeurIPS 2018 • Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, Rogerio Schmidt Feris
Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.
no code implementations • ICLR 2018 • Xiaoxiao Guo, Shiyu Chang, Mo Yu, Miao Liu, Gerald Tesauro
In this paper, we consider a realistic and more difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are not available.
no code implementations • 11 Dec 2017 • Miao Liu, Marlos C. Machado, Gerald Tesauro, Murray Campbell
Eigenoptions (EOs) were recently introduced as a promising approach for generating a diverse set of options through the graph Laplacian, and have been shown to enable efficient exploration.
Tasks: Efficient Exploration, Hierarchical Reinforcement Learning, +1
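A minimal sketch, assuming the state-transition graph of a tabular domain is given as a symmetric adjacency matrix; each Laplacian eigenvector defines an intrinsic reward whose optimal policy is an eigenoption.

    import numpy as np

    def eigenoption_rewards(adjacency):
        deg = adjacency.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        laplacian = np.eye(len(deg)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
        _, eigvecs = np.linalg.eigh(laplacian)  # columns sorted by eigenvalue
        # Each eigenvector e yields an intrinsic reward for a transition
        # s -> s':  r_e(s, s') = e[s'] - e[s].
        return eigvecs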
1 code implementation • ICLR 2018 • Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell
We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer.
Ranked #1 on Open-Domain Question Answering on Quasar
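A minimal sketch of the strength-based variant (the coverage-based variant instead re-scores each candidate against the union of its supporting passages): answers extracted from many high-scoring passages are promoted.

    from collections import defaultdict

    def strength_rerank(candidates):
        # candidates: list of (answer_string, passage_score) pairs, each
        # extracted from a different retrieved passage.
        strength = defaultdict(float)
        for answer, passage_score in candidates:
            strength[answer.strip().lower()] += passage_score
        return max(strength, key=strength.get)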
1 code implementation • ICLR 2018 • Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell
Options in reinforcement learning allow agents to hierarchically decompose a task into subtasks, having the potential to speed up learning and planning.
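Formally, in the standard options framework an option is a triple

    \omega = (I_\omega, \pi_\omega, \beta_\omega)

where I_\omega \subseteq S is the initiation set, \pi_\omega is the intra-option policy, and \beta_\omega : S \to [0, 1] gives the per-state termination probability.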
1 code implementation • 31 Aug 2017 • Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bo-Wen Zhou, Jing Jiang
Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning.
Ranked #4 on Open-Domain Question Answering on Quasar
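A minimal sketch of the joint-training signal: the ranker samples a passage, the reader answers from it, and answer quality (e.g. F1 against the gold answer) is fed back as a REINFORCE reward. Here reader_f1 is an assumed helper, not a real API.

    import torch

    def ranker_reinforce_loss(passage_logits, question, passages, reader_f1):
        dist = torch.distributions.Categorical(logits=passage_logits)
        idx = dist.sample()
        reward = reader_f1(question, passages[int(idx)])  # scalar in [0, 1]
        # Push up passages whose text lets the reader answer well.
        return -reward * dist.log_prob(idx)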
no code implementations • 26 Aug 2017 • Mo Yu, Xiaoxiao Guo, Jin-Feng Yi, Shiyu Chang, Saloni Potdar, Gerald Tesauro, Haoyu Wang, Bo-Wen Zhou
We propose a new method to measure task similarities using a cross-task transfer performance matrix for the deep learning scenario.
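A minimal sketch of building such a matrix (train_fn and eval_fn are hypothetical helpers); rows and columns then serve as task-similarity signatures.

    import numpy as np

    def transfer_matrix(tasks, train_fn, eval_fn):
        # tasks: list of (train_data, test_data) pairs. Entry (i, j) holds
        # the test performance on task j of a model trained on task i.
        n = len(tasks)
        M = np.zeros((n, n))
        for i in range(n):
            model = train_fn(tasks[i][0])
            for j in range(n):
                M[i, j] = eval_fn(model, tasks[j][1])
        return M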
4 code implementations • 2 Jun 2016 • Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bo-Wen Zhou, Yoshua Bengio, Aaron Courville
We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.
Ranked #1 on Dialogue Generation on Ubuntu Dialogue (Activity)
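A simplified sketch of the two-level decoding idea (the full model is a seq2seq architecture with its own encoders and richer conditioning): a coarse GRU emits high-level tokens, and its final state conditions the word-level GRU.

    import torch.nn as nn

    class MultiresolutionDecoder(nn.Module):
        def __init__(self, coarse_vocab, word_vocab, dim=256):
            super().__init__()
            self.coarse_emb = nn.Embedding(coarse_vocab, dim)
            self.word_emb = nn.Embedding(word_vocab, dim)
            self.coarse_rnn = nn.GRU(dim, dim, batch_first=True)
            self.word_rnn = nn.GRU(dim, dim, batch_first=True)
            self.coarse_out = nn.Linear(dim, coarse_vocab)
            self.word_out = nn.Linear(dim, word_vocab)

        def forward(self, context, coarse_tokens, word_tokens):
            # context: (1, batch, dim) summary of the dialogue history.
            c_hidden, c_last = self.coarse_rnn(self.coarse_emb(coarse_tokens), context)
            # The natural-language process is conditioned on the coarse one.
            w_hidden, _ = self.word_rnn(self.word_emb(word_tokens), c_last)
            return self.coarse_out(c_hidden), self.word_out(w_hidden)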
no code implementations • 24 May 2016 • Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio
In this paper, we explore a form of hierarchical memory network, which can be considered a hybrid between hard and soft attention memory networks.
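A minimal sketch of the hybrid read: a hard top-k stage (e.g. via approximate maximum inner product search; exact top-k here for clarity) shortlists memory slots, then differentiable soft attention is applied only over the shortlist.

    import numpy as np

    def hybrid_attention_read(query, memory, k=10):
        # memory: (num_slots, d); assumes k < num_slots.
        scores = memory @ query
        top = np.argpartition(-scores, k)[:k]        # hard stage: shortlist
        weights = np.exp(scores[top] - scores[top].max())
        weights /= weights.sum()                     # soft stage: softmax
        return weights @ memory[top]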
no code implementations • 31 Dec 2015 • Ashish Sabharwal, Horst Samulowitz, Gerald Tesauro
We study a novel machine learning (ML) problem setting of sequentially allocating small subsets of training data amongst a large set of classifiers.
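One natural baseline strategy for this setting, sketched under the assumption of sklearn-style incremental learners exposing partial_fit; the paper develops a more refined allocation policy, so treat this as illustration only.

    import numpy as np

    def successive_halving(learners, data_chunks, eval_fn):
        # Give every learner a small data slice, keep the better half,
        # repeat, so the data budget concentrates on promising classifiers.
        active = list(learners)
        for X, y in data_chunks:
            for learner in active:
                learner.partial_fit(X, y)
            scores = [eval_fn(learner) for learner in active]
            order = np.argsort(scores)[::-1]
            active = [active[i] for i in order[: max(1, len(active) // 2)]]
        return active[0]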
no code implementations • 4 Feb 2014 • Gerald Tesauro, David C. Gondek, Jonathan Lenchner, James Fan, John M. Prager
After giving a detailed description of each of our game-strategy algorithms, we focus in particular on validating the accuracy of the simulator's predictions and on documenting the performance improvements achieved by our methods.