Hierarchical reinforcement learning has focused on discovering temporally extended actions, such as options, that can provide benefits in problems requiring extensive exploration.
The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability.
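As a rough illustration of the building block such methods rely on (a sketch under our own assumptions, not code from the paper; all names are illustrative), a single TD(0) update with linear value-function approximation looks like:

```python
import numpy as np

def td0_linear_update(w, phi_s, phi_s_next, reward, alpha=0.1, gamma=0.99):
    """One TD(0) step for a linear value estimate V(s) = w . phi(s).

    phi_s / phi_s_next are feature vectors for the current and next state.
    """
    td_error = reward + gamma * phi_s_next.dot(w) - phi_s.dot(w)
    return w + alpha * td_error * phi_s

# Illustrative usage with random features.
w = np.zeros(8)
phi_s, phi_s_next = np.random.rand(8), np.random.rand(8)
w = td0_linear_update(w, phi_s, phi_s_next, reward=1.0)
```

In a decentralized variant, each agent would typically interleave such local updates with a consensus step that mixes its weight vector with those of its network neighbors.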
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm.
A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents.
We focus on the model-free reinforcement learning (RL) setting and formalize our approach in terms of an information-theoretic constraint on the complexity of learned policies.
Text-based games have emerged as an important test-bed for Reinforcement Learning (RL) research, requiring RL agents to combine grounded language understanding with sequential decision making.
We introduce a number of RL agents that combine the sequential context with a dynamic graph representation of their beliefs of the world and commonsense knowledge from ConceptNet in different ways.
Focused macros dramatically improve black-box planning efficiency across a wide range of planning domains, sometimes beating even state-of-the-art planners with access to a full domain model.
In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting.
We formalize this type of bounded rationality in terms of an information-theoretic constraint on the complexity of policies that agents seek to learn.
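One common way to realize such a constraint (a minimal sketch under our own assumptions, not necessarily the papers' exact formulation; PyTorch assumed) is a Lagrangian relaxation that penalizes the KL divergence between the learned policy and a simple default policy:

```python
import torch.nn.functional as F

def complexity_regularized_loss(logits, default_logits, advantages, actions, beta=0.01):
    """Policy-gradient loss with a soft information-theoretic penalty.

    The KL term penalizes deviation of the learned policy pi from a simple
    default policy pi0, a Lagrangian relaxation of a constraint
    KL(pi || pi0) <= C on policy complexity; beta trades reward for simplicity.
    logits / default_logits: (batch, actions); actions: (batch,) long tensor.
    """
    log_pi = F.log_softmax(logits, dim=-1)
    log_pi0 = F.log_softmax(default_logits, dim=-1)
    pg_loss = -(advantages * log_pi.gather(-1, actions.unsqueeze(-1)).squeeze(-1)).mean()
    kl = (log_pi.exp() * (log_pi - log_pi0)).sum(-1).mean()
    return pg_loss + beta * kl
```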
The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
Collective learning can be greatly enhanced when agents effectively exchange knowledge with their peers.
In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples.
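The transfer/interference trade-off can be read off the dot product between per-example gradients; a minimal sketch (illustrative names, PyTorch assumed) is:

```python
import torch

def gradient_alignment(loss_a, loss_b, params):
    """Dot product between the gradients of two examples' losses.

    params: list of parameter tensors with requires_grad=True that both
    losses depend on. Positive values indicate transfer (one update helps
    both examples); negative values indicate interference.
    """
    grads_a = torch.autograd.grad(loss_a, params, retain_graph=True)
    grads_b = torch.autograd.grad(loss_b, params, retain_graph=True)
    return sum((ga * gb).sum() for ga, gb in zip(grads_a, grads_b))
```

Enforcing gradient alignment amounts to pushing this quantity to be positive across pairs of examples, so that updates on new data do not undo what was learned on old data.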
The problem of teaching to improve agent learning has been investigated in prior work, but existing approaches either make assumptions that prevent applying teaching to general multiagent problems, or require domain expertise for the problems to which they do apply.
We study few-shot learning in natural language domains.
Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.
In this paper, we consider a realistic and more difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are not available.
Eigenoptions (EOs) have recently been introduced as a promising approach to generating a diverse set of options through the graph Laplacian, and have been shown to enable efficient exploration.
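For intuition, eigenoptions are derived from eigenvectors of the graph Laplacian of the state-transition graph; a minimal NumPy sketch (illustrative, tabular case only) is:

```python
import numpy as np

def eigenoption_directions(adjacency, k=4):
    """Smoothest non-trivial eigenvectors of the normalized graph Laplacian.

    adjacency: symmetric |S| x |S| adjacency matrix of the state graph.
    Returns k eigenvectors; each eigenvector e_i induces an intrinsic
    reward e_i[s'] - e_i[s] whose greedy policy defines one eigenoption.
    """
    deg = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    laplacian = np.eye(len(adjacency)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1:k + 1]  # skip the trivial constant eigenvector
```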
We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer.
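As a hedged sketch of the strength-based idea (not the paper's exact model; names are illustrative), aggregating each candidate answer's evidence across passages can be as simple as:

```python
from collections import defaultdict

def strength_rerank(candidates):
    """Strength-based re-ranking: sum a candidate's evidence across passages.

    candidates: (answer_text, passage_score) pairs pooled from the top
    passages. A candidate supported by many passages accumulates a higher
    total score than one extracted from a single passage.
    """
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer.lower()] += score
    return max(totals, key=totals.get)

# Illustrative usage: "paris" wins with 1.1 total evidence vs 0.7.
print(strength_rerank([("Paris", 0.6), ("Lyon", 0.7), ("paris", 0.5)]))
```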
Options in reinforcement learning allow agents to hierarchically decompose a task into subtasks, with the potential to speed up learning and planning.
Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning.
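A minimal sketch of the RL signal involved (our own simplification; the reward choice and names are assumptions) treats the Ranker as a policy over passages rewarded by the Reader's answer quality:

```python
import torch.nn.functional as F

def ranker_reinforce_loss(ranker_logits, reward, sampled_idx):
    """REINFORCE-style loss for a passage Ranker guided by Reader reward.

    ranker_logits: 1-D scores over candidate passages for one question.
    sampled_idx: index of the passage sampled from the ranker's distribution.
    reward: e.g., the Reader's answer F1 when reading that passage.
    """
    log_probs = F.log_softmax(ranker_logits, dim=-1)
    return -(reward * log_probs[sampled_idx])
```

The Reader itself would be trained with an ordinary likelihood objective; only the non-differentiable passage selection needs the policy-gradient term.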
We propose a new method to measure task similarities with a cross-task transfer performance matrix for the deep learning scenario.
We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens.
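As a toy sketch of the two-stream idea (illustrative sizes and names, not the paper's full architecture), a coarse-token decoder can condition a word-level decoder like so:

```python
import torch.nn as nn

class TwoLevelDecoder(nn.Module):
    """Toy two-stream decoder: a coarse-token GRU runs first, and its
    final hidden state initializes the natural-language GRU."""

    def __init__(self, coarse_vocab, word_vocab, hidden=128):
        super().__init__()
        self.coarse_emb = nn.Embedding(coarse_vocab, hidden)
        self.word_emb = nn.Embedding(word_vocab, hidden)
        self.coarse_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.word_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, word_vocab)

    def forward(self, coarse_tokens, words):
        # Encode the high-level coarse sequence, then generate words
        # conditioned on its summary state.
        _, h = self.coarse_rnn(self.coarse_emb(coarse_tokens))
        word_states, _ = self.word_rnn(self.word_emb(words), h)
        return self.out(word_states)
```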
In this paper, we explore a form of hierarchical memory network, which can be considered a hybrid between hard and soft attention memory networks.
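A hedged sketch of such a hybrid read (with top-k selection standing in for the approximate maximum-inner-product lookup such models typically use):

```python
import torch
import torch.nn.functional as F

def hybrid_attention_read(query, memory, k=16):
    """Hybrid hard/soft memory read.

    query: (d,) tensor; memory: (n, d) tensor with n >= k.
    Hard step: keep only the top-k highest-scoring memory slots.
    Soft step: differentiable softmax attention over those k slots.
    """
    scores = memory @ query
    top_scores, top_idx = scores.topk(k)      # hard selection of slots
    weights = F.softmax(top_scores, dim=-1)   # soft attention within them
    return weights @ memory[top_idx]
```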
We study a novel machine learning (ML) problem setting of sequentially allocating small subsets of training data amongst a large set of classifiers.
After giving a detailed description of each of our game-strategy algorithms, we then focus in particular on validating the accuracy of the simulator's predictions and on documenting performance improvements obtained with our methods.