Current language generation models suffer from issues such as repetition, incoherence, and hallucinations.
In this work, we propose Simultaneous Learning of Adjacency and GNN Parameters with Self-supervision, or SLAPS, a method that supplements the task objective with a self-supervised objective to provide more supervision for inferring a graph structure.
Further, to encourage better model planning during the decoding process, we incorporate a K-step-ahead token prediction objective that computes both MLE and unlikelihood (UL) losses on future tokens as well.
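A minimal sketch of how such a combined objective could be computed, assuming per-step token probabilities are already available; the function names and the `alpha` weighting are illustrative, not the authors' implementation:

```python
import math

def mle_loss(probs, target):
    # Standard MLE term: negative log-likelihood of the gold token.
    return -math.log(probs[target])

def unlikelihood_loss(probs, negative_tokens):
    # UL term: penalize probability mass assigned to negative
    # candidates (e.g. previously generated, repeated tokens).
    return -sum(math.log(1.0 - probs[t]) for t in negative_tokens)

def k_step_ahead_loss(step_probs, targets, negatives, alpha=1.0):
    # Combine MLE and UL losses over the current token and the
    # K future tokens predicted ahead of the decoding position.
    total = 0.0
    for probs, tgt, neg in zip(step_probs, targets, negatives):
        total += mle_loss(probs, tgt) + alpha * unlikelihood_loss(probs, neg)
    return total
```

In practice the probabilities would come from the decoder's softmax at each of the K look-ahead positions, and the negative-candidate sets are typically the tokens already generated in the prefix.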
We show through extensive experiments and analysis that, when trained with policy gradient, recurrent neural networks often fail to learn a state representation that leads to an optimal policy in settings where the same action should be taken at different states.
Understanding audio-visual content and the ability to have an informative conversation about it have both been challenging areas for intelligent systems.
To solve a text-based game, an agent needs to formulate valid text commands for a given context and find the ones that lead to success.
Conditional text-to-image generation is an active area of research, with many possible applications.
We introduce TextWorld, a sandbox learning environment for the training and evaluation of RL agents on text-based games.
However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in non-task-oriented dialogue settings.
We developed this dataset to study the role of memory in goal-oriented dialogue systems.
The model takes as input a sequence of dialogue contexts and outputs a sequence of dialogue acts corresponding to user intentions.
Indeed, with only a few hundred dialogues collected using a handcrafted policy, the actor-critic deep learner can be effectively bootstrapped from a combination of supervised learning and batch RL.