The quality of fully automated text simplification systems is not good enough for use in real-world settings; instead, human simplifications are used.
To address these issues, we present reincarnating RL as an alternative workflow, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another.
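As a minimal sketch of one such transfer: warm-starting a redesigned agent by distilling a frozen prior policy's action distribution into it. The network shapes, checkpoint name, and distillation objective below are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS = 4, 6  # hypothetical observation/action sizes

def make_policy():
    return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

teacher = make_policy()  # prior computational work: a previously trained policy
# teacher.load_state_dict(torch.load("teacher.pt"))  # in practice, a real checkpoint
teacher.requires_grad_(False)

student = make_policy()  # the agent for the next design iteration
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

obs = torch.randn(32, OBS_DIM)  # stand-in batch of observations
# Distill the teacher's action distribution into the student, so the new
# design starts from the old agent's behaviour instead of tabula rasa.
loss = F.kl_div(F.log_softmax(student(obs), dim=-1),
                F.softmax(teacher(obs), dim=-1), reduction="batchmean")
opt.zero_grad(); loss.backward(); opt.step()
```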
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.
Ranked #2 on Atari 100k
Specifically, we show that the temperature $\tau$ of the softmax operation controls the expressivity of the simplicial embedding (SEM) representation, allowing us to derive a tighter generalization bound for downstream classifiers than is possible for classifiers using unnormalized representations.
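A minimal sketch of the temperature-scaled softmax underlying this claim, assuming the representation is split into equal-sized groups that are each projected onto a probability simplex; the group count and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def simplicial_embedding(z, n_groups, tau):
    """Split z into n_groups blocks and project each onto a probability
    simplex with a temperature-scaled softmax. Small tau concentrates each
    block near a vertex (almost one-hot), constraining expressivity;
    large tau spreads mass toward the uniform distribution."""
    b, d = z.shape
    assert d % n_groups == 0
    z = z.view(b, n_groups, d // n_groups)
    return F.softmax(z / tau, dim=-1).view(b, d)

z = torch.randn(8, 128)                                  # toy features
sparse = simplicial_embedding(z, n_groups=16, tau=0.1)   # near one-hot blocks
smooth = simplicial_embedding(z, n_groups=16, tau=10.0)  # near-uniform blocks
```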
Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs.
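As a hedged illustration of the alternative, the sketch below computes a percentile-bootstrap 95% interval for the interquartile mean (IQM) over runs; the run and task counts are stand-ins, and this is one simple interval estimate rather than the paper's full evaluation protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((10, 26))  # stand-in: 10 runs x 26 tasks, normalized scores

def iqm(x):
    """Interquartile mean: the mean of the middle 50% of all scores,
    more robust to outlier runs than the mean, less wasteful than the median."""
    flat = np.sort(x.ravel())
    n = len(flat)
    return flat[n // 4 : (3 * n) // 4].mean()

# Percentile bootstrap over runs: resample whole runs with replacement,
# recompute the aggregate, and report an interval instead of a point estimate.
boot = [iqm(scores[rng.integers(0, len(scores), len(scores))])
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"IQM = {iqm(scores):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```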
Data efficiency is a key challenge for deep reinforcement learning.
Although neural module networks have an architectural bias towards compositionality, they require gold-standard layouts to generalize systematically in practice.
Data efficiency poses a major challenge for deep reinforcement learning.
We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation.
Ranked #3 on Atari 100k
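A minimal sketch of the augmentation-consistency idea described above, assuming pixel observations, a random-shift augmentation, and a toy encoder; the loss shown (cosine agreement between two views) is an illustrative stand-in for the paper's future prediction loss, and all shapes and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_shift(obs, pad=4):
    """Pad-and-crop 'random shift', a common augmentation for pixel RL."""
    b, c, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    dx, dy = torch.randint(0, 2 * pad + 1, (2,)).tolist()
    return padded[:, :, dy:dy + h, dx:dx + w]

encoder = nn.Sequential(nn.Flatten(), nn.Linear(84 * 84, 256))  # toy encoder

obs = torch.rand(16, 1, 84, 84)                       # stand-in observations
z1 = F.normalize(encoder(random_shift(obs)), dim=-1)  # view 1
z2 = F.normalize(encoder(random_shift(obs)), dim=-1)  # view 2
# Consistency term: representations of two augmented views of the same
# observation should agree (cosine similarity pushed toward 1).
consistency_loss = -(z1 * z2).sum(dim=-1).mean()
```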
We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them.
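One established formulation along these lines is the Leinster-Cobbold similarity-sensitive entropy, sketched below; whether this exact form matches the paper's proposal is an assumption, but it shows how a similarity matrix Z discounts near-duplicate symbols, with Z = I recovering ordinary Shannon entropy.

```python
import numpy as np

def similarity_entropy(p, Z):
    """Similarity-sensitive (Leinster-Cobbold-style) Shannon entropy:
    H = -sum_i p_i * log((Z p)_i), where Z[i, j] in [0, 1] measures the
    similarity between symbols i and j. Z = I gives ordinary Shannon
    entropy; similar symbols lower the entropy."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(Z @ p)).sum())

p = np.array([0.5, 0.3, 0.2])
I = np.eye(3)                   # all symbols fully distinct
Z = np.array([[1.0, 0.9, 0.0],  # symbols 0 and 1 nearly identical
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(similarity_entropy(p, I))  # Shannon entropy, ~1.03 nats
print(similarity_entropy(p, Z))  # ~0.54: near-duplicates add little information
```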
no code implementations • 14 Oct 2018 • Max Schwarzer, Bryce Rogan, Yadong Ruan, Zhengming Song, Diana Y. Lee, Allon G. Percus, Viet T. Chau, Bryan A. Moore, Esteban Rougier, Hari S. Viswanathan, Gowri Srinivasan
Our methods use deep learning and train on simulation data from high-fidelity models, emulating the results of these models while avoiding the overwhelming computational demands associated with running a statistically significant sample of simulations.
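A minimal sketch of such an emulator, assuming tabular (simulation parameters -> simulated quantity) pairs from a handful of expensive high-fidelity runs; the architecture, sizes, and synthetic data are illustrative and not taken from the paper.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for training data collected from a few high-fidelity simulations.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 8)).astype(np.float32)    # 8 input parameters
y = X.sum(axis=1, keepdims=True).astype(np.float32)  # placeholder target

emulator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, 1))
opt = torch.optim.Adam(emulator.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):  # fit the surrogate to the simulator's input/output pairs
    pred = emulator(torch.from_numpy(X))
    loss = loss_fn(pred, torch.from_numpy(y))
    opt.zero_grad(); loss.backward(); opt.step()

# Once trained, the emulator answers queries cheaply, so a statistically
# large sample can be drawn without rerunning the full simulator.
samples = emulator(torch.rand(10000, 8))
```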