The prevalent approach to sequence-to-sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks.
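The encoder-decoder pattern described above can be sketched in a few lines: an encoder RNN compresses the input into a fixed-size hidden state, and a decoder RNN unrolls from that state one output token per step. This is a minimal, untrained NumPy sketch of the dataflow only; the sizes, the start token `0`, and all parameter matrices are illustrative assumptions, not any particular paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 8, 5  # hidden size and vocabulary size (illustrative assumptions)

# Random, untrained parameters shared by encoder and decoder for brevity.
Wxh = rng.normal(size=(H, V))
Whh = rng.normal(size=(H, H))
Why = rng.normal(size=(V, H))

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def encode(tokens):
    # Fold the whole input sequence into one fixed-size hidden state.
    h = np.zeros(H)
    for t in tokens:
        h = np.tanh(Wxh @ one_hot(t) + Whh @ h)
    return h

def decode(h, steps):
    # Unroll from the summary state, feeding each prediction back in.
    out, x = [], 0  # 0 is a hypothetical start token
    for _ in range(steps):
        h = np.tanh(Wxh @ one_hot(x) + Whh @ h)
        x = int(np.argmax(Why @ h))
        out.append(x)
    return out

summary = encode([1, 2, 3])
generated = decode(summary, 4)  # output length chosen by the caller here
```

In a real system the decoder would stop on an end-of-sequence token rather than a fixed step count, and the parameters would be trained end-to-end.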
We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism.
We introduce a novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell.
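For readers unfamiliar with the theorem invoked above: Geršgorin's circle theorem states that every eigenvalue of a square matrix lies in at least one disc centered at a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. A small NumPy check (the matrix itself is arbitrary, chosen only for illustration):

```python
import numpy as np

# An arbitrary example matrix.
A = np.array([[ 4.0, 1.0, 0.5],
              [ 0.2, -3.0, 1.0],
              [ 0.1, 0.3, 2.0]])

# Disc i is centered at A[i, i] with radius = sum of |off-diagonal| in row i.
centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)

eigvals = np.linalg.eigvals(A)

# The theorem guarantees each eigenvalue falls inside at least one disc.
for lam in eigvals:
    assert any(abs(lam - c) <= r for c, r in zip(centers, radii))
```

Bounding eigenvalues this way is what makes the theorem useful for reasoning about the recurrent weight matrix and hence the stability of gradient propagation through time.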
As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients.
This paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces.