The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution.
There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.
#6 best model for Machine Translation on IWSLT2015 German-English
We further propose to distill the structured knowledge from large networks to small networks, which is motivated by that semantic segmentation is a structured prediction problem.
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate.
Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult, when it requires executing efficient discrete operations against a large knowledge-base.
Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs.
#8 best model for Named Entity Recognition on Ontonotes v5 (English)
We introduce a new framework for learning dense correspondence between deformable 3D shapes.