The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution.
There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.
#5 best model for Machine Translation on IWSLT2015 German-English
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate.
Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult, when it requires executing efficient discrete operations against a large knowledge-base.
Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs.
#4 best model for Named Entity Recognition (NER) on Ontonotes v5 (English)
Over the past decades, numerous loss functions have been been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction.