The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution.
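As a rough illustration of that refactoring, the sketch below assumes a single Gaussian stochastic node z ~ N(mu, sigma^2) and a toy objective f(z) = z^2; neither is taken from the source. The node is rewritten as z = mu + sigma * eps with eps ~ N(0, 1), so gradients with respect to mu and sigma can be estimated by differentiating through the sample. The function names are hypothetical.

```python
import random


def sample_reparameterized(mu, sigma, eps):
    # Deterministic, differentiable function of the parameters (mu, sigma)
    # and a noise variable eps whose distribution N(0, 1) is fixed.
    return mu + sigma * eps


def pathwise_gradients(mu, sigma, num_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[f(z)] for the
    assumed toy objective f(z) = z**2 (illustrative only)."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        z = sample_reparameterized(mu, sigma, eps)
        df_dz = 2.0 * z              # derivative of f(z) = z**2
        grad_mu += df_dz * 1.0       # dz/dmu = 1
        grad_sigma += df_dz * eps    # dz/dsigma = eps
    return grad_mu / num_samples, grad_sigma / num_samples


if __name__ == "__main__":
    g_mu, g_sigma = pathwise_gradients(mu=1.5, sigma=0.5)
    # Analytically, E[z**2] = mu**2 + sigma**2, so the exact gradients
    # are 2*mu and 2*sigma; the estimates should land close to those.
    print(g_mu, g_sigma)
```

For this toy case the exact gradients are known in closed form (2*mu and 2*sigma), which makes it easy to check that the pathwise estimator behaves as expected before applying the same idea to a real model.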
There has been much recent work on training neural attention models at the sequence level, either with reinforcement learning-style methods or by optimizing the beam.
#6 best model for Machine Translation on IWSLT2015 German-English
We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of the policy gradient estimate.
Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge base.
Additionally, we collected the Online Products dataset: 120k images of 23k classes of online products for metric learning.
We introduce a new framework for learning dense correspondence between deformable 3D shapes.
Today, when many practitioners run basic NLP on the entire web and on large-volume traffic, faster methods are paramount to saving time and energy costs.
#12 best model for Named Entity Recognition on Ontonotes v5 (English)