Learning in models with discrete latent variables is challenging due to high-variance gradient estimators.
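A standard example of such an estimator is the score-function (REINFORCE) estimator. The sketch below is a minimal illustration, assuming a Bernoulli latent with logit `theta` and a toy objective `f` (both assumptions made for illustration, not taken from the excerpt); each per-sample term is unbiased, but the spread across samples is what makes learning hard.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad(theta, f, n_samples=1000):
    """Score-function (REINFORCE) estimate of d/dtheta E_{z ~ Bernoulli(sigmoid(theta))}[f(z)].
    Each sample contributes f(z) * d/dtheta log p(z | theta) = f(z) * (z - p):
    unbiased, but often high-variance."""
    p = 1.0 / (1.0 + np.exp(-theta))                 # Bernoulli parameter
    z = (rng.random(n_samples) < p).astype(float)    # discrete samples
    per_sample = f(z) * (z - p)
    return per_sample.mean(), per_sample.std()       # estimate and its per-sample spread

f = lambda z: (z - 0.45) ** 2                        # toy objective
grad, spread = reinforce_grad(theta=0.0, f=f)
print(f"gradient estimate {grad:.4f}, per-sample std {spread:.4f}")
```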
We prove that, since data instances with larger gradients play a more important role in computing the information gain, GOSS can obtain a quite accurate estimate of the information gain from a much smaller sample of the data.
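A minimal sketch of the sampling step behind this claim follows; the function name `goss_sample`, the `top_rate`/`other_rate` values, and the synthetic gradients are illustrative assumptions, not taken from the excerpt. The idea is to keep every instance with a large gradient, randomly subsample the rest, and up-weight that subsample so the estimated information gain stays approximately unbiased.

```python
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, rng=None):
    """Gradient-based One-Side Sampling (GOSS): keep all large-gradient instances,
    subsample the small-gradient ones, and re-weight the subsample by
    (1 - top_rate) / other_rate when computing the information gain."""
    rng = rng or np.random.default_rng(0)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))           # indices sorted by |gradient|, descending
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    top_idx = order[:n_top]                          # always kept
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1.0 - top_rate) / other_rate  # amplify the small-gradient sample
    return idx, weights

grads = np.random.default_rng(1).normal(size=1000)
idx, weights = goss_sample(grads)
print(len(idx), weights[:2], weights[-2:])           # 300 instances kept out of 1000
```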
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.
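As a point of reference, a bare-bones recurrent encoder-decoder of the kind described can be sketched as below; the GRU layers, hidden size, and vocabulary sizes are illustrative assumptions, not taken from the excerpt.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal recurrent encoder-decoder: the encoder compresses the source sequence
    into a hidden state, and the decoder produces target-vocabulary logits from it."""
    def __init__(self, src_vocab, tgt_vocab, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))           # encode source into a final state
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # condition the decoder on that state
        return self.out(dec_out)                             # per-step logits over the target vocabulary

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 120, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 120])
```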
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible.
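Maximum likelihood can be sidestepped because the generator is trained against a discriminator rather than against an explicit density. The sketch below shows one adversarial update; the network shapes, optimizer settings, and synthetic "real" data are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

G = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
D = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

real = torch.randn(32, 2) + 3.0      # stand-in for real data
z = torch.randn(32, 16)              # latent noise fed to the generator

# Discriminator step: real samples labeled 1, generated samples labeled 0.
fake = G(z).detach()
d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(32, 1))
          + F.binary_cross_entropy_with_logits(D(fake), torch.zeros(32, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: push the discriminator's output on generated samples toward "real".
g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```

Neither loss evaluates a likelihood of the data under the generator; only the discriminator's judgments drive the updates.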
Existing approaches to inference in deep Gaussian process (DGP) models assume approximate posteriors that force independence between the layers, and do not work well in practice.
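The independence assumption being criticized is typically written as a fully factorized variational posterior over the per-layer latent functions; the notation below is a common convention, not quoted from the excerpt.

```latex
q\bigl(f^{1}, \dots, f^{L}\bigr) \;=\; \prod_{l=1}^{L} q\bigl(f^{l}\bigr)
```

Here f^l denotes the latent function values at layer l; the true posterior couples the layers, which is why this factorization can be too restrictive in practice.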
We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters.
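A minimal numpy sketch of this convention follows; the `squash` nonlinearity used to keep the vector length in [0, 1) and the example input are assumptions made for illustration, not stated in the excerpt.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Rescale a capsule's activity vector so its length lies in [0, 1), readable as
    the probability that the entity exists, while its direction (orientation)
    carries the instantiation parameters."""
    norm = np.linalg.norm(s)
    return (norm ** 2 / (1.0 + norm ** 2)) * s / (norm + eps)

activity = np.array([2.0, -1.0, 0.5])                 # illustrative capsule output
v = squash(activity)
print("existence probability ~", np.linalg.norm(v))   # length encodes existence
print("orientation:", v / np.linalg.norm(v))          # direction encodes instantiation parameters
```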