We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss.
At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
SOTA for Multi-Armed Bandits on Mushroom
We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space.
In this work, we study how depthwise separable convolutions can be applied to neural machine translation.
#10 best model for Machine Translation on WMT2014 English-German
By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data.
#5 best model for Machine Translation on WMT2016 German-English