Deep learning frameworks have often focused on either usability or speed, but not both.
On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.
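A score like 34.3 BLEU is a corpus-level n-gram-precision metric. As a quick illustration of how such a number is computed in practice, here is a minimal sketch using the sacrebleu library; the example sentences are invented:

```python
import sacrebleu  # pip install sacrebleu

# Hypothetical system outputs and one reference per sentence
hypotheses = ["the cat sat on the mat", "he read the book"]
references = [["the cat is on the mat", "he read the book"]]

# corpus_bleu takes the hypothesis list and a list of reference streams
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```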
As transfer learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on-device and/or under constrained computational budgets for training or inference remains challenging.
By modeling bidirectional contexts, denoising-autoencoding-based pretraining such as BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
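The contrast drawn here can be made concrete with a toy example: a denoising objective corrupts the input and reconstructs it using context from both sides, while an autoregressive objective predicts each token from its left-to-right prefix. A minimal sketch (the token ids and mask position are invented):

```python
import torch

tokens = torch.tensor([5, 12, 7, 3, 9])  # a toy tokenized sentence
MASK_ID = 0                               # hypothetical [MASK] token id

# Denoising autoencoding (BERT-style): corrupt the input by masking a
# position, then train the model to recover the original token there,
# attending to context on both sides of the mask.
corrupted = tokens.clone()
corrupted[2] = MASK_ID   # model sees [5, 12, MASK, 3, 9]
mlm_target = tokens[2]   # and must predict 7 at position 2

# Autoregressive language modeling: predict each token from its prefix,
# so every position is a training target but context is one-directional.
ar_inputs = tokens[:-1]  # [5, 12, 7, 3]
ar_targets = tokens[1:]  # [12, 7, 3, 9]
```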
We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation.
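At the core of a VQ-VAE is a discrete bottleneck: each continuous encoder output is snapped to its nearest entry in a learned codebook, with a straight-through estimator carrying gradients past the non-differentiable lookup. A minimal sketch of that quantization step, following the standard VQ-VAE recipe rather than the authors' code:

```python
import torch

def quantize(z_e, codebook):
    """Snap each encoder output to its nearest codebook vector.

    z_e:      (batch, dim) continuous encoder outputs
    codebook: (K, dim) learned table of discrete code embeddings
    """
    dists = torch.cdist(z_e, codebook)  # (batch, K) pairwise distances
    indices = dists.argmin(dim=1)       # index of the nearest code
    z_q = codebook[indices]             # quantized latents
    # Straight-through estimator: gradients flow back to z_e as if the
    # quantization step were the identity.
    return z_e + (z_q - z_e).detach(), indices

codebook = torch.randn(512, 64)  # K = 512 codes of dimension 64
z_q, idx = quantize(torch.randn(8, 64), codebook)
```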
We further confirm the flexibility of our model by showing that a Levenshtein Transformer trained for machine translation can be used straightforwardly for automatic post-editing.
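The model takes its name from the classic Levenshtein edit distance, whose insertion and deletion operations are the same primitives an edit-based post-editor manipulates. For reference, the textbook dynamic program (purely illustrative, not the authors' implementation; the full distance also counts substitutions):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of edits (insert/delete/substitute) turning a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a's first i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all of b's first j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

print(levenshtein("kitten", "sitting"))  # 3
```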
We demonstrate that scaling networks with CondConv improves the trade-off between performance and inference cost for several existing convolutional neural network architectures on both classification and detection tasks.
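A conditionally parameterized convolution computes per-example routing weights and mixes several expert kernels into a single kernel before convolving. A minimal PyTorch sketch of that idea, assuming sigmoid routing over globally pooled features; the class name, sizes, and initialization are invented for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv2d(nn.Module):
    """Mix num_experts kernels per example, then convolve once."""

    def __init__(self, in_ch, out_ch, k, num_experts=4):
        super().__init__()
        self.experts = nn.Parameter(
            0.01 * torch.randn(num_experts, out_ch, in_ch, k, k))
        self.router = nn.Linear(in_ch, num_experts)
        self.k = k

    def forward(self, x):
        B, C, H, W = x.shape
        # Routing weights from globally pooled features, one per expert
        r = torch.sigmoid(self.router(x.mean(dim=(2, 3))))  # (B, E)
        # Per-example kernel: a routed mixture of the expert kernels
        w = torch.einsum('be,eoihw->boihw', r, self.experts)
        # Grouped-conv trick: fold the batch into the channel dimension
        # so each example is convolved with its own mixed kernel
        out = F.conv2d(x.reshape(1, B * C, H, W),
                       w.reshape(-1, C, self.k, self.k),
                       groups=B, padding=self.k // 2)
        return out.reshape(B, -1, out.shape[-2], out.shape[-1])

y = CondConv2d(16, 32, 3)(torch.randn(2, 16, 8, 8))  # -> (2, 32, 8, 8)
```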
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.
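For concreteness, a canonical such variant is SGD with momentum; one optimizer step looks like this in PyTorch (the tiny model and batch are placeholders):

```python
import torch

model = torch.nn.Linear(4, 1)                  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x, y = torch.randn(32, 4), torch.randn(32, 1)  # placeholder batch
loss = torch.nn.functional.mse_loss(model(x), y)

opt.zero_grad()
loss.backward()
opt.step()  # v <- momentum*v + grad;  p <- p - lr*v
```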
Scaling up deep neural network capacity is known to be an effective approach to improving model quality across several different machine learning tasks.