We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.
In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap.
In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive {\em uncertainty}.
We give a new analysis of this sketch, providing nearly optimal bounds.
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
Although deep convolutional neural networks achieve state-of-the-art performance across nearly all image classification tasks, their decisions are difficult to interpret.
Using it to provide perturbations for semi-supervised consistency regularization, we achieve a state-of-the-art result on ImageNet with 10% labeled data, with a top-5 error of 8.76% and top-1 error of 26.06%.
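As an illustration of the general recipe mentioned above (not the specific method of that paper), a minimal consistency-regularization term penalizes disagreement between predictions on clean and perturbed unlabeled inputs; here \texttt{perturb} is a hypothetical placeholder for whatever perturbation source is used.

\begin{verbatim}
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, perturb, temperature=1.0):
    """Generic consistency term: predictions on a clean input and its
    perturbed copy should agree (measured by KL divergence)."""
    with torch.no_grad():
        # Target distribution from the clean input; no gradient through it.
        p_clean = F.softmax(model(x_unlabeled) / temperature, dim=-1)
    # Distribution for the perturbed input; gradients flow through this branch.
    log_p_pert = F.log_softmax(model(perturb(x_unlabeled)), dim=-1)
    return F.kl_div(log_p_pert, p_clean, reduction="batchmean")

# Typical use: total loss = supervised cross-entropy on labeled data
#              + lambda * consistency_loss on unlabeled data.
\end{verbatim}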
The effectiveness of machine learning algorithms arises from their ability to extract useful features from large amounts of data.
Back-translation is an effective strategy to improve the performance of Neural Machine Translation~(NMT) by generating pseudo-parallel data.
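To make the back-translation recipe concrete, the sketch below (an assumed, simplified interface; \texttt{tgt2src\_model.translate} is hypothetical) pairs synthetic source sentences, produced by a reverse target-to-source model from target-side monolingual text, with the original target sentences to form pseudo-parallel data.

\begin{verbatim}
def back_translate(tgt_sentences, tgt2src_model):
    """Build pseudo-parallel data for training a source->target NMT model.

    tgt_sentences : target-language monolingual sentences.
    tgt2src_model : a reverse (target->source) translation model, assumed
                    to expose a .translate(sentence) method.
    """
    pseudo_parallel = []
    for tgt in tgt_sentences:
        synthetic_src = tgt2src_model.translate(tgt)  # target -> source
        # Synthetic source side paired with the human-written target side.
        pseudo_parallel.append((synthetic_src, tgt))
    return pseudo_parallel

# The (synthetic_src, tgt) pairs are typically mixed with the real parallel
# corpus when training the forward source->target model.
\end{verbatim}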