Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and creates a barrier to entry for researchers without access to large-scale computation.
LEGW makes the Sqrt Scaling scheme practical, and as a result we achieve much better results than with the Linear Scaling learning-rate scheme.
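The two scaling rules can be contrasted in a minimal sketch. The function names, base learning rate, and batch sizes below are illustrative assumptions, not values from the paper:

```python
def linear_scaling_lr(base_lr, base_batch, batch):
    # Linear Scaling: learning rate grows proportionally with batch size.
    return base_lr * (batch / base_batch)

def sqrt_scaling_lr(base_lr, base_batch, batch):
    # Sqrt Scaling: learning rate grows with the square root of the
    # batch-size ratio, which grows more conservatively at large batches.
    return base_lr * (batch / base_batch) ** 0.5

# Example (assumed numbers): scaling from batch 256 at lr 0.1 up to batch 4096.
print(linear_scaling_lr(0.1, 256, 4096))  # 1.6
print(sqrt_scaling_lr(0.1, 256, 4096))    # 0.4
```

At a 16x batch-size increase, Linear Scaling multiplies the learning rate by 16 while Sqrt Scaling multiplies it by only 4, which is why the latter is easier to stabilize at very large batches.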
Deep learning is extremely computationally intensive, and hardware vendors have responded by building faster accelerators in large clusters.
Current convolutional neural network algorithms for video object tracking spend the same amount of computation on every object and every video frame.
We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.
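The reduction in parameter updates follows directly from the fact that one update is performed per mini-batch, so updates per epoch scale as the dataset size divided by $B$. A small sketch under assumed numbers (dataset size, epochs, and base values are illustrative, not from the source):

```python
def parameter_updates(dataset_size, epochs, batch_size):
    # One parameter update per mini-batch, so total updates over training
    # is epochs times the number of mini-batches per epoch.
    return epochs * (dataset_size // batch_size)

base_eps, base_batch = 0.1, 256  # assumed base learning rate and batch size
for scale in (1, 4, 16):
    eps = base_eps * scale       # increase the learning rate epsilon
    batch = base_batch * scale   # scale the batch size B proportionally to epsilon
    print(f"eps={eps:.1f}  B={batch}  updates={parameter_updates(50_000, 10, batch)}")
```

Scaling $B \propto \epsilon$ by 16x here cuts the update count from 1950 to 120 for the same number of training epochs.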