There is a stark disparity between the learning rate schedules used in the practice of large scale machine learning and what are considered admissible learning rate schedules prescribed in the theory of stochastic approximation. Recent results, such as in the 'super-convergence' methods which use oscillating learning rates, serve to emphasize this point even more... (read more)
PDFMETHOD | TYPE | |
---|---|---|
![]() |
Generalized Linear Models |