# Step Size Matters in Deep Learning

Training a neural network with the gradient descent algorithm gives rise to a discrete-time nonlinear dynamical system. Consequently, behaviors that are typically observed in these systems emerge during training, such as convergence to an orbit but not to a fixed point or dependence of convergence on the initialization... (read more)

PDF Abstract NeurIPS 2018 PDF NeurIPS 2018 Abstract