Previous work has examined why larger-capacity neural networks generalize better than smaller ones, even without explicit regularizers, by analyzing gradient-based algorithms such as GD and SGD. The presence of noise and its effect on robustness to parameter perturbations have been linked to generalization...
METHOD | TYPE |
---|---|
 | Stochastic Optimization |