Minimum weight norm models do not always generalize well for over-parameterized problems

16 Nov 2018Vatsal ShahAnastasios KyrillidisSujay Sanghavi

Stochastic gradient descent is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine tuning in order to achieve its best performance... (read more)

PDF Abstract


No code implementations yet. Submit your code now


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper