LEARNING TO LEARN WITH BETTER CONVERGENCE

25 Sep 2019  ·  Patrick H. Chen, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

We consider the learning to learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models. A natural way to tackle this problem is to replace the human-designed optimizer with an LSTM network and train its parameters on some simple optimization problems (Andrychowicz et al., 2016). Despite their success compared to traditional optimizers such as SGD on a short horizon, these learnt (meta-)optimizers suffer from two key deficiencies: they fail to converge (or can even diverge) on a longer horizon (e.g., 10000 steps), and they often fail to generalize to new tasks. To address the convergence problem, we rethink the architecture design of the meta-optimizer and develop an embarrassingly simple, yet powerful form of meta-optimizer: a coordinate-wise RNN model. We provide insights into the problems with the previous designs of each component and re-design our SimpleOptimizer to resolve those issues. Furthermore, we propose a new mechanism to allow information sharing between coordinates, which enables the meta-optimizer to exploit second-order information with negligible overhead. With these designs, our proposed SimpleOptimizer outperforms previous meta-optimizers and can successfully converge to optimal solutions in the long run. Furthermore, our empirical results show that these benefits can be obtained with much smaller models compared to the previous ones.
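To make the setting concrete, below is a minimal sketch of the general coordinate-wise learned-optimizer setup the abstract refers to (in the style of Andrychowicz et al., 2016): a small LSTM is applied to every parameter coordinate independently, mapping that coordinate's gradient to an additive update, and the LSTM's own weights are trained by unrolling it on simple optimization problems. This is an illustrative assumption-laden sketch, not the paper's actual SimpleOptimizer design; the class names, the quadratic toy objective, and the unroll length are all hypothetical choices made here for clarity.

```python
import torch
import torch.nn as nn


class CoordinatewiseRNNOptimizer(nn.Module):
    """Hypothetical coordinate-wise LSTM meta-optimizer (Andrychowicz et al., 2016 style).

    The same small LSTM cell is shared across all parameter coordinates,
    so the meta-optimizer stays tiny regardless of the optimizee's size.
    """

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        # Input is a single scalar per coordinate: that coordinate's gradient.
        self.lstm = nn.LSTMCell(1, hidden_size)
        self.output = nn.Linear(hidden_size, 1)

    def forward(self, grad, state):
        # grad: (num_coordinates,) -> treat each coordinate as a batch element.
        g = grad.view(-1, 1)
        h, c = self.lstm(g, state)
        update = self.output(h).view_as(grad)
        return update, (h, c)

    def init_state(self, num_coordinates, device):
        h = torch.zeros(num_coordinates, self.lstm.hidden_size, device=device)
        return h, h.clone()


def meta_train_step(meta_opt, outer_opt, horizon: int = 20, dim: int = 10):
    """Unroll the meta-optimizer on a random quadratic f(x) = ||Ax - b||^2
    and train its weights to minimise the summed loss along the trajectory."""
    A = torch.randn(dim, dim)
    b = torch.randn(dim)
    x = torch.randn(dim, requires_grad=True)
    state = meta_opt.init_state(dim, x.device)

    total_loss = 0.0
    for _ in range(horizon):
        loss = ((A @ x - b) ** 2).sum()
        total_loss = total_loss + loss
        # Keep the graph so the meta-loss can backpropagate through the unroll.
        grad, = torch.autograd.grad(loss, x, create_graph=True)
        update, state = meta_opt(grad, state)
        x = x + update  # the learned update replaces a hand-designed rule like SGD

    outer_opt.zero_grad()
    total_loss.backward()
    outer_opt.step()
    return total_loss.item()


if __name__ == "__main__":
    meta_opt = CoordinatewiseRNNOptimizer(hidden_size=8)
    adam = torch.optim.Adam(meta_opt.parameters(), lr=1e-3)
    for step in range(100):
        unrolled_loss = meta_train_step(meta_opt, adam)
        if step % 20 == 0:
            print(f"meta step {step}: unrolled loss {unrolled_loss:.2f}")
```

In this kind of setup the per-coordinate weight sharing is what keeps the meta-optimizer small, which connects to the abstract's claim that the proposed design achieves its benefits with much smaller models; the paper's additional mechanism for sharing information between coordinates (to exploit second-order structure) is not reproduced in this sketch.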
