Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory

27 Nov 2019 · Julien Herrmann, Olivier Beaumont, Lionel Eyraud-Dubois, Alexis Joly, Alena Shilova

This paper introduces a new activation checkpointing method that significantly decreases memory usage when training deep neural networks with the back-propagation algorithm. Like checkpointing techniques from the Automatic Differentiation literature, it dynamically selects which forward activations to save during the training phase, and then automatically recomputes the missing activations from those previously recorded...
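The general idea of activation checkpointing described in the abstract can be illustrated with a small sketch. This is not the paper's method: the abstract says the saved activations are selected dynamically, whereas the sketch below simply keeps every `every`-th activation, a fixed-interval strategy chosen here only for simplicity. The chain of scalar "layers" and the names `LAYERS`, `grad_store_all`, and `grad_checkpointed` are hypothetical and exist only for illustration.

```python
import math

# Hypothetical chain of scalar "layers": (forward function, derivative w.r.t. input).
LAYERS = [
    (lambda x: 2.0 * x, lambda x: 2.0),
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    (lambda x: x * x, lambda x: 2.0 * x),
    (math.sin, math.cos),
    (lambda x: x + 1.0, lambda x: 1.0),
    (math.exp, math.exp),
]


def grad_store_all(x0):
    """Baseline: keep every layer input, then back-propagate."""
    inputs, x = [], x0
    for f, _ in LAYERS:
        inputs.append(x)
        x = f(x)
    grad = 1.0
    for (_, df), a in zip(reversed(LAYERS), reversed(inputs)):
        grad *= df(a)
    return x, grad, len(inputs)


def grad_checkpointed(x0, every=3):
    """Keep only every `every`-th layer input and recompute the rest.

    Assumption: fixed-interval checkpoints, not an optimized selection.
    """
    checkpoints, x = {}, x0
    for i, (f, _) in enumerate(LAYERS):
        if i % every == 0:
            checkpoints[i] = x  # saved activation
        x = f(x)                # other activations are discarded
    out, grad, peak = x, 1.0, len(checkpoints)
    # Process segments from the last checkpoint back to the first.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, len(LAYERS))
        # Recompute the layer inputs of this segment from its checkpoint.
        segment_inputs, a = [], checkpoints[start]
        for f, _ in LAYERS[start:end]:
            segment_inputs.append(a)
            a = f(a)
        # Rough count of values held at once: all checkpoints plus the
        # recomputed inputs (the first one is the checkpoint itself).
        peak = max(peak, len(checkpoints) + len(segment_inputs) - 1)
        for (_, df), a_in in zip(reversed(LAYERS[start:end]),
                                 reversed(segment_inputs)):
            grad *= df(a_in)
    return out, grad, peak


if __name__ == "__main__":
    print(grad_store_all(0.3))
    print(grad_checkpointed(0.3, every=3))
```

Both functions return the same output and gradient; the checkpointed version holds fewer activations at once at the cost of one extra forward pass per segment. Choosing which activations to keep when layers have different compute and memory costs is the problem the paper addresses, and this sketch does not attempt it.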
