no code implementations • 13 Oct 2020 • Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa
State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy.
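For context, a minimal sketch of the mixed 16-/32-bit training loop that such algorithms refer to, written with PyTorch's automatic mixed precision; this is a generic illustration under assumed placeholder model and hyperparameters, not the paper's own method.

```python
# Generic mixed 16-/32-bit training step using PyTorch AMP.
# Illustrative only; model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()         # loss scaling kept in 32-bit

def train_step(x, y):
    optimizer.zero_grad()
    # Forward pass runs selected ops in float16, the rest in float32.
    with torch.cuda.amp.autocast():
        loss = nn.functional.cross_entropy(model(x), y)
    # Gradients are scaled/unscaled in float32, and the master weights
    # plus the optimizer update remain in float32.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```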
1 code implementation • 29 Feb 2020 • Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré
To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.
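As a rough illustration of the downstream quantity being bounded (not the eigenspace instability measure itself, whose definition is given in the paper), one can estimate prediction disagreement by training the same downstream classifier on two embedding versions and counting how often their predictions on held-out data differ; the featurization below is an assumed toy setup.

```python
# Sketch: empirical downstream disagreement between two embedding versions.
# The paper's eigenspace instability measure is NOT computed here; this only
# shows the disagreement quantity it is proven to bound.
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(emb, docs):
    # Represent each document (a list of token ids) as the mean of its
    # tokens' embedding vectors -- a deliberately simple featurization.
    return np.stack([emb[d].mean(axis=0) for d in docs])

def downstream_disagreement(emb_a, emb_b, train_docs, train_y, test_docs):
    clf_a = LogisticRegression(max_iter=1000).fit(featurize(emb_a, train_docs), train_y)
    clf_b = LogisticRegression(max_iter=1000).fit(featurize(emb_b, train_docs), train_y)
    pred_a = clf_a.predict(featurize(emb_a, test_docs))
    pred_b = clf_b.predict(featurize(emb_b, test_docs))
    # Fraction of test points whose predicted label changes when the
    # embeddings change.
    return np.mean(pred_a != pred_b)
```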
no code implementations • 9 Oct 2019 • Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa
Pipeline parallelism (PP) during neural network training enables larger models to be partitioned spatially, leading to both lower network communication and higher overall hardware utilization.
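A toy, single-process sketch of the spatial partitioning that PP refers to: the model is split into stages that would each live on a separate device, and microbatches flow from stage to stage. This is an assumed illustration of the general idea, not the paper's algorithm or schedule.

```python
# Toy illustration of pipeline parallelism: a model split into stages,
# with microbatches passed between them. Single-process simulation only.
import torch
import torch.nn as nn

full_model = nn.Sequential(
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
# Spatial partition: each stage holds a contiguous slice of layers.
stages = [full_model[:2], full_model[2:4], full_model[4:]]

def pipelined_forward(batch, num_microbatches=4):
    outputs = []
    for mb in batch.chunk(num_microbatches):
        act = mb
        for stage in stages:    # on real hardware each stage is its own device,
            act = stage(act)    # and only activations cross the network
        outputs.append(act)
    return torch.cat(outputs)

out = pipelined_forward(torch.randn(32, 256))
```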
no code implementations • 24 Apr 2019 • Nimit S. Sohoni, Christopher R. Aberger, Megan Leszczynski, Jian Zhang, Christopher Ré
In this paper we study a fundamental question: How much memory is actually needed to train a neural network?
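To make the question concrete, here is a back-of-envelope accounting of the main components of training memory (weights, gradients, optimizer state, and cached activations) for a toy fully connected network; the layer sizes and assumptions are illustrative, not figures from the paper.

```python
# Rough memory accounting for training a small MLP in 32-bit precision.
# Illustrative assumptions only; real frameworks add workspace buffers,
# fragmentation, and framework overhead on top of this.
BYTES_PER_FLOAT32 = 4

def mlp_training_memory_mib(layer_sizes, batch_size, momentum=True):
    weights = sum(d_in * d_out + d_out                   # weights + biases
                  for d_in, d_out in zip(layer_sizes, layer_sizes[1:]))
    grads = weights                                      # one gradient per parameter
    opt_state = weights if momentum else 0               # e.g. SGD momentum buffer
    activations = batch_size * sum(layer_sizes[1:])      # cached for backprop
    total = (weights + grads + opt_state + activations) * BYTES_PER_FLOAT32
    return total / 2**20                                 # MiB

print(mlp_training_memory_mib([784, 4096, 4096, 10], batch_size=128))
```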
1 code implementation • 9 Mar 2018 • Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.