A Transistor Operations Model for Deep Learning Energy Consumption Scaling Law

Deep Learning (DL) has transformed the automation of a wide range of industries and is increasingly ubiquitous in society. The high complexity of DL models and their widespread adoption have led to global energy consumption doubling every 3-4 months. Currently, the relationship between DL model configuration and energy consumption is not well established. At the level of a general computational energy model, there is a strong dependency on both the hardware architecture (e.g., generic processors with different configurations of inner components such as CPUs and GPUs, or programmable integrated circuits such as FPGAs) and the interplay of different energy consumption aspects (e.g., data movement, calculation, control). At the DL model level, we need to translate non-linear activation functions and their interaction with data into calculation tasks. Current methods mainly linearize nonlinear DL models to approximate their theoretical FLOPs and MACs as a proxy for energy consumption. Yet this is inaccurate (est. 93% accuracy), due, for example, to the highly nonlinear nature of many convolutional neural networks (CNNs). In this paper, we develop a bottom-level Transistor Operations (TOs) method to expose the role of non-linear activation functions and neural network structure in energy consumption. We translate a range of feedforward and CNN models into ALU calculation tasks and then TO steps. These are then statistically linked to real energy consumption values via a regression model for different hardware configurations and data sets. We show that our proposed TOs method achieves 93.61% - 99.51% precision in predicting energy consumption.
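The sketch below is a minimal illustration of this general idea rather than the paper's implementation: it counts per-layer MAC and activation operations for a feedforward network, weights them by assumed transistor-operation costs, and fits a least-squares regression of the weighted counts against energy measurements. The function name, the TO cost weights, and the numeric values are hypothetical placeholders, not values from the paper.

```python
# Illustrative sketch only: weight MAC and activation counts by assumed
# transistor-operation (TO) costs, then regress totals against measured energy.
# All numeric constants below are placeholders, not the paper's values.
import numpy as np

TO_PER_MAC = 1.0        # assumed cost of one multiply-accumulate
TO_PER_RELU = 0.05      # a comparison is cheap
TO_PER_SIGMOID = 0.60   # exponent/division make the nonlinearity costlier

def feedforward_to_count(layer_sizes, activation="relu"):
    """Weighted TO count for a fully connected network (hypothetical weights)."""
    act_cost = {"relu": TO_PER_RELU, "sigmoid": TO_PER_SIGMOID}[activation]
    total = 0.0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out * TO_PER_MAC   # MACs in the dense layer
        total += n_out * act_cost            # nonlinearity applied per output
    return total

# Placeholder "measurements": TO counts for a few model configurations and
# energy values (joules) as one might log with a power meter; made up for demo.
configs = [[784, 128, 10], [784, 256, 128, 10], [784, 512, 256, 10]]
to_counts = np.array([feedforward_to_count(c, "sigmoid") for c in configs])
measured_energy = np.array([0.9, 2.1, 4.0])  # illustrative numbers only

# Linear regression energy ~ a * TO + b via least squares.
A = np.vstack([to_counts, np.ones_like(to_counts)]).T
(a, b), *_ = np.linalg.lstsq(A, measured_energy, rcond=None)
print(f"energy ~ {a:.3e} * TO + {b:.3f}")
```

In the paper itself, the regression links TO counts to energy measured on real hardware for different configurations and data sets; the sketch only shows the shape of that pipeline.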
