18 Sep 2022 • Mohammadamin Abedi, Yanni Iouannou, Pooyan Jamshidi, Hadi Hemmati
The proposed solution is an automated online layer-caching mechanism that lets a large model exit early at inference time whenever the cache model at one of the early exits is confident enough to produce the final prediction.
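The confidence-gated early-exit idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the backbone layers, cache heads, and the softmax-confidence threshold are all hypothetical stand-ins for whatever architecture and gating rule the method actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical backbone: three "layers" modeled as random linear maps.
layers = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
# Hypothetical cache models: one small linear classifier attached per exit.
cache_heads = [rng.standard_normal((8, 4)) * 0.1 for _ in range(3)]

def infer_with_early_exit(x, threshold=0.5):
    """Run layers in sequence; return early when a cache head is confident.

    Returns (predicted_class, index_of_exit_taken).
    """
    h = x
    for i, (layer, head) in enumerate(zip(layers, cache_heads)):
        h = np.tanh(h @ layer)        # backbone layer i
        probs = softmax(h @ head)     # cache-model prediction at exit i
        if probs.max() >= threshold:  # confident enough -> skip later layers
            return int(probs.argmax()), i
    # Fell through every gate: use the last exit's prediction.
    return int(probs.argmax()), len(layers) - 1

pred, exit_idx = infer_with_early_exit(rng.standard_normal(8), threshold=0.3)
```

Lowering `threshold` trades accuracy for speed: more inputs exit at shallow layers, so fewer backbone layers run per query.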