1 Jan 2021 • Hyunmin Jeong, Deming Chen
This is the first work to consider running a highly compressed DNN in parallel with the original DNN to significantly reduce latency while effectively maintaining the original model's accuracy.
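
The summary does not say how the two networks' outputs are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes a confidence gate: both models start at once, and the compressed model's answer is accepted when its top softmax probability clears a threshold; otherwise the caller waits for the original model. The functions `original_model`, `compressed_model`, and `parallel_infer`, the toy logits, and the `threshold` value are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import time


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def original_model(x):
    # Hypothetical stand-in for the full DNN: slow (simulated by a sleep).
    time.sleep(0.05)
    return softmax([4.0 * x, 0.0, -4.0 * x])


def compressed_model(x):
    # Hypothetical stand-in for the compressed DNN: fast, less sharp outputs.
    return softmax([3.0 * x, 0.0, -3.0 * x])


def parallel_infer(x, pool, threshold=0.9):
    """Run both models concurrently; return (label, which_model_answered).

    The confidence gate is an assumption made for illustration only.
    """
    slow = pool.submit(original_model, x)
    fast = pool.submit(compressed_model, x)

    probs = fast.result()
    if max(probs) >= threshold:
        # Best effort: cancel() only succeeds if the task has not started.
        slow.cancel()
        return probs.index(max(probs)), "compressed"

    # Low confidence: fall back to the original model's prediction.
    probs = slow.result()
    return probs.index(max(probs)), "original"


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        for x in (2.0, 0.1, -2.0):
            print(x, parallel_infer(x, pool))
```

In this sketch the latency gain comes from returning as soon as the compressed model is confident, while low-confidence inputs still receive the original model's prediction. Note that a Python thread that is already running cannot be interrupted, so a real system would need its own mechanism for abandoning the slower computation.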