Dual-module Inference for Efficient Recurrent Neural Networks

ICLR 2020 · Anonymous

Recurrent Neural Networks (RNNs) are promising for delivering high-quality results in sequence modeling tasks, but their memory-bound execution pattern makes it challenging to meet stringent latency requirements. We propose a big-little dual-module inference scheme that dynamically skips unnecessary memory accesses and computation to speed up RNN inference...
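
To make the big-little idea concrete, here is a minimal PyTorch sketch of one possible dual-module cell, not the paper's implementation. The `DualModuleRNNCell` class, its low-rank little module, the `keep_ratio` parameter, and the top-k "largest change" importance criterion are all illustrative assumptions layered on the abstract's description of pairing a cheap approximate module with the full module and skipping work for unimportant dimensions.

```python
import torch
import torch.nn as nn

class DualModuleRNNCell(nn.Module):
    """Hypothetical big-little dual-module RNN cell (a sketch, not the paper's method).

    The big module is a full-size GRU cell; the little module is a cheap
    low-rank approximation of the state update. A per-dimension mask picks
    which hidden dimensions the big module recomputes exactly; the rest keep
    the little module's approximation, which is where memory access and
    computation would be skipped.
    """

    def __init__(self, input_size, hidden_size, rank=16, keep_ratio=0.5):
        super().__init__()
        self.big = nn.GRUCell(input_size, hidden_size)
        # Little module: project [x; h] into a small rank-`rank` space and back.
        self.little_in = nn.Linear(input_size + hidden_size, rank)
        self.little_out = nn.Linear(rank, hidden_size)
        self.k = max(1, int(keep_ratio * hidden_size))  # dimensions kept "big"

    def forward(self, x, h):
        # Cheap approximation of the next hidden state.
        z = torch.tanh(self.little_in(torch.cat([x, h], dim=-1)))
        h_little = torch.tanh(self.little_out(z))
        # Hypothetical importance criterion: recompute exactly the dimensions
        # where the approximation changes the state the most.
        delta = (h_little - h).abs()
        idx = delta.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(h).scatter_(-1, idx, 1.0)
        # A real implementation would gather only the selected rows of the big
        # module's weights; the dense call below is for clarity only.
        h_big = self.big(x, h)
        return mask * h_big + (1.0 - mask) * h_little

# Usage: run the cell over a short random sequence.
cell = DualModuleRNNCell(input_size=64, hidden_size=256)
h = torch.zeros(8, 256)
for x in torch.randn(10, 8, 64):  # 10 timesteps, batch of 8
    h = cell(x, h)
```

Note that this sketch computes the big module densely and merges afterward; the latency benefit the abstract describes would come from actually skipping the weight rows for masked-out dimensions, which is a memory-bound cost in RNN inference.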
