11 Feb 2018 • Skanda Koppula, Khe Chai Sim, Kean Chin
We demonstrate this method's usefulness in revealing information divergence in the bases of recurrent factorized kernels, visualizing the character-level differences between the memory of n-gram and recurrent language models, and extracting knowledge of history encoded in the layers of grapheme-based end-to-end ASR networks.