1 code implementation • 16 Apr 2021 • Narendra Chaudhary, Sanchit Misra, Dhiraj Kalamkar, Alexander Heinecke, Evangelos Georganas, Barukh Ziv, Menachem Adelman, Bharat Kaul
Finally, we demonstrate the performance of our optimized 1D convolution layer by using it in end-to-end neural network training on real genomics datasets, achieving up to a 6.86x speedup over the oneDNN library-based implementation on Cascade Lake CPUs.
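For context, the core operation this paper optimizes is a 1D dilated convolution. Below is a minimal NumPy sketch of what such a layer computes (stride 1, no padding); the function name and the simple per-tap loop are illustrative only and are not the paper's vectorized CPU implementation.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Direct 1D dilated convolution (stride 1, no padding).

    x: input of shape (in_channels, width)
    w: weights of shape (out_channels, in_channels, kernel_width)
    Returns output of shape (out_channels, out_width).
    """
    C_in, W = x.shape
    C_out, _, K = w.shape
    span = (K - 1) * dilation + 1          # receptive field of one output
    out_w = W - span + 1
    y = np.zeros((C_out, out_w))
    for k in range(K):                      # accumulate one kernel tap at a time
        y += np.einsum('oc,cw->ow', w[:, :, k],
                       x[:, k * dilation : k * dilation + out_w])
    return y

x = np.random.randn(4, 100)                # 4 input channels, width 100
w = np.random.randn(8, 4, 3)               # 8 output channels, kernel width 3
y = dilated_conv1d(x, w, dilation=2)       # shape (8, 96)
```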
3 code implementations • 12 Apr 2021 • Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem Adelman, Deepti Aggarwal, Cristina Anderson, Alexander Breuer, Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Brian Retford, Barukh Ziv, Alexander Heinecke
The TPP specification is platform-agnostic, so code expressed via TPPs is portable, whereas the TPP implementation is highly optimized and platform-specific.
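To illustrate the spec/implementation split the abstract describes, here is a toy sketch of the same idea: a platform-agnostic primitive description bound to a backend kernel at runtime. All names here (TransposeSpec, dispatch, BACKENDS) are invented for this sketch and are not the actual TPP or LIBXSMM API.

```python
import numpy as np

class TransposeSpec:
    """Platform-agnostic description of a 2D transpose primitive."""
    def __init__(self, rows, cols, dtype=np.float32):
        self.rows, self.cols, self.dtype = rows, cols, dtype

def transpose_reference(spec, src):
    # Portable fallback; a real backend would JIT an ISA-specific kernel.
    return np.ascontiguousarray(src.T)

BACKENDS = {'generic': transpose_reference}

def dispatch(spec, backend='generic'):
    """Bind a spec to a platform-specific kernel implementation."""
    impl = BACKENDS[backend]
    return lambda src: impl(spec, src)

kernel = dispatch(TransposeSpec(4, 8))
out = kernel(np.arange(32, dtype=np.float32).reshape(4, 8))
```

The point of the split is that user code depends only on the spec, while each platform supplies its own optimized kernel behind the dispatch.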
1 code implementation • NeurIPS 2021 • Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein
We propose a novel technique for faster deep neural network training that systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions.
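A standard form of sample-based approximation for matrix multiplication is column-row sampling: approximate A @ B by sampling k column-row pairs with probability proportional to their norms and rescaling for unbiasedness. The sketch below is a generic illustration of this idea, not the paper's exact algorithm.

```python
import numpy as np

def approx_matmul(A, B, k, rng=None):
    """Approximate A @ B by sampling k column-row pairs.

    Columns of A (and matching rows of B) are sampled with probability
    proportional to the product of their norms, then rescaled so the
    estimate is unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    # Sampling probabilities proportional to ||A[:, i]|| * ||B[i, :]||.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(n, size=k, p=p)
    # Rescale each sampled outer product by 1 / (k * p_i).
    scale = 1.0 / (k * p[idx])
    return (A[:, idx] * scale) @ B[idx, :]

A = np.random.randn(64, 256)
B = np.random.randn(256, 32)
exact = A @ B
approx = approx_matmul(A, B, k=128)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Sampling only k of the n inner-dimension terms reduces the matmul cost roughly by a factor of n/k, at the price of stochastic error in the result.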