Search Results for author: Alexander Heinecke

Found 15 papers, 7 papers with code

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

no code implementations • 25 Apr 2023 • Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke

During the past decade, Deep Learning (DL) algorithms, programming systems, and hardware have converged with their High Performance Computing (HPC) counterparts.
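The core idea here is a separation of concerns: high-level loops traverse tensor blocks, while the innermost compute is delegated to a hand-optimized microkernel. A minimal sketch of that split in plain NumPy, with np.matmul standing in for the hardware-specific microkernel (names and blocking factors are illustrative, not the paper's implementation):

    import numpy as np

    def microkernel(a_blk, b_blk, c_blk):
        # Stand-in for an expert-tuned small GEMM kernel.
        c_blk += a_blk @ b_blk

    def blocked_matmul(A, B, bm=64, bn=64, bk=64):
        # The "high-level" part: only tile traversal lives here.
        M, K = A.shape
        _, N = B.shape
        C = np.zeros((M, N))
        for i in range(0, M, bm):
            for j in range(0, N, bn):
                for k in range(0, K, bk):
                    microkernel(A[i:i+bm, k:k+bk],
                                B[k:k+bk, j:j+bn],
                                C[i:i+bm, j:j+bn])
        return C

    A, B = np.random.rand(256, 256), np.random.rand(256, 256)
    assert np.allclose(blocked_matmul(A, B), A @ B)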

FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems

no code implementations • 22 Apr 2022 • Rui Ma, Evangelos Georganas, Alexander Heinecke, Andrew Boutros, Eriko Nurvitadhi

The overhead of collective communication operations in a distributed AI training system can bottleneck its performance, with more pronounced effects as the number of nodes increases.

Data Compression
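The collective in question is typically an all-reduce over gradients, whose latency grows with the number of participating nodes. A minimal in-process simulation of a ring all-reduce (illustrative; the paper offloads such logic to the FPGA SmartNIC rather than running it on the host):

    import numpy as np

    def ring_allreduce(chunks):
        """chunks: one gradient vector per node, all the same length."""
        n = len(chunks)
        parts = [np.array_split(c.copy(), n) for c in chunks]
        # Reduce-scatter: after n-1 steps node i owns the fully
        # reduced segment (i + 1) % n.
        for step in range(n - 1):
            for node in range(n):
                seg = (node - step) % n
                parts[(node + 1) % n][seg] += parts[node][seg]
        # All-gather: n-1 more steps circulate the reduced segments,
        # so total latency scales with the node count n.
        for step in range(n - 1):
            for node in range(n):
                seg = (node + 1 - step) % n
                parts[(node + 1) % n][seg] = parts[node][seg].copy()
        return [np.concatenate(p) for p in parts]

    grads = [np.random.rand(12) for _ in range(4)]
    assert all(np.allclose(g, sum(grads)) for g in ring_allreduce(grads))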

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning

1 code implementation • 16 Apr 2021 • Narendra Chaudhary, Sanchit Misra, Dhiraj Kalamkar, Alexander Heinecke, Evangelos Georganas, Barukh Ziv, Menachem Adelman, Bharat Kaul

Finally, we demonstrate the performance of our optimized 1D convolution layer by using it in end-to-end neural network training on real genomics datasets, achieving up to 6.86x speedup over the oneDNN library-based implementation on Cascade Lake CPUs.

Image Classification, Speech Recognition +1
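For reference, the computation being optimized, here with stride 1 and no padding, written as a direct loop in NumPy (the paper's kernel is a cache-blocked, vectorized CPU implementation, not this naive form):

    import numpy as np

    def dilated_conv1d(x, w, dilation=1):
        """x: (in_ch, width), w: (out_ch, in_ch, k) -> (out_ch, out_width)."""
        in_ch, width = x.shape
        out_ch, _, k = w.shape
        span = (k - 1) * dilation + 1      # receptive field of one output
        out_w = width - span + 1
        out = np.zeros((out_ch, out_w))
        for o in range(out_ch):
            for t in range(k):
                # Tap t reads the input shifted by t * dilation.
                out[o] += w[o, :, t] @ x[:, t * dilation : t * dilation + out_w]
        return out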

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

no code implementations • 14 Apr 2021 • Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha

Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that must scale to hundreds of compute nodes to be feasible.

Graph Partitioning
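The expensive step in full-batch GNN training is neighborhood aggregation over the whole graph; once vertices are partitioned across compute nodes, every cut edge implies communication. A toy single-process sketch of one mean-aggregation step under a partitioning (illustrative only; DistGNN's actual vertex-cut partitioning and delayed remote aggregation are considerably more involved):

    import numpy as np

    def aggregate(features, edges, part_of, my_part):
        """Mean of neighbor features for vertices owned by my_part."""
        n, d = features.shape
        acc, deg = np.zeros((n, d)), np.zeros(n)
        for src, dst in edges:
            if part_of[dst] != my_part:
                continue                    # destination owned elsewhere
            # If part_of[src] != my_part, features[src] would have to be
            # fetched over the network; here all memory is shared.
            acc[dst] += features[src]
            deg[dst] += 1
        nz = deg > 0
        acc[nz] /= deg[nz][:, None]
        return acc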

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

no code implementations • 6 Feb 2020 • Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

In this paper, we develop a hybrid approach to writing deep learning kernels that achieves the best of both worlds: expert-coded microkernels handle the innermost loops, while polyhedral technology automatically tunes the outer loops for performance.
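A drastically simplified sketch of that division of labor: the innermost tile computation stays fixed (np.matmul standing in for an expert microkernel) while the outer-loop tile sizes are chosen automatically. A brute-force timing search stands in for the paper's polyhedral analysis; everything here is illustrative:

    import itertools, time
    import numpy as np

    def blocked_gemm(A, B, bm, bn, bk):
        M, K = A.shape
        _, N = B.shape
        C = np.zeros((M, N))
        for i in range(0, M, bm):           # outer loops: the tuned part
            for j in range(0, N, bn):
                for k in range(0, K, bk):
                    # fixed "microkernel" for the innermost computation
                    C[i:i+bm, j:j+bn] += A[i:i+bm, k:k+bk] @ B[k:k+bk, j:j+bn]
        return C

    def tune(A, B, candidates=(32, 64, 128)):
        best, best_t = None, float("inf")
        for bm, bn, bk in itertools.product(candidates, repeat=3):
            t0 = time.perf_counter()
            blocked_gemm(A, B, bm, bn, bk)
            elapsed = time.perf_counter() - t0
            if elapsed < best_t:
                best, best_t = (bm, bn, bk), elapsed
        return best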

Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)

no code implementations • 5 Nov 2019 • Amelia Drew, Alexander Heinecke

For the IWSLT English-Vietnamese training, we obtain BLEU test/dev scores of 24.0/21.9 and 24.2/21.9 using core dimensions $(2, 2, 256) \times (2, 2, 512)$ with learning rate 0.0012 and rank distributions $(1, 4, 4, 1)$ and $(1, 4, 16, 1)$ respectively.

Machine Translation, NMT +1
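To make the quoted hyperparameters concrete: with core dimensions (2, 2, 256) x (2, 2, 512) and rank distribution (1, 4, 4, 1), the Tensor Train (TT) matrix consists of three 4-D cores that reconstruct a 1024 x 2048 weight matrix from far fewer parameters. A plain-NumPy sketch (the paper itself uses the T3F library on TensorFlow):

    import numpy as np

    row_modes, col_modes = (2, 2, 256), (2, 2, 512)
    ranks = (1, 4, 4, 1)

    # Core k has shape (r_{k-1}, m_k, n_k, r_k).
    cores = [np.random.randn(ranks[k], row_modes[k], col_modes[k], ranks[k+1])
             for k in range(3)]

    def tt_to_full(cores):
        """Contract the TT cores back into the dense weight matrix."""
        full = cores[0]                          # (1, m1, n1, r1)
        for core in cores[1:]:
            full = np.einsum('aMNr,rmns->aMmNns', full, core)
            a, M, m, N, n, s = full.shape
            full = full.reshape(a, M * m, N * n, s)
        return full[0, :, :, 0]

    W = tt_to_full(cores)
    assert W.shape == (1024, 2048)
    # 16 + 64 + 524288 TT parameters vs. 2097152 dense ones (~4x smaller).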

Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

2 code implementations • 16 Aug 2018 • Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj Kalamkar, Greg Henry, Hans Pabst, Alexander Heinecke

Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition.

Distributed, Parallel, and Cluster Computing
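One key technique in such kernels is blocking the channel dimension so that the innermost, contiguous dimension matches the SIMD width and maps directly onto vector registers (an NCHWc-style layout). A NumPy sketch with block size 16, mirroring one AVX-512 fp32 register (illustrative; the paper's actual kernels are far more elaborate):

    import numpy as np

    VLEN = 16  # fp32 lanes in one AVX-512 register

    def conv2d_blocked(x, w):
        """x: (Cin/VLEN, H, W, VLEN); w: (Cout/VLEN, Cin/VLEN, kh, kw, VLEN, VLEN)."""
        cib, H, Wd, _ = x.shape
        cob, _, kh, kw, _, _ = w.shape
        Ho, Wo = H - kh + 1, Wd - kw + 1
        y = np.zeros((cob, Ho, Wo, VLEN))
        for co in range(cob):
            for ci in range(cib):
                for r in range(kh):
                    for s in range(kw):
                        # Innermost op: contiguous VLEN-wide blocks, i.e.
                        # vectorizable fused multiply-adds on real hardware.
                        patch = x[ci, r:r+Ho, s:s+Wo, :]     # (Ho, Wo, VLEN)
                        y[co] += patch @ w[co, ci, r, s]     # (VLEN, VLEN)
        return y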
