Search Results for author: Ganesh Bikshandi

Found 1 paper, 1 paper with code

A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library

1 code implementation • 19 Dec 2023 • Ganesh Bikshandi, Jay Shah

We provide an optimized implementation of the forward pass of FlashAttention-2, a popular memory-aware scaled dot-product attention algorithm, as a custom fused CUDA kernel targeting the NVIDIA Hopper architecture and written using the open-source CUTLASS library.
