Search Results for author: Ganesh Bikshandi

Found 1 paper, 1 paper with code

A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library

1 code implementation • 19 Dec 2023 • Ganesh Bikshandi, Jay Shah

We provide an optimized implementation of the forward pass of FlashAttention-2, a popular memory-aware scaled dot-product attention algorithm, as a custom fused CUDA kernel targeting the NVIDIA Hopper architecture and written using the open-source CUTLASS library.
