Search Results for author: Less Wright

Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition

Our implementation shows improvement for the type of skinny matrix-matrix multiplications found in foundation model inference workloads.

It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains.

As optimizers are critical to the performance of neural networks, a large number of papers innovating on the subject are published every year.
