no code implementations • 2 Oct 2023 • Praneeth Kacham, Vahab Mirrokni, Peilin Zhong
For context lengths of 32k and GPT-2-style models, our model achieves a 2.5-4x speedup in training compared to FlashAttention, with no observed degradation in quality across our experiments.
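The snippet reports the speedup but not the mechanism behind it. As a hedged illustration only (not the paper's algorithm, which the abstract does not spell out here): attention based on a polynomial kernel can be computed in time linear in the sequence length, because the feature map can be applied explicitly and the $n \times n$ attention matrix never materialized. A minimal numpy sketch for a degree-2 kernel, with an exact quadratic-time reference check:

```python
import numpy as np

def poly2_attention(Q, K, V):
    """Linear-time attention with kernel (q.k)^2 in place of softmax.

    (q.k)^2 = <phi(q), phi(k)> with phi(x) = vec(x x^T), so attention is
    phi(Q) @ (phi(K)^T V), normalized by the kernel-matrix row sums,
    without forming the n x n matrix. Cost: O(n d^2) instead of O(n^2 d).
    """
    n, d = Q.shape
    phiQ = np.einsum('ni,nj->nij', Q, Q).reshape(n, d * d)  # explicit feature map of Q
    phiK = np.einsum('ni,nj->nij', K, K).reshape(n, d * d)  # explicit feature map of K
    KV = phiK.T @ V                                         # (d^2, d_v), computed once
    norm = phiQ @ phiK.sum(axis=0)                          # row sums of the kernel matrix
    return (phiQ @ KV) / norm[:, None]

rng = np.random.default_rng(0)
n, d = 1024, 8
Q, K, V = rng.normal(size=(3, n, d))
out = poly2_attention(Q, K, V)

# quadratic-time reference: explicitly form the kernel matrix and normalize
A = (Q @ K.T) ** 2
ref = (A / A.sum(axis=1, keepdims=True)) @ V
assert np.allclose(out, ref)
```

This expands the feature map exactly; its dimension grows as $d^p$ with the kernel degree $p$, which is the kind of blow-up that sketching the kernel is meant to avoid.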
no code implementations • 1 Dec 2022 • Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal, Samson Zhou
We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix.
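To make the framework concrete: a KDE data structure for a kernel $k$ answers queries $\sum_j k(q, x_j)$, which is exactly a row sum of the kernel matrix when the query is one of the data points. A minimal sketch below uses uniform Monte Carlo sampling as a stand-in for the KDE subroutine; the actual framework relies on genuine KDE data structures with stronger relative-error guarantees, and the function names here are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2)."""
    return np.exp(-np.sum((x - y) ** 2))

def estimate_row_sums(X, num_samples=200, rng=None):
    """Estimate row sums of K[i, j] = k(X[i], X[j]) by Monte Carlo:
    each query touches num_samples points instead of n, so all n row
    sums cost O(n * num_samples) kernel evaluations rather than O(n^2).
    (A crude stand-in for a true KDE data structure.)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X)
    idx = rng.integers(0, n, size=num_samples)  # shared sample of data points
    est = np.empty(n)
    for i in range(n):
        vals = [gaussian_kernel(X[i], X[j]) for j in idx]
        est[i] = n * np.mean(vals)              # unbiased estimate of row sum i
    return est

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
approx = estimate_row_sums(X, rng=rng)
exact = np.array([sum(gaussian_kernel(x, y) for y in X) for x in X])
print(np.max(np.abs(approx - exact) / exact))  # worst-case relative error
```

Uniform sampling degrades when a row sum is dominated by a few far-away points; that failure mode is precisely what dedicated KDE data structures are designed to handle.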
no code implementations • 13 Apr 2022 • Praneeth Kacham, David P. Woodruff
For example, to produce a $(1+\varepsilon)$-approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m = O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al., with the same number of OSNAP rows, requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|A\|_2$ is the spectral norm of the matrix $A$.
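For readers unfamiliar with the objects involved: an OSNAP embedding is a sparse sketching matrix with exactly $s$ nonzero entries per column, each a random sign scaled by $1/\sqrt{s}$. The sketch below shows the embedding itself and a one-shot sketch-and-solve ridge regression; it is illustrative only, not the paper's (iterative) algorithm, and the parameter choices are arbitrary.

```python
import numpy as np

def osnap(n_rows, n_cols, s, rng):
    """OSNAP sketch S: each column has exactly s nonzero entries,
    placed in s distinct random rows, each equal to +-1/sqrt(s)."""
    S = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        rows = rng.choice(n_rows, size=s, replace=False)
        S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

def sketched_ridge(A, b, lam, m, s, rng):
    """One-pass sketch-and-solve ridge regression:
    minimize ||S(Ax - b)||^2 + lam * ||x||^2 with an OSNAP sketch S."""
    n, d = A.shape
    S = osnap(m, n, s, rng)
    SA, Sb = S @ A, S @ b
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

rng = np.random.default_rng(0)
n, d, lam = 5000, 50, 1.0
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

x_sketch = sketched_ridge(A, b, lam, m=400, s=4, rng=rng)
x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```

The trade-off the abstract describes lives in these two knobs: larger $m$ means a bigger sketched problem, while larger $s$ means `S @ A` costs more per nonzero of $A$.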
no code implementations • 16 Jul 2021 • Nadiia Chepurko, Kenneth L. Clarkson, Praneeth Kacham, David P. Woodruff
This question concerns the logarithmic factors in the sketching dimension of existing oblivious subspace embeddings that achieve a constant-factor approximation.
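For intuition: an oblivious subspace embedding $S$ is drawn without seeing $A$, yet preserves $\|Ax\|_2$ up to a constant factor for every $x$ simultaneously, and the question above is whether the extra $\log$ factors in the number of rows $m$ of known constructions are necessary. A quick numerical illustration with a Gaussian sketch, one classical embedding; the parameters are chosen for illustration, and checking random directions is a sanity check, not the uniform guarantee:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 20
A = rng.normal(size=(n, d))

# Gaussian oblivious subspace embedding: S is drawn independently of A.
m = 10 * d                              # O(d) rows for a constant-factor embedding
S = rng.normal(size=(m, n)) / np.sqrt(m)
SA = S @ A

# ||S A x|| / ||A x|| over random directions x should concentrate near 1.
ratios = []
for _ in range(1000):
    x = rng.normal(size=d)
    ratios.append(np.linalg.norm(SA @ x) / np.linalg.norm(A @ x))
print(min(ratios), max(ratios))
```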