no code implementations • 8 Feb 2024 • Amir Zandieh, Insu Han, Vahab Mirrokni, Amin Karbasi
In this work, our focus is on developing an efficient compression technique for the KV cache.
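As background for why KV-cache compression matters, here is a minimal, hypothetical sketch of autoregressive decoding with a KV cache (single head, single layer, stand-in projections); it is not the paper's method, only an illustration that the cache grows by one key/value row per generated token:

```python
import numpy as np

# Minimal KV-cache sketch: each decoded token appends one key row and one
# value row, so cache memory grows linearly in sequence length -- the cost
# that KV-cache compression techniques aim to reduce.
d = 4
rng = np.random.default_rng(0)
K_cache, V_cache = [], []

def decode_step(x):
    # Hypothetical per-token projections; real models would apply
    # learned weight matrices W_k and W_v here.
    K_cache.append(x)          # stand-in for W_k @ x
    V_cache.append(x)          # stand-in for W_v @ x
    K = np.stack(K_cache)
    V = np.stack(V_cache)
    w = np.exp(K @ x / np.sqrt(d))
    w /= w.sum()               # softmax over all cached keys
    return w @ V

for t in range(16):
    y = decode_step(rng.normal(size=d))

cache_rows = len(K_cache)      # 16 rows after 16 decoded tokens
```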
1 code implementation • 9 Oct 2023 • Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh
Recent work suggests that in the worst-case scenario, quadratic time is necessary unless the entries of the attention matrix are bounded or the matrix has low stable rank.
1 code implementation • 5 Feb 2023 • Amir Zandieh, Insu Han, Majid Daliri, Amin Karbasi
The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., the Transformer) for sequence modeling; however, naive exact computation of this model incurs quadratic time and memory complexity in the sequence length, hindering the training of long-sequence models.
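To make the quadratic bottleneck concrete, here is a minimal NumPy sketch of exact dot-product attention (illustrative only, not the paper's approximation scheme); the full n-by-n score matrix is materialized, giving O(n^2) time and memory:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Exact dot-product attention: materializes the full (n, n)
    attention matrix, hence O(n^2) cost in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])       # (n, n): the quadratic bottleneck
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
out = naive_attention(Q, K, V)    # shape (8, 4)
```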
2 code implementations • 9 Sep 2022 • Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi
Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations.
1 code implementation • 1 Jul 2022 • Insu Han, Mike Gartrell, Elvis Dohmatob, Amin Karbasi
In this work, we develop a scalable MCMC sampling algorithm for $k$-NDPPs with low-rank kernels, thus enabling runtime that is sublinear in $n$.
no code implementations • 7 Feb 2022 • Insu Han, Amir Zandieh, Haim Avron
Our proposed GZK family generalizes zonal kernels (i.e., dot-product kernels on the unit sphere) by introducing radial factors into their Gegenbauer series expansions, and includes a wide range of ubiquitous kernel functions, such as the entirety of dot-product kernels as well as the Gaussian kernel and the recently introduced Neural Tangent Kernel.
2 code implementations • ICLR 2022 • Insu Han, Mike Gartrell, Jennifer Gillenwater, Elvis Dohmatob, Amin Karbasi
However, existing work leaves open the question of scalable NDPP sampling.
1 code implementation • NeurIPS 2021 • Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin
To accelerate learning with NTK, we design a near input-sparsity time approximation algorithm for NTK by sketching the polynomial expansions of arc-cosine kernels: our sketch for the convolutional counterpart of NTK (CNTK) can transform any image in time linear in the number of pixels.
no code implementations • 3 Apr 2021 • Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin
We combine random features of the arc-cosine kernels with a sketching-based algorithm that runs in time linear in both the number of data points and the input dimension.
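As background for the random-feature ingredient, a standard fact this line of work builds on (Cho and Saul's arc-cosine kernels): the order-1 arc-cosine kernel admits random ReLU features, so that an inner product of feature vectors concentrates around the exact kernel value. A minimal Monte Carlo sketch, not the paper's sketching algorithm:

```python
import numpy as np

def arccos1_kernel(x, y):
    """Closed form of the order-1 arc-cosine kernel:
    k(x, y) = ||x|| ||y|| (sin t + (pi - t) cos t) / pi, t = angle(x, y)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    t = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny * (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

def relu_features(X, m, rng):
    """Random features phi(x) = sqrt(2/m) * ReLU(W x), W ~ N(0, I),
    whose inner products approximate the order-1 arc-cosine kernel."""
    W = rng.normal(size=(m, X.shape[1]))
    return np.sqrt(2.0 / m) * np.maximum(0.0, X @ W.T)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 5))
Phi = relu_features(np.stack([x, y]), m=200_000, rng=rng)
approx = Phi[0] @ Phi[1]
exact = arccos1_kernel(x, y)     # approx concentrates around this value
```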
2 code implementations • ICLR 2021 • Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel
Determinantal point processes (DPPs) have attracted significant attention in machine learning for their ability to model subsets drawn from a large item collection.
1 code implementation • ICML 2020 • Insu Han, Haim Avron, Jinwoo Shin
This paper studies how to sketch element-wise functions of low-rank matrices.
1 code implementation • NeurIPS 2018 • Insu Han, Haim Avron, Jinwoo Shin
A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices, e.g., the log-determinant and the nuclear norm.
1 code implementation • ICML 2017 • Insu Han, Prabhanjan Kambadur, KyoungSoo Park, Jinwoo Shin
Determinantal point processes (DPPs) are popular probabilistic models that arise in many machine learning tasks, where distributions of diverse sets are characterized by matrix determinants.
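To make the determinant characterization concrete, here is a tiny L-ensemble DPP with hypothetical item features (the feature values are made up for illustration): the probability of a subset S is det(L_S) / det(L + I), so a pair of near-duplicate items receives lower probability than a diverse pair.

```python
import numpy as np

# Items 0 and 1 are near-duplicates; item 2 is dissimilar to both.
phi = np.array([[1.00, 0.00],
                [0.99, 0.14],   # hypothetical features, nearly parallel to item 0
                [0.00, 1.00]])
L = phi @ phi.T                  # PSD similarity kernel (L-ensemble)

Z = np.linalg.det(L + np.eye(3)) # normalizer: det(L + I)

def prob(S):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    return np.linalg.det(L[np.ix_(S, S)]) / Z

p_similar = prob([0, 1])   # near-duplicate pair: small determinant
p_diverse = prob([0, 2])   # diverse pair: larger determinant
```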
1 code implementation • 3 Jun 2016 • Insu Han, Dmitry Malioutov, Haim Avron, Jinwoo Shin
Computation of the trace of a matrix function plays an important role in many scientific computing applications, including machine learning, computational physics (e.g., lattice quantum chromodynamics), network analysis, and computational biology (e.g., protein folding).
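A classical building block for such trace computations is Hutchinson's stochastic estimator, which touches the matrix only through matrix-vector products; this sketch shows the estimator on tr(A) itself as illustrative background, not the paper's full method (which combines such probes with polynomial approximations of f):

```python
import numpy as np

def hutchinson_trace(matvec, n, m, rng):
    """Estimate tr(A) as (1/m) * sum_i v_i^T A v_i with Rademacher
    probe vectors v_i, accessing A only via matvec(v)."""
    est = 0.0
    for _ in range(m):
        v = rng.choice([-1.0, 1.0], size=n)
        est += v @ matvec(v)
    return est / m

rng = np.random.default_rng(0)
n = 50
B = rng.normal(size=(n, n))
A = B @ B.T                                     # SPD test matrix
est = hutchinson_trace(lambda v: A @ v, n, m=2000, rng=rng)
exact = np.trace(A)                             # est is close to this
```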
1 code implementation • 22 Mar 2015 • Insu Han, Dmitry Malioutov, Jinwoo Shin
Logarithms of determinants of large positive definite matrices appear ubiquitously in machine learning applications including Gaussian graphical and Gaussian process models, partition functions of discrete graphical models, minimum-volume ellipsoids, metric learning and kernel learning.
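The identity underlying many scalable log-determinant estimators is that, for a positive definite matrix A, log det(A) = tr(log A) = the sum of log-eigenvalues; this is what lets trace-estimation machinery apply. A small NumPy check of the identity (illustrative only):

```python
import numpy as np

# For positive definite A: log det(A) = tr(log A) = sum_i log(lambda_i).
rng = np.random.default_rng(1)
B = rng.normal(size=(30, 30))
A = B @ B.T + 30 * np.eye(30)    # well-conditioned positive definite matrix

sign, logdet = np.linalg.slogdet(A)   # numerically stable log-determinant
eigs = np.linalg.eigvalsh(A)
via_trace = np.sum(np.log(eigs))      # tr(log A) via the eigenvalues
```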