Learning the Positions in CountSketch

20 Jul 2020  ·  Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David P. Woodruff

We consider sketching algorithms which first quickly compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem, e.g., low rank approximation. In the learning-based sketching paradigm proposed by Indyk et al. [2019], the sketch matrix is found by choosing a random sparse matrix, e.g., the CountSketch, and then updating the values of the non-zero entries by running gradient descent on a training data set. Despite the growing body of work on this paradigm, a noticeable omission is that the locations of the non-zero entries of previous algorithms were fixed, and only their values were learned. In this work we propose the first learning algorithm that also optimizes the locations of the non-zero entries. We show this algorithm gives better accuracy for low rank approximation than previous work, and apply it to other problems such as $k$-means clustering for the first time. We show that our algorithm is provably better in the spiked covariance model and for Zipfian matrices. We also show the importance of the sketch monotonicity property for combining learned sketches. Our empirical results show the importance of optimizing not only the values of the non-zero entries but also their positions.
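To make the distinction between the positions and the values of the non-zero entries concrete, here is a minimal sketch of a plain random CountSketch. It assumes the standard construction: each column of the m x n sketch matrix S has exactly one non-zero entry, placed in a uniformly random row with a random sign. The function names `make_countsketch` and `apply_sketch` are hypothetical, and the code illustrates only the random starting point, not the learning algorithm proposed in the paper.

```python
import random


def make_countsketch(n, m, seed=0):
    """Describe an m x n CountSketch matrix S by two length-n lists.

    Column j of S has a single non-zero entry values[j] in row positions[j].
    (Assumed standard construction: uniform random row, random sign.)
    """
    rng = random.Random(seed)
    positions = [rng.randrange(m) for _ in range(n)]
    values = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return positions, values


def apply_sketch(positions, values, A, m):
    """Compute S @ A for an n x d matrix A given as a list of rows.

    Row j of A is scaled by values[j] and added into row positions[j]
    of the result, so S is never materialized.
    """
    d = len(A[0])
    SA = [[0.0] * d for _ in range(m)]
    for j, row in enumerate(A):
        r, v = positions[j], values[j]
        for c in range(d):
            SA[r][c] += v * row[c]
    return SA


# Small illustrative input: an 8 x 3 matrix compressed to 4 rows.
A = [[float(i + c) for c in range(3)] for i in range(8)]
positions, values = make_countsketch(n=8, m=4)
SA = apply_sketch(positions, values, A, m=4)
```

In this representation, the earlier learned sketches described above would keep `positions` fixed and update only `values`, while the abstract's contribution is to optimize `positions` as well.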
