no code implementations • 3 Mar 2023 • David P. Woodruff, Fred Zhang, Samson Zhou
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time).
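As a point of reference, the classical baseline for this problem is the weighted majority / multiplicative weights algorithm. The following is a minimal sketch of that textbook algorithm (the learning rate `eta` is illustrative), not the paper's memory-bounded method, but it shows the predictor whose space complexity is being studied.

```python
import numpy as np

def multiplicative_weights(expert_preds, outcomes, eta=0.1):
    """Classic multiplicative weights for prediction with expert advice.

    expert_preds: (T, n) array of 0/1 predictions, one column per expert.
    outcomes:     (T,) array of 0/1 true outcomes.
    Returns the algorithm's total number of mistakes.
    """
    T, n = expert_preds.shape
    w = np.ones(n)                        # one weight per expert
    mistakes = 0
    for t in range(T):
        # Predict with the weighted majority of the experts.
        guess = 1 if w @ expert_preds[t] >= w.sum() / 2 else 0
        mistakes += int(guess != outcomes[t])
        # Penalize every expert that erred on day t.
        wrong = expert_preds[t] != outcomes[t]
        w[wrong] *= (1 - eta)
    return mistakes
```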
no code implementations • 11 Feb 2023 • Itai Dinur, Uri Stemmer, David P. Woodruff, Samson Zhou
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
no code implementations • 30 Sep 2022 • David P. Woodruff, Fred Zhang, Qiuyi Zhang
Specifically, for any $m$ matrices $A_1, \ldots, A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with failure probability $\delta$, using an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ over Dharangutte and Musco (NeurIPS 2021).
no code implementations • 17 Jul 2022 • David P. Woodruff, Taisuke Yasuda
Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/\epsilon^2)$ for $p>2$.
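For intuition, here is a minimal sketch of Lewis weight computation and one-shot row sampling, assuming $2 < p < 4$ so that the Cohen-Peng fixed-point iteration applies, and a full-rank input; the sample size `m` and iteration count are illustrative, not the paper's tuned parameters.

```python
import numpy as np

def lewis_weights(A, p, iters=30):
    """Approximate l_p Lewis weights via the Cohen-Peng fixed-point
    iteration w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2},
    which is known to converge for p < 4. Assumes A has full column rank."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        W = w ** (1 - 2.0 / p)
        G = A.T @ (W[:, None] * A)                   # A^T W^{1-2/p} A
        Ginv = np.linalg.inv(G)
        tau = np.einsum('ij,jk,ik->i', A, Ginv, A)   # a_i^T Ginv a_i
        w = tau ** (p / 2.0)
    return w

def one_shot_sample(A, p, m, rng=np.random.default_rng(0)):
    """Sample m rows proportionally to their Lewis weights, with the
    standard (1/(m q_i))^{1/p} rescaling so that ||SAx||_p ~ ||Ax||_p."""
    w = lewis_weights(A, p)
    q = w / w.sum()
    idx = rng.choice(len(q), size=m, p=q)
    return A[idx] / (m * q[idx])[:, None] ** (1.0 / p)
```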
no code implementations • 16 Jul 2022 • Sepideh Mahabadi, David P. Woodruff, Samson Zhou
In this paper, we introduce an algorithm that approximately samples $T$ gradients of dimension $d$ from nearly the optimal importance sampling distribution for a robust regression problem over $n$ rows.
no code implementations • 15 Jul 2022 • Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang
Our key technique is a method for obtaining subspace embeddings with a number of rows polynomial in $q$ for a matrix which is the flattening of a tensor train of $q$ tensors.
no code implementations • 26 Jun 2022 • Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff
A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors.
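For concreteness, a two-line sketch of this initialization; the $1/\sqrt{d_{\mathrm{in}}}$ variance scaling and layer sizes are illustrative conventions, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 784, 128
# Each row is an independent Gaussian weight vector; the 1/sqrt(d_in)
# scaling keeps pre-activation variance roughly constant (an assumption
# here -- the paper's exact scaling may differ).
W = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_out, d_in))
```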
no code implementations • 21 Apr 2022 • Vaidehi Srinivas, David P. Woodruff, Ziyu Xu, Samson Zhou
We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds.
no code implementations • 13 Apr 2022 • Praneeth Kacham, David P. Woodruff
For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al. with the same number of OSNAP rows requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|A\|_2$ is the spectral norm of the matrix $A$.
no code implementations • ICLR 2022 • Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature.
no code implementations • 10 Feb 2022 • Ainesh Bakshi, Kenneth L. Clarkson, David P. Woodruff
For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the naïve $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses $\mathrm{poly}(\log(dk/\epsilon))$ factors.
no code implementations • 9 Feb 2022 • David P. Woodruff, Amir Zandieh
We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices using a nearly optimal number of samples, improving upon all previously known methods by $\mathrm{poly}(q)$ factors.
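The baseline object here is the classical TensorSketch of Pham and Pagh, which the paper's sampling-based method improves upon; the following is a minimal sketch of that baseline (the sketch dimension `m` is illustrative), not the paper's algorithm.

```python
import numpy as np

def tensor_sketch(mats, m, rng=np.random.default_rng(0)):
    """TensorSketch of the q-fold column-wise tensor product.

    mats: list of q arrays, each of shape (n, d_i); row i of the output
    sketches the Kronecker product of the i-th rows of all q matrices in
    m dimensions, without forming the (d_1 * ... * d_q)-dim product.
    """
    n = mats[0].shape[0]
    prod = np.ones((n, m), dtype=complex)
    for A in mats:
        d = A.shape[1]
        h = rng.integers(0, m, size=d)       # CountSketch bucket per column
        s = rng.choice([-1.0, 1.0], size=d)  # random sign per column
        cs = np.zeros((n, m))
        np.add.at(cs.T, h, (A * s).T)        # CountSketch each row of A
        prod *= np.fft.fft(cs, axis=1)       # combine sketches by convolution
    return np.real(np.fft.ifft(prod, axis=1))
```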
no code implementations • 9 Nov 2021 • Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda
By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1, p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21].
no code implementations • ICLR 2022 • Jon C. Ergun, Zhili Feng, Sandeep Silwal, David P. Woodruff, Samson Zhou
$k$-means clustering is a well-studied problem due to its wide applicability.
no code implementations • 21 Aug 2021 • Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang
Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic.
no code implementations • 16 Jul 2021 • Yifei Jiang, Yi Li, Yiming Sun, Jiaxin Wang, David P. Woodruff
A natural way to do this would be to simply apply $f$ to each entry of $A$, and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries.
no code implementations • 16 Jul 2021 • Nadiia Chepurko, Kenneth L. Clarkson, Praneeth Kacham, David P. Woodruff
This question is regarding the logarithmic factors in the sketching dimension of existing oblivious subspace embeddings that achieve constant-factor approximation.
no code implementations • 3 Jul 2021 • Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu
Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models.
no code implementations • 16 Jun 2021 • Zhili Feng, Fred Roosta, David P. Woodruff
In this paper, we present novel dimensionality reduction methods for non-PSD matrices, as well as their "square-roots", which involve matrices with complex entries.
no code implementations • 17 May 2021 • Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David P. Woodruff, Samson Zhou
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can be viewed as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$).
no code implementations • 24 Feb 2021 • Yi Li, Honghao Lin, David P. Woodruff
We show how to design learned sketches for the Hessian in the context of second order methods.
1 code implementation • NeurIPS 2020 • Quang Minh Hoang, Trong Nghia Hoang, Hai Pham, David P. Woodruff
We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.
no code implementations • 9 Nov 2020 • Nadiia Chepurko, Kenneth L. Clarkson, Lior Horesh, Honghao Lin, David P. Woodruff
We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues.
1 code implementation • 19 Oct 2020 • Raphael A. Meyer, Cameron Musco, Christopher Musco, David P. Woodruff
This improves on the ubiquitous Hutchinson's estimator, which requires $O(1/\epsilon^2)$ matrix-vector products.
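For contrast, a minimal implementation of the plain Hutchinson estimator being improved upon (`num_queries` and the example matrix are illustrative); the paper's Hutch++ additionally deflates the top of the spectrum before probing, which is not shown here.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_queries, rng=np.random.default_rng(0)):
    """Hutchinson's stochastic trace estimator: tr(A) ~ mean of g^T A g
    over random sign vectors g, using only matrix-vector products."""
    est = 0.0
    for _ in range(num_queries):
        g = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        est += g @ matvec(g)
    return est / num_queries

# Example: estimate the trace of an explicit PSD matrix.
rng = np.random.default_rng(1)
B = rng.normal(size=(200, 200)); A = B @ B.T
print(hutchinson_trace(lambda v: A @ v, 200, 500), np.trace(A))
```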
no code implementations • 20 Jul 2020 • Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David P. Woodruff
Despite the growing body of work on this paradigm, a noticeable omission is that in previous algorithms the locations of the non-zero entries of the sketching matrix were fixed, and only their values were learned.
no code implementations • 20 Jul 2020 • Arvind V. Mahankali, David P. Woodruff
We give the first polynomial time column subset selection-based $\ell_1$ low rank approximation algorithm sampling $\tilde{O}(k)$ columns and achieving an $\tilde{O}(k^{1/2})$-approximation for any $k$, improving upon the previous best $\tilde{O}(k)$-approximation and matching a prior lower bound for column subset selection-based $\ell_1$-low rank approximation which holds for any $\text{poly}(k)$ number of columns.
no code implementations • NeurIPS 2020 • Edith Cohen, Rasmus Pagh, David P. Woodruff
We design novel composable sketches for WOR (without-replacement) $\ell_p$ sampling, weighted sampling of keys according to a power $p\in[0, 2]$ of their frequency (or, for signed data, of their sum of updates).
no code implementations • 7 Jul 2020 • Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff
We show that, for both problems, for dimensions $d=1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$.
no code implementations • 24 Jun 2020 • Cyrus Rashtchian, David P. Woodruff, Hanlin Zhu
We consider the general problem of learning about a matrix through vector-matrix-vector queries.
no code implementations • 23 Jun 2020 • Agniva Chowdhury, Petros Drineas, David P. Woodruff, Samson Zhou
To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA).
no code implementations • ICLR 2020 • Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, David P. Woodruff
For estimating the $p$-th frequency moment for $0 < p < 2$ we obtain the first algorithms with optimal update time.
no code implementations • 23 Apr 2020 • Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou
Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation.
no code implementations • 16 Apr 2020 • Zhao Song, David P. Woodruff, Peilin Zhong
When the input matrix has i.i.d. entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • 6 Mar 2020 • Cyrus Rashtchian, Aneesh Sharma, David P. Woodruff
Theoretically, we show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.
no code implementations • 12 Dec 2019 • Xiaofei Shi, David P. Woodruff
For example, in the overconstrained $(1+\epsilon)$-approximate polynomial interpolation problem, $A$ is a Vandermonde matrix and $T(A) = O(n \log n)$; in this case our running time is $n \cdot \mathrm{poly}(\log n) + \mathrm{poly}(d/\epsilon)$ and we recover the results of Avron et al. (2013) as a special case.
no code implementations • 9 Dec 2019 • Ainesh Bakshi, Nadiia Chepurko, David P. Woodruff
Our main result is to resolve this question by obtaining an optimal algorithm that queries $O(nk/\epsilon)$ entries of $A$ and outputs a relative-error low-rank approximation in $O(n(k/\epsilon)^{\omega-1})$ time.
no code implementations • NeurIPS 2019 • Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
For input $\mathcal{A}$ as above, we give $O(\sum_{i=1}^q \text{nnz}(A_i))$ time algorithms, which is much faster than computing $\mathcal{A}$.
1 code implementation • NeurIPS 2019 • Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang
In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$.
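Independent of the paper's fast sketching algorithm, the classical (slow) solution to total least squares goes through one SVD of the stacked matrix $[A\ B]$; a minimal sketch of that textbook construction, assuming the relevant block of $V$ is invertible:

```python
import numpy as np

def total_least_squares(A, B):
    """Classical total least squares via the SVD of the stacked matrix
    [A B]: read X off the bottom singular directions via the standard
    formula X = -V12 @ inv(V22) (assumes the V22 block is invertible)."""
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=True)
    V = Vt.T
    V12 = V[:n, n:]     # top-right block of V
    V22 = V[n:, n:]     # bottom-right block of V
    return -V12 @ np.linalg.inv(V22)
```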
no code implementations • 13 Jun 2019 • Santosh S. Vempala, Ruosong Wang, David P. Woodruff
We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is $\tilde{\Theta}(d^2L + sd)$ and $\tilde{\Theta}(sd^2L)$, respectively.
no code implementations • 15 May 2019 • Manuel Fernandez, David P. Woodruff, Taisuke Yasuda
We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points.
no code implementations • 14 May 2019 • Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff
We give the first dimensionality reduction methods for the overconstrained Tukey regression problem.
no code implementations • 22 Apr 2019 • Cameron Musco, Christopher Musco, David P. Woodruff
In particular, for rank $k' > k$ depending on the public coin partition number of $W$, the heuristic outputs a rank-$k'$ matrix $L$ with $\mathrm{cost}(L) \leq \mathrm{OPT} + \epsilon \|A\|_F^2$.
no code implementations • 5 Nov 2018 • Ainesh Bakshi, Rajesh Jayaram, David P. Woodruff
Given $n$ samples as a matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$ and the (possibly noisy) labels $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X}) + \mathbf{E}$ of the network on these samples, where $\mathbf{E}$ is a noise matrix, our goal is to recover the weight matrices $\mathbf{U}^*$ and $\mathbf{V}^*$.
1 code implementation • NeurIPS 2019 • Zhao Song, David P. Woodruff, Peilin Zhong
Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms; e.g., the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor more columns than for $\ell_p$-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.
no code implementations • 18 Oct 2018 • Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang
This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix.
no code implementations • 16 Jul 2018 • Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, David P. Woodruff
On the algorithmic side, for $p \in (0, 2)$, we give the first $(1+\epsilon)$-approximation algorithm running in time $n^{\text{poly}(k/\epsilon)}$.
no code implementations • NeurIPS 2018 • Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff
For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.
no code implementations • 27 Dec 2017 • Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
That is, TensorSketch only provides input sparsity time for Kronecker product regression with respect to the $2$-norm.
no code implementations • NeurIPS 2017 • Cameron Musco, David P. Woodruff
Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly.
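A common instantiation of such a rank-$k$ surrogate is the Nyström approximation; the sketch below uses uniform landmark sampling purely for illustration (the paper analyzes when and how such approximations preserve downstream kernel-method guarantees, and more careful sampling schemes exist).

```python
import numpy as np

def nystrom(K, k, rng=np.random.default_rng(0)):
    """Plain Nystrom approximation: sample k landmark columns of the
    kernel matrix K and return the factors of a rank-<=k surrogate
    K_tilde = C @ Winv @ C.T, which is cheap to store and apply."""
    n = K.shape[0]
    idx = rng.choice(n, size=k, replace=False)
    C = K[:, idx]                 # n x k block of sampled columns
    W = K[np.ix_(idx, idx)]       # k x k intersection block
    return C, np.linalg.pinv(W)   # K_tilde = C @ Winv @ C.T
```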
no code implementations • 30 Oct 2017 • Karl Bringmann, Pavel Kolev, David P. Woodruff
For small $\psi$, our approximation factor is $1+o(1)$.
no code implementations • NeurIPS 2017 • Jarvis Haupt, Xingguo Li, David P. Woodruff
We study the least squares regression problem \begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D, R}} \|A\Theta-b\|_2, \end{align*} where $\mathcal{S}_{\odot D, R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors.
no code implementations • ICML 2017 • Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff
We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem.
no code implementations • 30 May 2017 • Eric Price, Zhao Song, David P. Woodruff
Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a\in \mathbb{R}^d$, we have with $1 - d^{-c}$ probability that \[ \langle a, x'-x^*\rangle \lesssim \frac{\|a\|_2\|x'-x^*\|_2}{d^{\frac{1}{2}-\gamma}}, \quad (1) \] where $c, \gamma > 0$ are arbitrary constants.
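The setup being analyzed is sketch-and-solve least squares; here is a minimal sketch, with a dense Gaussian map standing in for the subsampled randomized Fourier/Hadamard transform studied in the paper, and an illustrative sketch size `m`.

```python
import numpy as np

def sketched_lstsq(A, b, m, rng=np.random.default_rng(0)):
    """Sketch-and-solve least squares: solve min ||S(Ax - b)||_2 for a
    small random S instead of the full n-row problem. A dense Gaussian
    sketch is used here purely for illustration."""
    n, d = A.shape
    S = rng.normal(size=(m, n)) / np.sqrt(m)
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch
```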
no code implementations • 27 Apr 2017 • Maria-Florina Balcan, YIngyu Liang, David P. Woodruff, Hongyang Zhang
This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and their duals have the same optimum.
no code implementations • 26 Apr 2017 • Zhao Song, David P. Woodruff, Peilin Zhong
Despite the success in obtaining relative error low rank approximations for matrices, no such results were known for tensors.
no code implementations • 13 Apr 2017 • Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, David P. Woodruff
We thus effectively compute a histogram of the spectrum, which can stand in for the true singular values in many applications.
no code implementations • 11 Apr 2017 • Cameron Musco, David P. Woodruff
We show how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, i.e., for any $n \times n$ PSD matrix $A$, in $\tilde O(n \cdot \mathrm{poly}(k/\epsilon))$ time we output a rank-$k$ matrix $B$, in factored form, for which $\|A-B\|_F^2 \leq (1+\epsilon)\|A-A_k\|_F^2$, where $A_k$ is the best rank-$k$ approximation to $A$.
no code implementations • NeurIPS 2016 • Jiecao Chen, He Sun, David P. Woodruff, Qin Zhang
We would like the quality of the clustering in the distributed setting to match that in the centralized setting for which all the data resides on a single site.
1 code implementation • 10 Nov 2016 • Haim Avron, Kenneth L. Clarkson, David P. Woodruff
The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations.
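The building block here is the random Fourier feature map of Rahimi and Recht; a minimal sketch for the Gaussian kernel follows (the paper uses such features to build a preconditioner, which is not shown; `D` and `gamma` are illustrative).

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng=np.random.default_rng(0)):
    """Random Fourier features for the Gaussian kernel
    k(x, y) = exp(-gamma ||x - y||^2), so that z(x)^T z(y) ~ k(x, y).
    Frequencies are drawn from N(0, 2*gamma I), the kernel's spectral
    density; phases are uniform on [0, 2*pi)."""
    n, d = X.shape
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + phase)
```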
no code implementations • 10 Nov 2016 • Haim Avron, Kenneth L. Clarkson, David P. Woodruff
We study regularization both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization; for the latter, as applied to each of these problems, we show algorithmic resource bounds in which the statistical dimension appears in places where in previous bounds the rank would appear.
no code implementations • 3 Nov 2016 • Zhao Song, David P. Woodruff, Peilin Zhong
We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.
no code implementations • 28 Jan 2016 • David P. Woodruff, Peilin Zhong
For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing a low rank approximation to $A = f(\sum_{t=1}^s A^t)$, where $f$ is a function which is applied entrywise to the matrix $\sum_{t=1}^s A^t$.
no code implementations • 8 Jul 2015 • Michael B. Cohen, Jelani Nelson, David P. Woodruff
We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows.
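A small numerical illustration of the spectral-norm approximate matrix multiplication guarantee, with a dense Gaussian map standing in for a general oblivious subspace embedding and illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, m = 5000, 40, 30, 400
A, B = rng.normal(size=(n, d1)), rng.normal(size=(n, d2))

# One dimensionality-reducing map S applied to both factors:
# (SA)^T (SB) approximates A^T B in spectral norm.
S = rng.normal(size=(m, n)) / np.sqrt(m)
err = np.linalg.norm((S @ A).T @ (S @ B) - A.T @ B, 2)
print(err / (np.linalg.norm(A, 2) * np.linalg.norm(B, 2)))
```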
no code implementations • 24 Jun 2015 • Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff
We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions.
no code implementations • 8 Jan 2015 • Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff
It performs $O(d \times \ell)$ operations per row and maintains a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ such that for any $k < \ell$, $\|A^TA - B^TB \|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$ and $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + \frac{k}{\ell-k}\big) \|A-A_k\|_F^2$.
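A direct transcription of the simple variant of Frequent Directions, recomputing an SVD per row for clarity (the paper and prior work give faster buffered variants with the same guarantee):

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent Directions: stream the rows of A into an ell x d sketch B
    with ||A^T A - B^T B||_2 <= ||A - A_k||_F^2 / (ell - k) for k < ell.
    Assumes d >= ell so the sketch has ell singular values."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        B[-1] = row                      # last slot is kept empty for inserts
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Shrink all squared singular values by the smallest one;
        # this zeroes out the last direction, freeing a slot.
        s_shrunk = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
        B = s_shrunk[:, None] * Vt
    return B
```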
no code implementations • 17 Nov 2014 • David P. Woodruff
This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties.
no code implementations • 30 May 2014 • Christos Boutsidis, David P. Woodruff
The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A$, together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A$, as well as a $c \times r$ low-rank matrix $U$ such that the matrix $CUR$ approximates the matrix $A$, that is, $\|A - CUR\|_F^2 \le (1+\epsilon)\|A - A_k\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm and $A_k$ is the best rank-$k$ approximation of $A$ computed via the SVD.
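A bare-bones CUR factorization, with uniform column/row sampling purely for illustration; the paper's algorithms use careful (e.g., leverage-score-based) selection to achieve the $(1+\epsilon)$ guarantee.

```python
import numpy as np

def simple_cur(A, c, r, rng=np.random.default_rng(0)):
    """Toy CUR factorization: pick c columns and r rows of A uniformly
    at random, then set U = pinv(C) @ A @ pinv(R), the best middle
    factor for the chosen C and R in Frobenius norm."""
    m, n = A.shape
    cols = rng.choice(n, size=c, replace=False)
    rows = rng.choice(m, size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R
```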
1 code implementation • 26 Jul 2012 • Kenneth L. Clarkson, David P. Woodruff
We design a new distribution over $\mathrm{poly}(r\varepsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$.
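This distribution is the sparse embedding (CountSketch) matrix, with a single random $\pm 1$ per column; a minimal sketch of applying it in $O(\mathrm{nnz}(A))$-style time by bucketing rows (the sketch size `m` is illustrative):

```python
import numpy as np

def countsketch_embed(A, m, rng=np.random.default_rng(0)):
    """Apply a sparse embedding S (one random +/-1 per column of S) to A.
    Each input row is hashed to one output row with a random sign, so
    the cost is proportional to the number of nonzeros of A."""
    n, d = A.shape
    h = rng.integers(0, m, size=n)        # target row for each input row
    s = rng.choice([-1.0, 1.0], size=n)   # random sign for each input row
    SA = np.zeros((m, d))
    np.add.at(SA, h, s[:, None] * A)      # accumulate signed rows
    return SA
```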
no code implementations • 19 Jul 2012 • Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, David P. Woodruff
We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$.
no code implementations • 23 Jul 2010 • Daniel M. Kane, Jelani Nelson, Ely Porat, David P. Woodruff
We give a space-optimal algorithm with update time $O(\log^2(1/\varepsilon)\log\log(1/\varepsilon))$ for $(1+\varepsilon)$-approximating the $p$-th frequency moment, $0 < p < 2$, of a length-$n$ vector updated in a data stream.
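The vanilla idea behind such algorithms is Indyk's $p$-stable sketch; a minimal sketch specialized to $p = 1$ (Cauchy entries, median estimator), ignoring the space and update-time optimizations that are the paper's contribution:

```python
import numpy as np

def l1_sketch_estimate(x, k, rng=np.random.default_rng(0)):
    """p-stable sketch for p = 1: sketch x with a k x n Cauchy matrix
    and return median(|Sx|) as an estimate of ||x||_1. Each entry of Sx
    is distributed as ||x||_1 times a standard Cauchy variable, whose
    absolute value has median 1, so the median of |Sx| concentrates
    around ||x||_1 as k grows."""
    n = len(x)
    S = rng.standard_cauchy(size=(k, n))
    return np.median(np.abs(S @ x))
```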