Search Results for author: David P. Woodruff

Found 77 papers, 7 papers with code

Fast Moment Estimation in Data Streams in Optimal Space

no code implementations23 Jul 2010 Daniel M. Kane, Jelani Nelson, Ely Porat, David P. Woodruff

We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream.

Data Structures and Algorithms
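
As context for the entry above, here is a minimal NumPy sketch of the classical $p$-stable approach for $p = 1$ (Indyk's Cauchy sketch), which this line of work refines; the dense matrix $C$ below is for illustration only and is not space-optimal, and the function name `cauchy_sketch_l1` is ours.

```python
import numpy as np

def cauchy_sketch_l1(stream, n, m, seed=0):
    # Maintain y = Cx for a Cauchy (1-stable) matrix C while reading
    # turnstile updates (i, delta). Each y_j is distributed as ||x||_1
    # times a standard Cauchy variable, whose absolute value has median 1,
    # so median(|y|) estimates the first frequency moment ||x||_1
    # (larger m gives a (1+eps)-approximation).
    rng = np.random.default_rng(seed)
    C = rng.standard_cauchy(size=(m, n))
    y = np.zeros(m)
    for i, delta in stream:
        y += delta * C[:, i]     # O(m) work per stream update
    return np.median(np.abs(y))

# Example: x = (4, -2, 0, ...), so F_1 = ||x||_1 = 6.
print(cauchy_sketch_l1([(0, 3.0), (1, -2.0), (0, 1.0)], n=10, m=801))
```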

The Fast Cauchy Transform and Faster Robust Linear Regression

no code implementations19 Jul 2012 Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, David P. Woodruff

We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$.

regression

Low Rank Approximation and Regression in Input Sparsity Time

1 code implementation26 Jul 2012 Kenneth L. Clarkson, David P. Woodruff

We design a new distribution over $\mathrm{poly}(r\epsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \epsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$.

Data Structures and Algorithms
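
A minimal NumPy sketch of the sparse embedding behind this result: $S$ has exactly one nonzero per column, so $SA$ costs $O(\mathrm{nnz}(A))$ time (the function name `sparse_embed` is ours).

```python
import numpy as np

def sparse_embed(A, m, seed=0):
    # CountSketch-style S (m x n): coordinate i of R^n is hashed to row
    # h[i] of the sketch with sign s[i], so S has one nonzero per column
    # and SA is computable in O(nnz(A)) time.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    h = rng.integers(0, m, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, d))
    np.add.at(SA, h, s[:, None] * A)  # row i of A lands in row h[i], signed
    return SA
```

With $m = \mathrm{poly}(r/\epsilon)$ rows as in the theorem, solving the sketched problem $\min_x \|SAx - Sb\|_2$ yields a $(1+\epsilon)$-approximate regression solution.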

Optimal CUR Matrix Decompositions

no code implementations30 May 2014 Christos Boutsidis, David P. Woodruff

The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A$, together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A$, as well as a $c \times r$ low-rank matrix $U$ such that the matrix $CUR$ approximates the matrix $A$, that is, $\|A - CUR\|_F^2 \le (1+\epsilon) \|A - A_k\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm and $A_k$ is the best $m \times n$ matrix of rank $k$ constructed via the SVD.

Sketching as a Tool for Numerical Linear Algebra

no code implementations17 Nov 2014 David P. Woodruff

This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties.

Data Structures and Algorithms
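
A minimal sketch-and-solve example in the spirit of the survey, using a dense Gaussian sketch for clarity (names are ours; structured or sparse sketches replace $S$ in practice):

```python
import numpy as np

def sketched_lstsq(A, b, m, seed=0):
    # Compress the n x d problem to m x d with a random Gaussian map S,
    # then solve the small problem; for m ~ d/eps^2 the result is a
    # (1+eps)-approximate least-squares minimizer with good probability.
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(m, A.shape[0])) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```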

Frequent Directions: Simple and Deterministic Matrix Sketching

no code implementations8 Jan 2015 Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff

It performs $O(d \times \ell)$ operations per row and maintains a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ such that for any $k < \ell$, $\|A^TA - B^TB \|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$ and $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + \frac{k}{\ell-k}\big) \|A-A_k\|_F^2$.

Data Structures and Algorithms, 68W40 (Primary)
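
A minimal NumPy implementation of the Frequent Directions sketch described above (the doubled buffer is the standard fast variant; constants in the error bound differ slightly from the statement, and we assume $d \ge \ell$):

```python
import numpy as np

def frequent_directions(A, ell):
    # One pass over the rows of A, maintaining a buffer of 2*ell rows.
    # When the buffer fills, an SVD shrinks every squared singular value
    # by the ell-th largest, zeroing out at least half the rows.
    n, d = A.shape
    B = np.zeros((2 * ell, d))

    def shrink(B):
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
        out = np.zeros((2 * ell, d))
        out[: len(s2)] = np.sqrt(s2)[:, None] * Vt
        return out

    nxt = 0
    for row in A:
        if nxt == 2 * ell:
            B = shrink(B)
            nxt = ell
        B[nxt] = row
        nxt += 1
    return shrink(B)[:ell]   # final shrink so the sketch has ell rows
```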

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

no code implementations24 Jun 2015 Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions.

Optimal approximate matrix product in terms of stable rank

no code implementations8 Jul 2015 Michael B. Cohen, Jelani Nelson, David P. Woodruff

We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows.

Clustering, Dimensionality Reduction +1

Distributed Low Rank Approximation of Implicit Functions of a Matrix

no code implementations28 Jan 2016 David P. Woodruff, Peilin Zhong

For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing a low rank approximation to $A = f(\sum_{t=1}^s A^t)$, where $f$ is a function which is applied entrywise to the matrix $\sum_{t=1}^s A^t$.

Low Rank Approximation with Entrywise $\ell_1$-Norm Error

no code implementations3 Nov 2016 Zhao Song, David P. Woodruff, Peilin Zhong

We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.

Faster Kernel Ridge Regression Using Sketching and Preconditioning

1 code implementation10 Nov 2016 Haim Avron, Kenneth L. Clarkson, David P. Woodruff

The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations.

regression
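
For concreteness, a minimal NumPy sketch of random Fourier features for the Gaussian kernel $\exp(-\gamma\|x-y\|^2)$ (Rahimi-Recht); note the paper uses such features to build a preconditioner for solving KRR exactly, whereas this toy version (names ours) regresses in the feature space directly:

```python
import numpy as np

def random_fourier_features(X, D, gamma, seed=0):
    # z(x) = sqrt(2/D) * cos(x @ W + b) with W ~ N(0, 2*gamma) satisfies
    # E[<z(x), z(y)>] = exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def rff_ridge(X, y, D, gamma, lam, seed=0):
    # Ridge regression on the random features approximates KRR.
    Z = random_fourier_features(X, D, gamma, seed)
    return Z, np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
```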

Sharper Bounds for Regularized Data Fitting

no code implementations10 Nov 2016 Haim Avron, Kenneth L. Clarkson, David P. Woodruff

We study regularization both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization; for the latter, as applied to each of these problems, we show algorithmic resource bounds in which the {\em statistical dimension} appears in places where in previous bounds the rank would appear.

Communication-Optimal Distributed Clustering

no code implementations NeurIPS 2016 Jiecao Chen, He Sun, David P. Woodruff, Qin Zhang

We would like the quality of the clustering in the distributed setting to match that in the centralized setting for which all the data resides on a single site.

Clustering

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices

no code implementations11 Apr 2017 Cameron Musco, David P. Woodruff

We show how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, i.e., for any $n \times n$ PSD matrix $A$, in $\tilde O(n \cdot \mathrm{poly}(k/\epsilon))$ time we output a rank-$k$ matrix $B$, in factored form, for which $\|A-B\|_F^2 \leq (1+\epsilon)\|A-A_k\|_F^2$, where $A_k$ is the best rank-$k$ approximation to $A$.

Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness

no code implementations13 Apr 2017 Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, David P. Woodruff

We thus effectively compute a histogram of the spectrum, which can stand in for the true singular values in many applications.

Relative Error Tensor Low Rank Approximation

no code implementations26 Apr 2017 Zhao Song, David P. Woodruff, Peilin Zhong

Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors.

Matrix Completion and Related Problems via Strong Duality

no code implementations27 Apr 2017 Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, Hongyang Zhang

This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and their duals have the same optimum.

Matrix Completion

Fast Regression with an $\ell_\infty$ Guarantee

no code implementations30 May 2017 Eric Price, Zhao Song, David P. Woodruff

Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a\in \mathbb{R}^d$, we have with probability $1 - d^{-c}$ that $\langle a, x'-x^*\rangle \lesssim \frac{\|a\|_2\|x'-x^*\|_2}{d^{1/2-\gamma}}$, where $c, \gamma > 0$ are arbitrary constants.

regression

Algorithms for $\ell_p$ Low-Rank Approximation

no code implementations ICML 2017 Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff

We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem.
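
For reference, the $p = 2$ baseline mentioned above; by the Eckart-Young(-Mirsky) theorem the truncated SVD is optimal for Frobenius error, while no such closed form exists for general entrywise $\ell_p$:

```python
import numpy as np

def best_rank_k(A, k):
    # Truncated SVD: the optimal rank-k approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]
```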

Near Optimal Sketching of Low-Rank Tensor Regression

no code implementations NeurIPS 2017 Jarvis Haupt, Xingguo Li, David P. Woodruff

We study the least squares regression problem $\min_{\Theta \in \mathcal{S}_{\odot D, R}} \|A\Theta-b\|_2$, where $\mathcal{S}_{\odot D, R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors.

Dimensionality Reduction, regression

Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

no code implementations NeurIPS 2017 Cameron Musco, David P. Woodruff

Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly.

Sketching for Kronecker Product Regression and P-splines

no code implementations27 Dec 2017 Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff

Prior to this work, TensorSketch was only known to provide input sparsity time for Kronecker product regression with respect to the $2$-norm.

regression
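
A minimal NumPy version of the TensorSketch primitive referenced above, for a single Kronecker product $x \otimes y$: CountSketch each factor, then combine via FFT-based circular convolution (function names are ours):

```python
import numpy as np

def tensor_sketch(x, y, m, seed=0):
    # Sketches x ⊗ y to dimension m in O(nnz(x) + nnz(y) + m log m) time,
    # never forming the len(x)*len(y)-dimensional product explicitly.
    rng = np.random.default_rng(seed)

    def countsketch(v):
        h = rng.integers(0, m, size=v.shape[0])
        s = rng.choice([-1.0, 1.0], size=v.shape[0])
        out = np.zeros(m)
        np.add.at(out, h, s * v)
        return out

    cx, cy = countsketch(x), countsketch(y)
    # circular convolution of the two CountSketches via the FFT
    return np.real(np.fft.ifft(np.fft.fft(cx) * np.fft.fft(cy)))
```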

On Coresets for Logistic Regression

no code implementations NeurIPS 2018 Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff

For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.

regression

A PTAS for $\ell_p$-Low Rank Approximation

no code implementations16 Jul 2018 Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, David P. Woodruff

On the algorithmic side, for $p \in (0, 2)$, we give the first $(1+\epsilon)$-approximation algorithm running in time $n^{\text{poly}(k/\epsilon)}$.

Testing Matrix Rank, Optimally

no code implementations18 Oct 2018 Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang

This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix.

Towards a Zero-One Law for Column Subset Selection

1 code implementation NeurIPS 2019 Zhao Song, David P. Woodruff, Peilin Zhong

Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties from $\ell_p$-norms: e.g., the lack of scale-invariance causes any column subset selection algorithm to provably require a factor $\sqrt{\log n}$ more columns than for $\ell_p$-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.

Learning Two Layer Rectified Neural Networks in Polynomial Time

no code implementations5 Nov 2018 Ainesh Bakshi, Rajesh Jayaram, David P. Woodruff

Given $n$ samples as a matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$ and the (possibly noisy) labels $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X}) + \mathbf{E}$ of the network on these samples, where $\mathbf{E}$ is a noise matrix, our goal is to recover the weight matrices $\mathbf{U}^*$ and $\mathbf{V}^*$.

Vocal Bursts Valence Prediction

Simple Heuristics Yield Provable Algorithms for Masked Low-Rank Approximation

no code implementations22 Apr 2019 Cameron Musco, Christopher Musco, David P. Woodruff

In particular, for rank $k' > k$ depending on the public coin partition number of $W$, the heuristic outputs a rank-$k'$ matrix $L$ with $\mathrm{cost}(L) \leq OPT + \epsilon \|A\|_F^2$.

Low-Rank Matrix Completion, Tensor Decomposition

Dimensionality Reduction for Tukey Regression

no code implementations14 May 2019 Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff

We give the first dimensionality reduction methods for the overconstrained Tukey regression problem.

Dimensionality Reduction, regression

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering

no code implementations15 May 2019 Manuel Fernandez, David P. Woodruff, Taisuke Yasuda

We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points.

Clustering, Open-Ended Question Answering +1

The Communication Complexity of Optimization

no code implementations13 Jun 2019 Santosh S. Vempala, Ruosong Wang, David P. Woodruff

We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is $\tilde{\Theta}(d^2L + sd)$ and $\tilde{\Theta}(sd^2L)$, respectively.

Distributed Optimization

Total Least Squares Regression in Input Sparsity Time

1 code implementation NeurIPS 2019 Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang

In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$.

regression
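
For context, the classical dense solution to total least squares via the SVD of the stacked matrix $[A\ B]$ (Golub-Van Loan); the paper's contribution is solving this in input sparsity time rather than the $O(m(n+d)^2)$ cost of the sketch below (names ours):

```python
import numpy as np

def total_least_squares(A, B):
    # Partition the right singular vectors of [A B] and read off
    # X = -V12 @ inv(V22), which solves \hat{A} X = \hat{B} for the
    # nearest (in Frobenius norm) consistent corrected system.
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=True)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]
    return -V12 @ np.linalg.inv(V22)
```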

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

no code implementations NeurIPS 2019 Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff

For input $\mathcal{A} = A_1 \otimes \cdots \otimes A_q$, we give $O(\sum_{i=1}^q \mathrm{nnz}(A_i))$ time algorithms, which is much faster than forming $\mathcal{A}$ explicitly.

regression

Robust and Sample Optimal Algorithms for PSD Low-Rank Approximation

no code implementations9 Dec 2019 Ainesh Bakshi, Nadiia Chepurko, David P. Woodruff

Our main result is to resolve this question by obtaining an optimal algorithm that queries $O(nk/\epsilon)$ entries of $A$ and outputs a relative-error low-rank approximation in $O(n(k/\epsilon)^{\omega-1})$ time.

Sublinear Time Numerical Linear Algebra for Structured Matrices

no code implementations12 Dec 2019 Xiaofei Shi, David P. Woodruff

For example, in the overconstrained $(1+\epsilon)$-approximate polynomial interpolation problem, $A$ is a Vandermonde matrix and $T(A) = O(n \log n)$; in this case our running time is $n \cdot \mathrm{poly}(\log n) + \mathrm{poly}(d/\epsilon)$ and we recover the results of [avron2013sketching] as a special case.

regression

LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew

no code implementations6 Mar 2020 Cyrus Rashtchian, Aneesh Sharma, David P. Woodruff

Theoretically, we show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.

Recommendation Systems

Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

no code implementations16 Apr 2020 Zhao Song, David P. Woodruff, Peilin Zhong

We show that for input matrices with entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.

Non-Adaptive Adaptive Sampling on Turnstile Streams

no code implementations23 Apr 2020 Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation.

Clustering Data Summarization

Learning-Augmented Data Stream Algorithms

no code implementations ICLR 2020 Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, David P. Woodruff

For estimating the $p$-th frequency moment for $0 < p < 2$ we obtain the first algorithms with optimal update time.

Approximation Algorithms for Sparse Principal Component Analysis

no code implementations23 Jun 2020 Agniva Chowdhury, Petros Drineas, David P. Woodruff, Samson Zhou

To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA).

Dimensionality Reduction

Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and Graph Problems

no code implementations24 Jun 2020 Cyrus Rashtchian, David P. Woodruff, Hanlin Zhu

We consider the general problem of learning about a matrix through vector-matrix-vector queries.

Streaming Complexity of SVMs

no code implementations7 Jul 2020 Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff

We show that, for both problems, for dimensions $d=1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$.

WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement

no code implementations NeurIPS 2020 Edith Cohen, Rasmus Pagh, David P. Woodruff

We design novel composable sketches for WOR $\ell_p$ sampling, weighted sampling of keys according to a power $p\in[0, 2]$ of their frequency (or for signed data, sum of updates).
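
One standard offline realization of this WOR objective is the "exponential race": assign each key the value $\mathrm{Exp}(1)/w^p$ and keep the $k$ smallest, which draws keys without replacement, each successive key chosen proportionally to $w^p$ among those remaining. A minimal sketch follows; this is the sampling objective the paper's composable sketches emulate over streams, not the sketches themselves, and the function name is ours.

```python
import numpy as np

def wor_lp_sample(freq, p, k, seed=0):
    # key_i = Exp(1) / |f_i|^p; the k smallest keys form a WOR sample.
    # Keys with zero frequency get key = inf and are never selected.
    rng = np.random.default_rng(seed)
    w = np.abs(np.asarray(freq, dtype=float)) ** p
    with np.errstate(divide="ignore"):
        keys = rng.exponential(size=len(w)) / w
    return np.argsort(keys)[:k]
```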

Optimal $\ell_1$ Column Subset Selection and a Fast PTAS for Low Rank Approximation

no code implementations20 Jul 2020 Arvind V. Mahankali, David P. Woodruff

We give the first polynomial time column subset selection-based $\ell_1$ low rank approximation algorithm sampling $\tilde{O}(k)$ columns and achieving an $\tilde{O}(k^{1/2})$-approximation for any $k$. This improves upon the previous best $\tilde{O}(k)$-approximation and matches a prior lower bound for column subset selection-based $\ell_1$-low rank approximation which holds for any $\mathrm{poly}(k)$ number of columns.

Learning the Positions in CountSketch

no code implementations20 Jul 2020 Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David P. Woodruff

Despite the growing body of work on this paradigm, a noticeable omission is that in previous algorithms the locations of the non-zero entries of the sketch were fixed, and only their values were learned.

Clustering

Hutch++: Optimal Stochastic Trace Estimation

1 code implementation19 Oct 2020 Raphael A. Meyer, Cameron Musco, Christopher Musco, David P. Woodruff

This improves on the ubiquitous Hutchinson's estimator, which requires $O(1/\epsilon^2)$ matrix-vector products.
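
A minimal NumPy sketch of the Hutch++ idea, given only matrix-vector access (`matvec`): roughly a third of the budget builds a low-rank deflation whose trace is computed exactly, and Hutchinson's estimator handles the residual.

```python
import numpy as np

def hutchpp(matvec, n, m, seed=0):
    # Uses about m matrix-vector products to estimate tr(A).
    rng = np.random.default_rng(seed)
    k = max(m // 3, 1)
    S = rng.choice([-1.0, 1.0], size=(n, k))
    # Orthonormal basis Q for the range of A @ S (a rough top subspace).
    Q, _ = np.linalg.qr(np.column_stack([matvec(S[:, j]) for j in range(k)]))
    trace = sum(Q[:, j] @ matvec(Q[:, j]) for j in range(Q.shape[1]))
    # Hutchinson on the deflated part (I - QQ^T) A (I - QQ^T); the cross
    # terms vanish, so the two pieces sum to tr(A) in expectation.
    for _ in range(k):
        g = rng.choice([-1.0, 1.0], size=n)
        g -= Q @ (Q.T @ g)
        trace += (g @ matvec(g)) / k
    return trace
```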

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

no code implementations9 Nov 2020 Nadiia Chepurko, Kenneth L. Clarkson, Lior Horesh, Honghao Lin, David P. Woodruff

We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues.

Recommendation Systems

Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes

1 code implementation NeurIPS 2020 Quang Minh Hoang, Trong Nghia Hoang, Hai Pham, David P. Woodruff

We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.

Gaussian Processes

Learning-Augmented Sketches for Hessians

no code implementations24 Feb 2021 Yi Li, Honghao Lin, David P. Woodruff

We show how to design learned sketches for the Hessian in the context of second order methods.

Dimensionality Reduction, Second-order methods

Learning a Latent Simplex in Input-Sparsity Time

no code implementations17 May 2021 Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David P. Woodruff, Samson Zhou

We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can be viewed as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$).

Topic Models

Non-PSD Matrix Sketching with Applications to Regression and Optimization

no code implementations16 Jun 2021 Zhili Feng, Fred Roosta, David P. Woodruff

In this paper, we present novel dimensionality reduction methods for non-PSD matrices, as well as their "square-roots", which involve matrices with complex entries.

Dimensionality Reduction, regression

Average-Case Communication Complexity of Statistical Problems

no code implementations3 Jul 2021 Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu

Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models.

Single Pass Entrywise-Transformed Low Rank Approximation

no code implementations16 Jul 2021 Yifei Jiang, Yi Li, Yiming Sun, Jiaxin Wang, David P. Woodruff

A natural way to do this would be to simply apply $f$ to each entry of $A$, and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries.

Open-Ended Question Answering

Near-Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time

no code implementations16 Jul 2021 Nadiia Chepurko, Kenneth L. Clarkson, Praneeth Kacham, David P. Woodruff

We address a question regarding the logarithmic factors in the sketching dimension of existing oblivious subspace embeddings that achieve constant-factor approximation.

Open-Ended Question Answering, regression

Fast Sketching of Polynomial Kernels of Polynomial Degree

no code implementations21 Aug 2021 Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang

Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic.

BIG-bench Machine Learning

Active Linear Regression for $\ell_p$ Norms and Beyond

no code implementations9 Nov 2021 Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda

By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1, p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21].

Dimensionality Reduction, Open-Ended Question Answering +1

Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

no code implementations9 Feb 2022 David P. Woodruff, Amir Zandieh

We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices using a nearly optimal number of samples, improving upon all previously known methods by poly$(q)$ factors.
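
As a reference point, the basic leverage score sampling primitive for a dense matrix, computed via a QR factorization in $O(nd^2)$ time; the paper's point is achieving this kind of spectral guarantee for $q$-fold tensor product matrices in input sparsity time. Function names are ours.

```python
import numpy as np

def leverage_scores(A):
    # tau_i is the squared norm of row i of an orthonormal basis of col(A).
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)

def leverage_score_sample(A, m, seed=0):
    # Sample m rows with probability proportional to leverage and rescale,
    # so (SA)^T (SA) is an unbiased, spectrally accurate proxy for A^T A.
    rng = np.random.default_rng(seed)
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, p=p)
    return A[idx] / np.sqrt(m * p[idx])[:, None]
```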

Low-Rank Approximation with $1/ε^{1/3}$ Matrix-Vector Products

no code implementations10 Feb 2022 Ainesh Bakshi, Kenneth L. Clarkson, David P. Woodruff

For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the naïve $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses $\mathrm{poly}(\log(dk/\epsilon))$ factors.

Triangle and Four Cycle Counting with Predictions in Graph Streams

no code implementations ICLR 2022 Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature.

Sketching Algorithms and Lower Bounds for Ridge Regression

no code implementations13 Apr 2022 Praneeth Kacham, David P. Woodruff

For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al. with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|A\|_2$ is the spectral norm of the matrix $A$.

regression

Memory Bounds for the Experts Problem

no code implementations21 Apr 2022 Vaidehi Srinivas, David P. Woodruff, Ziyu Xu, Samson Zhou

We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds.
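
For orientation, the classical full-memory baseline against which the paper's memory bounds are measured: weighted majority keeps one weight per expert, i.e., $O(n)$ memory (a minimal sketch, assuming binary predictions; names ours).

```python
import numpy as np

def weighted_majority(preds, outcomes, eta=0.5):
    # preds: T x n array of expert predictions in {0, 1}; outcomes: length T.
    T, n = preds.shape
    w = np.ones(n)          # one weight per expert: O(n) memory
    mistakes = 0
    for t in range(T):
        guess = int(w @ preds[t] >= w.sum() / 2.0)   # weighted vote
        mistakes += int(guess != outcomes[t])
        w *= np.where(preds[t] == outcomes[t], 1.0, 1.0 - eta)  # penalize errors
    return mistakes
```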

Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis

no code implementations26 Jun 2022 Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff

A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors.

Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor Decompositions

no code implementations15 Jul 2022 Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang

Our key technique is a method for obtaining subspace embeddings with a number of rows polynomial in $q$ for a matrix which is the flattening of a tensor train of $q$ tensors.

Dimensionality Reduction, Tensor Decomposition +1

Adaptive Sketches for Robust Regression with Importance Sampling

no code implementations16 Jul 2022 Sepideh Mahabadi, David P. Woodruff, Samson Zhou

In this paper, we introduce an algorithm that approximately samples $T$ gradients of dimension $d$ from nearly the optimal importance sampling distribution for a robust regression problem over $n$ rows.

regression

Online Lewis Weight Sampling

no code implementations17 Jul 2022 David P. Woodruff, Taisuke Yasuda

Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/\epsilon^2)$ for $p>2$.

Open-Ended Question Answering, regression

Optimal Query Complexities for Dynamic Trace Estimation

no code implementations30 Sep 2022 David P. Woodruff, Fred Zhang, Qiuyi Zhang

Specifically, for any $m$ matrices $A_1,\ldots, A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces to within $\epsilon$ error with failure probability $\delta$, using an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ over Dharangutte and Musco (NeurIPS, 2021).

On Differential Privacy and Adaptive Data Analysis with Bounded Space

no code implementations11 Feb 2023 Itai Dinur, Uri Stemmer, David P. Woodruff, Samson Zhou

We study the space complexity of the two related fields of differential privacy and adaptive data analysis.

Streaming Algorithms for Learning with Experts: Deterministic Versus Robust

no code implementations3 Mar 2023 David P. Woodruff, Fred Zhang, Samson Zhou

In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time).

Optimal Sketching Bounds for Sparse Linear Regression

no code implementations5 Apr 2023 Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff

For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression.

regression

Sharper Bounds for $\ell_p$ Sensitivity Sampling

no code implementations1 Jun 2023 David P. Woodruff, Taisuke Yasuda

In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p > 2$ that improve over the general $\mathfrak S d$ bound, achieving a bound of roughly $\mathfrak S^{2-2/p}$ for $2<p<\infty$.

Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization

no code implementations2 Jun 2023 Ameya Velingker, Maximilian Vötsch, David P. Woodruff, Samson Zhou

We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0, 1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0, 1\}^{n\times k}$ and $\mathbf{V}\in\{0, 1\}^{k\times d}$.

Learning the Positions in CountSketch

no code implementations11 Jun 2023 Yi Li, Honghao Lin, Simin Liu, Ali Vakilian, David P. Woodruff

We fix this issue and propose approaches for learning a sketching matrix for both low-rank approximation and Hessian approximation for second order optimization.

$\ell_p$-Regression in the Arbitrary Partition Model of Communication

no code implementations11 Jul 2023 Yi Li, Honghao Lin, David P. Woodruff

We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0, 2]$.

regression

Task-Based MoE for Multitask Multilingual Machine Translation

no code implementations30 Aug 2023 Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

The mixture-of-experts (MoE) architecture has proven to be a powerful method for diverse tasks in training deep models across many applications.

Machine Translation, Translation

HyperAttention: Long-context Attention in Near-Linear Time

1 code implementation9 Oct 2023 Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

Recent work suggests that in the worst-case scenario, quadratic time is necessary unless the entries of the attention matrix are bounded or the matrix has low stable rank.
