Search Results for author: David P. Woodruff

Found 77 papers, 7 papers with code

Fast Moment Estimation in Data Streams in Optimal Space

no code implementations23 Jul 2010 Daniel M. Kane, Jelani Nelson, Ely Porat, David P. Woodruff

We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream.

Data Structures and Algorithms
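
As context for the entry above, here is a minimal NumPy sketch of the classical $p$-stable approach for $p = 1$ (Indyk's Cauchy sketch), which this line of work refines; the dense matrix $C$ below is for illustration only and is not space-optimal, and the function name `cauchy_sketch_l1` is ours.

```python
import numpy as np

def cauchy_sketch_l1(stream, n, m, seed=0):
    # Maintain y = Cx for a Cauchy (1-stable) matrix C while reading
    # turnstile updates (i, delta). Each y_j is distributed as ||x||_1
    # times a standard Cauchy variable, whose absolute value has median 1,
    # so median(|y|) estimates the first frequency moment ||x||_1
    # (larger m gives a (1+eps)-approximation).
    rng = np.random.default_rng(seed)
    C = rng.standard_cauchy(size=(m, n))
    y = np.zeros(m)
    for i, delta in stream:
        y += delta * C[:, i]     # O(m) work per stream update
    return np.median(np.abs(y))

# Example: x = (4, -2, 0, ...), so F_1 = ||x||_1 = 6.
print(cauchy_sketch_l1([(0, 3.0), (1, -2.0), (0, 1.0)], n=10, m=801))
```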

The Fast Cauchy Transform and Faster Robust Linear Regression

no code implementations19 Jul 2012 Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, David P. Woodruff

We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$.

regression

Low Rank Approximation and Regression in Input Sparsity Time

1 code implementation26 Jul 2012 Kenneth L. Clarkson, David P. Woodruff

We design a new distribution over $\mathrm{poly}(r\epsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \epsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$.

Data Structures and Algorithms
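
A minimal NumPy sketch of the sparse embedding behind this result: $S$ has exactly one nonzero per column, so $SA$ costs $O(\mathrm{nnz}(A))$ time (the function name `sparse_embed` is ours).

```python
import numpy as np

def sparse_embed(A, m, seed=0):
    # CountSketch-style S (m x n): coordinate i of R^n is hashed to row
    # h[i] of the sketch with sign s[i], so S has one nonzero per column
    # and SA is computable in O(nnz(A)) time.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    h = rng.integers(0, m, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, d))
    np.add.at(SA, h, s[:, None] * A)  # row i of A lands in row h[i], signed
    return SA
```

With $m = \mathrm{poly}(r/\epsilon)$ rows as in the theorem, solving the sketched problem $\min_x \|SAx - Sb\|_2$ yields a $(1+\epsilon)$-approximate regression solution.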

Optimal CUR Matrix Decompositions

no code implementations30 May 2014 Christos Boutsidis, David P. Woodruff

The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A$, together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A$, as well as a $c \times r$ low-rank matrix $U$ such that the matrix $CUR$ approximates the matrix $A$, that is, $\|A - CUR\|_F^2 \le (1+\epsilon) \|A - A_k\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm and $A_k$ is the best $m \times n$ matrix of rank $k$ constructed via the SVD.

Sketching as a Tool for Numerical Linear Algebra

no code implementations17 Nov 2014 David P. Woodruff

This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties.

Data Structures and Algorithms
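
A minimal sketch-and-solve example in the spirit of the survey, using a dense Gaussian sketch for clarity (names are ours; structured or sparse sketches replace $S$ in practice):

```python
import numpy as np

def sketched_lstsq(A, b, m, seed=0):
    # Compress the n x d problem to m x d with a random Gaussian map S,
    # then solve the small problem; for m ~ d/eps^2 the result is a
    # (1+eps)-approximate least-squares minimizer with good probability.
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(m, A.shape[0])) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```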

Frequent Directions: Simple and Deterministic Matrix Sketching

no code implementations8 Jan 2015 Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff

It performs $O(d \times \ell)$ operations per row and maintains a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ such that for any $k < \ell$, $\|A^TA - B^TB \|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$ and $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + \frac{k}{\ell-k}\big) \|A-A_k\|_F^2$.

Data Structures and Algorithms, 68W40 (Primary)
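
A minimal NumPy implementation of the Frequent Directions sketch described above (the doubled buffer is the standard fast variant; constants in the error bound differ slightly from the statement, and we assume $d \ge \ell$):

```python
import numpy as np

def frequent_directions(A, ell):
    # One pass over the rows of A, maintaining a buffer of 2*ell rows.
    # When the buffer fills, an SVD shrinks every squared singular value
    # by the ell-th largest, zeroing out at least half the rows.
    n, d = A.shape
    B = np.zeros((2 * ell, d))

    def shrink(B):
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
        out = np.zeros((2 * ell, d))
        out[: len(s2)] = np.sqrt(s2)[:, None] * Vt
        return out

    nxt = 0
    for row in A:
        if nxt == 2 * ell:
            B = shrink(B)
            nxt = ell
        B[nxt] = row
        nxt += 1
    return shrink(B)[:ell]   # final shrink so the sketch has ell rows
```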

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

no code implementations24 Jun 2015 Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions.

Optimal approximate matrix product in terms of stable rank

no code implementations8 Jul 2015 Michael B. Cohen, Jelani Nelson, David P. Woodruff

We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows.

Clustering, Dimensionality Reduction +1

Distributed Low Rank Approximation of Implicit Functions of a Matrix

no code implementations28 Jan 2016 David P. Woodruff, Peilin Zhong

For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing a low rank approximation to $A = f(\sum_{t=1}^s A^t)$, where $f$ is a function which is applied entrywise to the matrix $\sum_{t=1}^s A^t$.

Low Rank Approximation with Entrywise $\ell_1$-Norm Error

no code implementations3 Nov 2016 Zhao Song, David P. Woodruff, Peilin Zhong

We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.

Faster Kernel Ridge Regression Using Sketching and Preconditioning

1 code implementation10 Nov 2016 Haim Avron, Kenneth L. Clarkson, David P. Woodruff

The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations.

regression
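
For concreteness, a minimal NumPy sketch of random Fourier features for the Gaussian kernel $\exp(-\gamma\|x-y\|^2)$ (Rahimi-Recht); note the paper uses such features to build a preconditioner for solving KRR exactly, whereas this toy version (names ours) regresses in the feature space directly:

```python
import numpy as np

def random_fourier_features(X, D, gamma, seed=0):
    # z(x) = sqrt(2/D) * cos(x @ W + b) with W ~ N(0, 2*gamma) satisfies
    # E[<z(x), z(y)>] = exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def rff_ridge(X, y, D, gamma, lam, seed=0):
    # Ridge regression on the random features approximates KRR.
    Z = random_fourier_features(X, D, gamma, seed)
    return Z, np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
```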

Sharper Bounds for Regularized Data Fitting

no code implementations10 Nov 2016 Haim Avron, Kenneth L. Clarkson, David P. Woodruff

We study regularization both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization; for the latter, as applied to each of these problems, we show algorithmic resource bounds in which the {\em statistical dimension} appears in places where in previous bounds the rank would appear.

Communication-Optimal Distributed Clustering

no code implementations NeurIPS 2016 Jiecao Chen, He Sun, David P. Woodruff, Qin Zhang

We would like the quality of the clustering in the distributed setting to match that in the centralized setting for which all the data resides on a single site.

Clustering

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices

no code implementations11 Apr 2017 Cameron Musco, David P. Woodruff

We show how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, i.e., for any $n \times n$ PSD matrix $A$, in $\tilde O(n \cdot \mathrm{poly}(k/\epsilon))$ time we output a rank-$k$ matrix $B$, in factored form, for which $\|A-B\|_F^2 \leq (1+\epsilon)\|A-A_k\|_F^2$, where $A_k$ is the best rank-$k$ approximation to $A$.

Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness

no code implementations13 Apr 2017 Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, David P. Woodruff

We thus effectively compute a histogram of the spectrum, which can stand in for the true singular values in many applications.

Relative Error Tensor Low Rank Approximation

no code implementations26 Apr 2017 Zhao Song, David P. Woodruff, Peilin Zhong

Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors.

Matrix Completion and Related Problems via Strong Duality

no code implementations27 Apr 2017 Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, Hongyang Zhang

This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and their duals have the same optimum.

Matrix Completion

Fast Regression with an $\ell_\infty$ Guarantee

no code implementations30 May 2017 Eric Price, Zhao Song, David P. Woodruff

Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a\in \mathbb{R}^d$, we have with probability $1 - d^{-c}$ that $\langle a, x'-x^*\rangle \lesssim \frac{\|a\|_2\|x'-x^*\|_2}{d^{1/2-\gamma}}$, where $c, \gamma > 0$ are arbitrary constants.

regression

Algorithms for $\ell_p$ Low-Rank Approximation

no code implementations ICML 2017 Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff

We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem.
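
For reference, the $p = 2$ baseline mentioned above; by the Eckart-Young(-Mirsky) theorem the truncated SVD is optimal for Frobenius error, while no such closed form exists for general entrywise $\ell_p$:

```python
import numpy as np

def best_rank_k(A, k):
    # Truncated SVD: the optimal rank-k approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]
```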

Near Optimal Sketching of Low-Rank Tensor Regression

no code implementations NeurIPS 2017 Jarvis Haupt, Xingguo Li, David P. Woodruff

We study the least squares regression problem $\min_{\Theta \in \mathcal{S}_{\odot D, R}} \|A\Theta-b\|_2$, where $\mathcal{S}_{\odot D, R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors.

Dimensionality Reduction, regression

Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

no code implementations NeurIPS 2017 Cameron Musco, David P. Woodruff

Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly.

Sketching for Kronecker Product Regression and P-splines

no code implementations27 Dec 2017 Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff

Prior to this work, TensorSketch was only known to provide input sparsity time for Kronecker product regression with respect to the $2$-norm.

regression
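
A minimal NumPy version of the TensorSketch primitive referenced above, for a single Kronecker product $x \otimes y$: CountSketch each factor, then combine via FFT-based circular convolution (function names are ours):

```python
import numpy as np

def tensor_sketch(x, y, m, seed=0):
    # Sketches x ⊗ y to dimension m in O(nnz(x) + nnz(y) + m log m) time,
    # never forming the len(x)*len(y)-dimensional product explicitly.
    rng = np.random.default_rng(seed)

    def countsketch(v):
        h = rng.integers(0, m, size=v.shape[0])
        s = rng.choice([-1.0, 1.0], size=v.shape[0])
        out = np.zeros(m)
        np.add.at(out, h, s * v)
        return out

    cx, cy = countsketch(x), countsketch(y)
    # circular convolution of the two CountSketches via the FFT
    return np.real(np.fft.ifft(np.fft.fft(cx) * np.fft.fft(cy)))
```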

On Coresets for Logistic Regression

no code implementations NeurIPS 2018 Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff

For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.

regression

A PTAS for $\ell_p$-Low Rank Approximation

no code implementations16 Jul 2018 Frank Ban, Vijay Bhattiprolu, Karl Bringmann, Pavel Kolev, Euiwoong Lee, David P. Woodruff

On the algorithmic side, for $p \in (0, 2)$, we give the first $(1+\epsilon)$-approximation algorithm running in time $n^{\text{poly}(k/\epsilon)}$.

Testing Matrix Rank, Optimally

no code implementations18 Oct 2018 Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang

This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix.

Towards a Zero-One Law for Column Subset Selection

1 code implementation NeurIPS 2019 Zhao Song, David P. Woodruff, Peilin Zhong

Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties from $\ell_p$-norms: e.g., the lack of scale-invariance causes any column subset selection algorithm to provably require a factor $\sqrt{\log n}$ more columns than for $\ell_p$-norms. Nevertheless, we design the first efficient column subset selection algorithms for such error measures.

Learning Two Layer Rectified Neural Networks in Polynomial Time

no code implementations5 Nov 2018 Ainesh Bakshi, Rajesh Jayaram, David P. Woodruff

Given $n$ samples as a matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$ and the (possibly noisy) labels $\mathbf{U}^* f(\mathbf{V}^* \mathbf{X}) + \mathbf{E}$ of the network on these samples, where $\mathbf{E}$ is a noise matrix, our goal is to recover the weight matrices $\mathbf{U}^*$ and $\mathbf{V}^*$.

Vocal Bursts Valence Prediction

Simple Heuristics Yield Provable Algorithms for Masked Low-Rank Approximation

no code implementations22 Apr 2019 Cameron Musco, Christopher Musco, David P. Woodruff

In particular, for rank $k' > k$ depending on the public coin partition number of $W$, the heuristic outputs a rank-$k'$ matrix $L$ with $\mathrm{cost}(L) \leq OPT + \epsilon \|A\|_F^2$.

Low-Rank Matrix Completion, Tensor Decomposition

Dimensionality Reduction for Tukey Regression

no code implementations14 May 2019 Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff

We give the first dimensionality reduction methods for the overconstrained Tukey regression problem.

Dimensionality Reduction, regression

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering

no code implementations15 May 2019 Manuel Fernandez, David P. Woodruff, Taisuke Yasuda

We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points.

Clustering, Open-Ended Question Answering +1

The Communication Complexity of Optimization

no code implementations13 Jun 2019 Santosh S. Vempala, Ruosong Wang, David P. Woodruff

We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is $\tilde{\Theta}(d^2L + sd)$ and $\tilde{\Theta}(sd^2L)$, respectively.

Distributed Optimization

Total Least Squares Regression in Input Sparsity Time

1 code implementation NeurIPS 2019 Huaian Diao, Zhao Song, David P. Woodruff, Xin Yang

In the total least squares problem, one is given an $m \times n$ matrix $A$, and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$.

regression
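
For context, the classical dense solution to total least squares via the SVD of the stacked matrix $[A\ B]$ (Golub-Van Loan); the paper's contribution is solving this in input sparsity time rather than the $O(m(n+d)^2)$ cost of the sketch below (names ours):

```python
import numpy as np

def total_least_squares(A, B):
    # Partition the right singular vectors of [A B] and read off
    # X = -V12 @ inv(V22), which solves \hat{A} X = \hat{B} for the
    # nearest (in Frobenius norm) consistent corrected system.
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=True)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]
    return -V12 @ np.linalg.inv(V22)
```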

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation

no code implementations NeurIPS 2019 Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff

For input $\mathcal{A} = A_1 \otimes \cdots \otimes A_q$, we give $O(\sum_{i=1}^q \mathrm{nnz}(A_i))$ time algorithms, which is much faster than forming $\mathcal{A}$ explicitly.

regression

Robust and Sample Optimal Algorithms for PSD Low-Rank Approximation

no code implementations9 Dec 2019 Ainesh Bakshi, Nadiia Chepurko, David P. Woodruff

Our main result is to resolve this question by obtaining an optimal algorithm that queries $O(nk/\epsilon)$ entries of $A$ and outputs a relative-error low-rank approximation in $O(n(k/\epsilon)^{\omega-1})$ time.

Sublinear Time Numerical Linear Algebra for Structured Matrices

no code implementations12 Dec 2019 Xiaofei Shi, David P. Woodruff

For example, in the overconstrained $(1+\epsilon)$-approximate polynomial interpolation problem, $A$ is a Vandermonde matrix and $T(A) = O(n \log n)$; in this case our running time is $n \cdot \mathrm{poly}(\log n) + \mathrm{poly}(d/\epsilon)$ and we recover the results of [avron2013sketching] as a special case.

regression

LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew

no code implementations6 Mar 2020 Cyrus Rashtchian, Aneesh Sharma, David P. Woodruff

Theoretically, we show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.

Recommendation Systems

Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

no code implementations16 Apr 2020 Zhao Song, David P. Woodruff, Peilin Zhong

We show that for input matrices with entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.

Non-Adaptive Adaptive Sampling on Turnstile Streams

no code implementations23 Apr 2020 Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation.

Clustering Data Summarization

Learning-Augmented Data Stream Algorithms

no code implementations ICLR 2020 Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, David P. Woodruff

For estimating the $p$-th frequency moment for $0 < p < 2$ we obtain the first algorithms with optimal update time.

Approximation Algorithms for Sparse Principal Component Analysis

no code implementations23 Jun 2020 Agniva Chowdhury, Petros Drineas, David P. Woodruff, Samson Zhou

To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA).

Dimensionality Reduction

Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and Graph Problems

no code implementations24 Jun 2020 Cyrus Rashtchian, David P. Woodruff, Hanlin Zhu

We consider the general problem of learning about a matrix through vector-matrix-vector queries.

Streaming Complexity of SVMs

no code implementations7 Jul 2020 Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff

We show that, for both problems, for dimensions $d=1, 2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$.

WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement

no code implementations NeurIPS 2020 Edith Cohen, Rasmus Pagh, David P. Woodruff

We design novel composable sketches for WOR $\ell_p$ sampling, weighted sampling of keys according to a power $p\in[0, 2]$ of their frequency (or for signed data, sum of updates).
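
One standard offline realization of this WOR objective is the "exponential race": assign each key the value $\mathrm{Exp}(1)/w^p$ and keep the $k$ smallest, which draws keys without replacement, each successive key chosen proportionally to $w^p$ among those remaining. A minimal sketch follows; this is the sampling objective the paper's composable sketches emulate over streams, not the sketches themselves, and the function name is ours.

```python
import numpy as np

def wor_lp_sample(freq, p, k, seed=0):
    # key_i = Exp(1) / |f_i|^p; the k smallest keys form a WOR sample.
    # Keys with zero frequency get key = inf and are never selected.
    rng = np.random.default_rng(seed)
    w = np.abs(np.asarray(freq, dtype=float)) ** p
    with np.errstate(divide="ignore"):
        keys = rng.exponential(size=len(w)) / w
    return np.argsort(keys)[:k]
```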

Optimal $\ell_1$ Column Subset Selection and a Fast PTAS for Low Rank Approximation

no code implementations20 Jul 2020 Arvind V. Mahankali, David P. Woodruff

We give the first polynomial time column subset selection-based $\ell_1$ low rank approximation algorithm sampling $\tilde{O}(k)$ columns and achieving an $\tilde{O}(k^{1/2})$-approximation for any $k$. This improves upon the previous best $\tilde{O}(k)$-approximation and matches a prior lower bound for column subset selection-based $\ell_1$-low rank approximation which holds for any $\mathrm{poly}(k)$ number of columns.

Learning the Positions in CountSketch

no code implementations20 Jul 2020 Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David P. Woodruff

Despite the growing body of work on this paradigm, a noticeable omission is that in previous algorithms the locations of the non-zero entries of the sketch were fixed, and only their values were learned.

Clustering

Hutch++: Optimal Stochastic Trace Estimation

1 code implementation19 Oct 2020 Raphael A. Meyer, Cameron Musco, Christopher Musco, David P. Woodruff

This improves on the ubiquitous Hutchinson's estimator, which requires $O(1/\epsilon^2)$ matrix-vector products.
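
A minimal NumPy sketch of the Hutch++ idea, given only matrix-vector access (`matvec`): roughly a third of the budget builds a low-rank deflation whose trace is computed exactly, and Hutchinson's estimator handles the residual.

```python
import numpy as np

def hutchpp(matvec, n, m, seed=0):
    # Uses about m matrix-vector products to estimate tr(A).
    rng = np.random.default_rng(seed)
    k = max(m // 3, 1)
    S = rng.choice([-1.0, 1.0], size=(n, k))
    # Orthonormal basis Q for the range of A @ S (a rough top subspace).
    Q, _ = np.linalg.qr(np.column_stack([matvec(S[:, j]) for j in range(k)]))
    trace = sum(Q[:, j] @ matvec(Q[:, j]) for j in range(Q.shape[1]))
    # Hutchinson on the deflated part (I - QQ^T) A (I - QQ^T); the cross
    # terms vanish, so the two pieces sum to tr(A) in expectation.
    for _ in range(k):
        g = rng.choice([-1.0, 1.0], size=n)
        g -= Q @ (Q.T @ g)
        trace += (g @ matvec(g)) / k
    return trace
```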

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

no code implementations9 Nov 2020 Nadiia Chepurko, Kenneth L. Clarkson, Lior Horesh, Honghao Lin, David P. Woodruff

We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues.

Recommendation Systems

Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes

1 code implementation NeurIPS 2020 Quang Minh Hoang, Trong Nghia Hoang, Hai Pham, David P. Woodruff

We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.

Gaussian Processes

Learning-Augmented Sketches for Hessians

no code implementations24 Feb 2021 Yi Li, Honghao Lin, David P. Woodruff

We show how to design learned sketches for the Hessian in the context of second order methods.

Dimensionality Reduction, Second-order methods

Learning a Latent Simplex in Input-Sparsity Time

no code implementations17 May 2021 Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David P. Woodruff, Samson Zhou

We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can be viewed as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$).

Topic Models

Non-PSD Matrix Sketching with Applications to Regression and Optimization

no code implementations16 Jun 2021 Zhili Feng, Fred Roosta, David P. Woodruff

In this paper, we present novel dimensionality reduction methods for non-PSD matrices, as well as their "square-roots", which involve matrices with complex entries.

Dimensionality Reduction, regression

Average-Case Communication Complexity of Statistical Problems

no code implementations3 Jul 2021 Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu

Our motivation is to understand the statistical-computational trade-offs in streaming, sketching, and query-based models.

Single Pass Entrywise-Transformed Low Rank Approximation

no code implementations16 Jul 2021 Yifei Jiang, Yi Li, Yiming Sun, Jiaxin Wang, David P. Woodruff

A natural way to do this would be to simply apply $f$ to each entry of $A$, and then compute the matrix decomposition, but this requires storing all of $A$ as well as multiple passes over its entries.

Open-Ended Question Answering

Near-Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time

no code implementations16 Jul 2021 Nadiia Chepurko, Kenneth L. Clarkson, Praneeth Kacham, David P. Woodruff

We address a question regarding the logarithmic factors in the sketching dimension of existing oblivious subspace embeddings that achieve constant-factor approximation.

Open-Ended Question Answering, regression

Fast Sketching of Polynomial Kernels of Polynomial Degree

no code implementations21 Aug 2021 Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang

Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic.

BIG-bench Machine Learning

Active Linear Regression for $\ell_p$ Norms and Beyond

no code implementations9 Nov 2021 Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda

By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1, p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21].

Dimensionality Reduction, Open-Ended Question Answering +1

Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

no code implementations9 Feb 2022 David P. Woodruff, Amir Zandieh

We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices using a nearly optimal number of samples, improving upon all previously known methods by poly$(q)$ factors.
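
As a reference point, the basic leverage score sampling primitive for a dense matrix, computed via a QR factorization in $O(nd^2)$ time; the paper's point is achieving this kind of spectral guarantee for $q$-fold tensor product matrices in input sparsity time. Function names are ours.

```python
import numpy as np

def leverage_scores(A):
    # tau_i is the squared norm of row i of an orthonormal basis of col(A).
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)

def leverage_score_sample(A, m, seed=0):
    # Sample m rows with probability proportional to leverage and rescale,
    # so (SA)^T (SA) is an unbiased, spectrally accurate proxy for A^T A.
    rng = np.random.default_rng(seed)
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, p=p)
    return A[idx] / np.sqrt(m * p[idx])[:, None]
```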

Low-Rank Approximation with $1/ε^{1/3}$ Matrix-Vector Products

no code implementations10 Feb 2022 Ainesh Bakshi, Kenneth L. Clarkson, David P. Woodruff

For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the naïve $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses $\mathrm{poly}(\log(dk/\epsilon))$ factors.

Triangle and Four Cycle Counting with Predictions in Graph Streams

no code implementations ICLR 2022 Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature.

Sketching Algorithms and Lower Bounds for Ridge Regression

no code implementations13 Apr 2022 Praneeth Kacham, David P. Woodruff

For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al. with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|A\|_2$ is the spectral norm of the matrix $A$.

regression

Memory Bounds for the Experts Problem

no code implementations21 Apr 2022 Vaidehi Srinivas, David P. Woodruff, Ziyu Xu, Samson Zhou

We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds.
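
For orientation, the classical full-memory baseline against which the paper's memory bounds are measured: weighted majority keeps one weight per expert, i.e., $O(n)$ memory (a minimal sketch, assuming binary predictions; names ours).

```python
import numpy as np

def weighted_majority(preds, outcomes, eta=0.5):
    # preds: T x n array of expert predictions in {0, 1}; outcomes: length T.
    T, n = preds.shape
    w = np.ones(n)          # one weight per expert: O(n) memory
    mistakes = 0
    for t in range(T):
        guess = int(w @ preds[t] >= w.sum() / 2.0)   # weighted vote
        mistakes += int(guess != outcomes[t])
        w *= np.where(preds[t] == outcomes[t], 1.0, 1.0 - eta)  # penalize errors
    return mistakes
```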

Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis

no code implementations26 Jun 2022 Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff

A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors.

Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor Decompositions

no code implementations15 Jul 2022 Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang

Our key technique is a method for obtaining subspace embeddings with a number of rows polynomial in $q$ for a matrix which is the flattening of a tensor train of $q$ tensors.

Dimensionality Reduction, Tensor Decomposition +1

Adaptive Sketches for Robust Regression with Importance Sampling

no code implementations16 Jul 2022 Sepideh Mahabadi, David P. Woodruff, Samson Zhou

In this paper, we introduce an algorithm that approximately samples $T$ gradients of dimension $d$ from nearly the optimal importance sampling distribution for a robust regression problem over $n$ rows.

regression

Online Lewis Weight Sampling

no code implementations17 Jul 2022 David P. Woodruff, Taisuke Yasuda

Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/\epsilon^2)$ for $p>2$.

Open-Ended Question Answering, regression

Optimal Query Complexities for Dynamic Trace Estimation

no code implementations30 Sep 2022 David P. Woodruff, Fred Zhang, Qiuyi Zhang

Specifically, for any $m$ matrices $A_1,\ldots, A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces to within $\epsilon$ error with failure probability $\delta$, using an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ over Dharangutte and Musco (NeurIPS, 2021).

On Differential Privacy and Adaptive Data Analysis with Bounded Space

no code implementations11 Feb 2023 Itai Dinur, Uri Stemmer, David P. Woodruff, Samson Zhou

We study the space complexity of the two related fields of differential privacy and adaptive data analysis.

Streaming Algorithms for Learning with Experts: Deterministic Versus Robust

no code implementations3 Mar 2023 David P. Woodruff, Fred Zhang, Samson Zhou

In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time).

Optimal Sketching Bounds for Sparse Linear Regression

no code implementations5 Apr 2023 Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff

For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression.

regression

Sharper Bounds for $\ell_p$ Sensitivity Sampling

no code implementations1 Jun 2023 David P. Woodruff, Taisuke Yasuda

In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p > 2$ that improve over the general $\mathfrak S d$ bound, achieving a bound of roughly $\mathfrak S^{2-2/p}$ for $2<p<\infty$.

Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization

no code implementations2 Jun 2023 Ameya Velingker, Maximilian Vötsch, David P. Woodruff, Samson Zhou

We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0, 1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0, 1\}^{n\times k}$ and $\mathbf{V}\in\{0, 1\}^{k\times d}$.

Learning the Positions in CountSketch

no code implementations11 Jun 2023 Yi Li, Honghao Lin, Simin Liu, Ali Vakilian, David P. Woodruff

We fix this issue and propose approaches for learning a sketching matrix for both low-rank approximation and Hessian approximation for second order optimization.

$\ell_p$-Regression in the Arbitrary Partition Model of Communication

no code implementations11 Jul 2023 Yi Li, Honghao Lin, David P. Woodruff

We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0, 2]$.

regression

Task-Based MoE for Multitask Multilingual Machine Translation

no code implementations30 Aug 2023 Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

The mixture-of-experts (MoE) architecture has proven to be a powerful method for diverse tasks in training deep models across many applications.

Machine Translation, Translation

HyperAttention: Long-context Attention in Near-Linear Time

1 code implementation9 Oct 2023 Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

Recent work suggests that in the worst-case scenario, quadratic time is necessary unless the entries of the attention matrix are bounded or the matrix has low stable rank.
