1 code implementation • 25 Jun 2024 • Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory.
no code implementations • 27 Feb 2024 • Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder
We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model.
no code implementations • 17 Jan 2024 • Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, Elad Hazan
In this paper, we give query and regret optimal bandit algorithms under the strict notion of strongly adaptive regret, which measures the maximum regret over any contiguous interval $I$.
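For reference, strongly adaptive regret at interval length $k$ is usually defined as follows (this is the standard formulation going back to Daniely et al., 2015; the paper's exact notation may differ):

$$\mathrm{SA\text{-}Regret}(T,k) \;=\; \max_{[s,\,s+k-1] \subseteq [T]} \Big( \sum_{t=s}^{s+k-1} \ell_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=s}^{s+k-1} \ell_t(x) \Big),$$

where $\ell_t$ is the loss at round $t$, $x_t$ is the learner's play, and $\mathcal{K}$ is the comparator set. An algorithm with small strongly adaptive regret must be competitive on every window simultaneously, not only over the full horizon.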
1 code implementation • 31 Mar 2023 • Alexander Munteanu, Simon Omlor, David Woodruff
We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in the sketch space.
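For intuition about sketch-and-solve for $\ell_1$, here is a minimal numpy illustration using a dense sketch with i.i.d. Cauchy entries, the classical 1-stable construction in the spirit of Indyk. This is only the data-oblivious baseline such results improve upon, not the paper's sparse sketch, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5000, 10, 201                     # tall input, sketch size (our choices)

A = rng.standard_normal((n, d))
S = rng.standard_cauchy((m, n))             # i.i.d. standard Cauchy sketch

SA = S @ A                                  # sketch once; each estimate is O(md)

for _ in range(3):
    x = rng.standard_normal(d)
    # By 1-stability, every coordinate of S(Ax) is Cauchy with scale ||Ax||_1,
    # and the median absolute value of such a variable equals its scale, so a
    # median over the m sketch coordinates estimates ||Ax||_1.
    est = np.median(np.abs(SA @ x))
    print(f"||Ax||_1 = {np.linalg.norm(A @ x, 1):9.1f}   estimate = {est:9.1f}")
```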
no code implementations • NeurIPS 2021 • Arvind Mahankali, David Woodruff
For linear classification, we improve upon the algorithm of Tai et al. (2018), which solves the $\ell_1$ point query problem on the optimal weight vector $w_* \in \mathbb{R}^d$ in sublinear space.
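For context, the basic $\ell_1$ point query primitive is the Count-Min sketch of Cormode and Muthukrishnan, which answers queries within additive $\epsilon\|w\|_1$ in $O(\epsilon^{-1}\log(1/\delta))$ space. A minimal version for nonnegative vectors (the learning setting above is harder, since the stream consists of labeled examples rather than updates to $w_*$ itself):

```python
import numpy as np

class CountMin:
    """Count-Min sketch: point queries on a nonnegative vector w with
    additive error about 2/width * ||w||_1, failing with prob. 2^-depth."""

    P = 2**31 - 1                                      # prime modulus for hashing

    def __init__(self, width, depth, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width))
        self.a = rng.integers(1, self.P, size=depth)   # one hash function per row
        self.b = rng.integers(0, self.P, size=depth)

    def _hash(self, i):
        return (self.a * i + self.b) % self.P % self.width

    def update(self, i, delta):
        self.table[np.arange(self.depth), self._hash(i)] += delta

    def query(self, i):
        # every row overestimates under nonnegative updates; take the minimum
        return self.table[np.arange(self.depth), self._hash(i)].min()

rng = np.random.default_rng(1)
d = 100_000
w = np.zeros(d)
w[rng.choice(d, 10, replace=False)] = rng.uniform(10, 100, 10)  # heavy coordinates

cm = CountMin(width=200, depth=5)                      # space far below d
for i in np.flatnonzero(w):
    cm.update(i, w[i])

i_star = int(np.argmax(w))
print(w[i_star], cm.query(i_star))                     # estimate >= truth, close w.h.p.
```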
no code implementations • NeurIPS 2021 • Piotr Indyk, Tal Wagner, David Woodruff
Recently, data-driven and learning-based algorithms for low rank matrix approximation were shown to outperform classical data-oblivious algorithms by wide margins in terms of accuracy.
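The data-oblivious pipeline these learned methods are compared against is sketch-and-project low-rank approximation in the style of Sarlós and Clarkson-Woodruff: compress the rows with a random sketch $S$, then take the best rank-$k$ fit inside the row space of $SA$. A numpy sketch of that pipeline, with a Gaussian $S$ and illustrative dimensions (learned variants replace the random $S$ with a trained matrix):

```python
import numpy as np

def sketched_low_rank(A, k, m, rng):
    """Rank-k approximation via a random m x n sketch S: project A onto
    rowspace(SA) and truncate to rank k inside that subspace."""
    n = A.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)         # oblivious Gaussian sketch
    _, _, P = np.linalg.svd(S @ A, full_matrices=False)  # rows of P span rowspace(SA)
    B = A @ P.T                                          # coordinates of A in the subspace
    U, s, Wt = np.linalg.svd(B, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Wt[:k] @ P               # best rank-k within the subspace

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 60)) @ rng.standard_normal((60, 300))  # rank <= 60 input
A_k = sketched_low_rank(A, k=20, m=40, rng=rng)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
opt = (U[:, :20] * s[:20]) @ Vt[:20]
print(np.linalg.norm(A - A_k), np.linalg.norm(A - opt))  # sketched vs. optimal error
```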
1 code implementation • 14 Jul 2021 • Alexander Munteanu, Simon Omlor, David Woodruff
Our sketch can be computed in input sparsity time over a turnstile data stream and reduces the size of a $d$-dimensional data set from $n$ to only $\operatorname{poly}(\mu d\log n)$ weighted points, where $\mu$ is a useful parameter which captures the complexity of compressing the data.
no code implementations • 1 Jan 2021 • Shuli Jiang, Dongyu Li, Irene Mengze Li, Arvind V. Mahankali, David Woodruff
We give a distributed protocol with nearly-optimal communication and number of rounds for Column Subset Selection with respect to the entrywise $\ell_1$ norm ($k$-CSS$_1$), and more generally, for the $\ell_p$-norm with $1 \leq p < 2$.
no code implementations • 1 Jan 2021 • Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David Woodruff
In this work, we consider the problem of optimizing sketches to obtain low approximation error over a data distribution.
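A minimal PyTorch rendering of this idea, under our own simplifications: fix a CountSketch-style sparsity pattern, treat its nonzero values as trainable parameters, and minimize the downstream low-rank approximation error over matrices drawn from the distribution (the differentiable projection error below stands in for the exact rank-$k$ objective, and all sizes are illustrative):

```python
import torch

torch.manual_seed(0)
n, d, m = 200, 50, 10                  # matrix shape and sketch size (our choices)

def sample_matrix():
    # a toy "data distribution": rank-5 signal plus small noise
    return torch.randn(n, 5) @ torch.randn(5, d) + 0.1 * torch.randn(n, d)

# CountSketch-style pattern: each column of the m x n sketch has one nonzero
# at a fixed random row; only the values of those nonzeros are learned.
pos = torch.randint(0, m, (n,))
vals = torch.randn(n, requires_grad=True)

def approx_error(A):
    S = torch.zeros(m, n)
    S[pos, torch.arange(n)] = vals             # assemble the sparse sketch
    Q, _ = torch.linalg.qr((S @ A).T)          # orthonormal basis of rowspace(SA)
    return torch.linalg.norm(A - A @ Q @ Q.T)  # projection (approximation) error

opt = torch.optim.Adam([vals], lr=0.05)
for step in range(301):
    opt.zero_grad()
    loss = approx_error(sample_matrix())
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, loss.item())               # error falls as the sketch adapts
```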
no code implementations • ICLR 2021 • Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David Woodruff, Samson Zhou
Bhattacharyya and Kannan (SODA 2020) give an algorithm for learning such a $k$-vertex latent simplex in time roughly $O(k\cdot\text{nnz}(\mathbf{A}))$, where $\text{nnz}(\mathbf{A})$ is the number of non-zeros in $\mathbf{A}$.
no code implementations • 1 Jan 2021 • Sepideh Mahabadi, David Woodruff, Samson Zhou
Moreover, we show that our algorithm can be generalized to approximately sample Hessians and thus provides variance reduction for second-order methods as well.
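The variance-reduction effect of norm-proportional sampling is easy to see in a toy numpy experiment; this illustrates only the importance-sampling primitive, not the paper's streaming data structures, and the synthetic gradients are our own choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
# per-example "gradients" with heavily skewed norms
G = rng.standard_normal((n, d)) * rng.exponential(1.0, (n, 1))
full = G.mean(axis=0)                          # the exact mean gradient

def mse(p, m=32, trials=2000):
    """Mean squared error of the unbiased importance-sampled estimator."""
    errs = []
    for _ in range(trials):
        idx = rng.choice(n, size=m, p=p)
        est = (G[idx] / (n * p[idx, None])).mean(axis=0)  # unbiased for `full`
        errs.append(np.sum((est - full) ** 2))
    return np.mean(errs)

uniform = np.full(n, 1 / n)
by_norm = np.linalg.norm(G, axis=1)
by_norm /= by_norm.sum()
print("uniform sampling MSE:          ", mse(uniform))
print("norm-proportional sampling MSE:", mse(by_norm))   # markedly smaller
```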
1 code implementation • NeurIPS 2019 • Zhao Song, David Woodruff, Peilin Zhong
We show that if $A = B + E$, where $B$ is an arbitrary rank-$k$ matrix and $E$ has i.i.d. entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • NeurIPS 2019 • Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff
We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions.
1 code implementation • NeurIPS 2019 • Michela Meister, Tamas Sarlos, David Woodruff
We give a new analysis of this sketch, the tensorized CountSketch (TensorSketch) for low-degree polynomial kernels, providing nearly optimal bounds.
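For degree 2, the sketch computes a CountSketch of $x \otimes x$ in $O(\mathrm{nnz}(x) + m\log m)$ time by convolving two independent CountSketches of $x$ with the FFT. A compact numpy version (hash values are drawn fully at random here, a simplification of the limited-independence hash families the analysis concerns):

```python
import numpy as np

def tensorsketch2(x, h1, h2, s1, s2, m):
    """Degree-2 TensorSketch: a CountSketch of the tensor square of x,
    computed via FFT convolution of two CountSketches of x."""
    c1 = np.zeros(m); np.add.at(c1, h1, s1 * x)    # CountSketch #1 of x
    c2 = np.zeros(m); np.add.at(c2, h2, s2 * x)    # CountSketch #2 of x
    return np.fft.ifft(np.fft.fft(c1) * np.fft.fft(c2)).real

rng = np.random.default_rng(0)
d, m = 500, 4096
h1, h2 = rng.integers(0, m, d), rng.integers(0, m, d)            # random buckets
s1, s2 = rng.choice([-1.0, 1.0], d), rng.choice([-1.0, 1.0], d)  # random signs

x = rng.standard_normal(d)
y = x + 0.5 * rng.standard_normal(d)
sx = tensorsketch2(x, h1, h2, s1, s2, m)
sy = tensorsketch2(y, h1, h2, s1, s2, m)
# inner products in sketch space approximate the degree-2 polynomial kernel
print(np.dot(x, y) ** 2, np.dot(sx, sy))
```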
no code implementations • NeurIPS 2019 • Frank Ban, David Woodruff, Qiuyi Zhang
The classical low rank approximation problem is to find a rank $k$ matrix $UV$ (where $U$ has $k$ columns and $V$ has $k$ rows) that minimizes the Frobenius norm of $A - UV$.
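By the Eckart-Young theorem, this classical problem is solved exactly by the truncated SVD; a short numpy check:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]     # optimal rank-k approximation (Eckart-Young)

# the optimal error equals the energy in the discarded singular values
print(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:] ** 2)))
```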
1 code implementation • 3 Sep 2019 • Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David Woodruff, Amir Zandieh
Oblivious sketching has emerged as a powerful approach to speeding up numerical linear algebra over the past decade, but our understanding of oblivious sketching solutions for kernel matrices has remained quite limited, suffering from the aforementioned exponential dependence on input parameters.
no code implementations • 11 Jun 2019 • Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.
no code implementations • 2 Jun 2019 • Piotr Indyk, Ali Vakilian, Tal Wagner, David Woodruff
Recent work by Bakshi and Woodruff (NeurIPS 2018) showed it is possible to compute a rank-$k$ approximation of a distance matrix in time $O((n+m)^{1+\gamma}) \cdot \mathrm{poly}(k, 1/\epsilon)$, where $\epsilon>0$ is an error parameter and $\gamma>0$ is an arbitrarily small constant.
no code implementations • NeurIPS 2018 • Roie Levin, Anish Prasad Sevekari, David Woodruff
We study robust subspace estimation in the streaming and distributed settings.
no code implementations • ICML 2018 • Charlie Dickens, Graham Cormode, David Woodruff
Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$.
no code implementations • 26 Jun 2018 • John Hainline, Brendan Juba, Hai S. Le, David Woodruff
We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes the $\ell_p$ loss of predicting the target $z$ in the distribution of examples conditioned on $c$.
no code implementations • NeurIPS 2017 • Karl Bringmann, Pavel Kolev, David Woodruff
For small $\psi$, our approximation factor is $1+o(1)$.
no code implementations • 2 Mar 2017 • Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, David Woodruff
In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers.
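A minimal single-machine illustration of the outlier-robust objective, in the style of the k-means-- heuristic of Chawla and Gionis: each Lloyd iteration sets aside the $z$ points farthest from their centers before updating. This is a simplification for intuition (the paper's contribution concerns the distributed setting), and all parameters are illustrative:

```python
import numpy as np

def kmeans_with_outliers(X, k, z, iters=20, seed=0):
    """Lloyd-style k-means that ignores the z farthest points each round
    (the k-means-- heuristic); returns centers and flagged outlier indices."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)   # squared distances
        assign, nearest = d2.argmin(1), d2.min(1)
        outliers = np.argsort(nearest)[-z:]                # z farthest points
        keep = np.setdiff1d(np.arange(len(X)), outliers)
        for j in range(k):                                 # update on inliers only
            pts = X[keep][assign[keep] == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, outliers

rng = np.random.default_rng(1)
clusters = [rng.normal(c, 0.3, (100, 2)) for c in [(0, 0), (5, 5), (0, 5)]]
noise = rng.uniform(-10, 15, (10, 2))                      # scattered outliers
X = np.vstack(clusters + [noise])

centers, outliers = kmeans_with_outliers(X, k=3, z=10)
print(np.round(centers, 2))                # recovers the three true cluster means
```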
1 code implementation • NeurIPS 2016 • Zhao Song, David Woodruff, Huan Zhang
We show that in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor.
no code implementations • 23 Mar 2015 • Maria-Florina Balcan, Yingyu Liang, Le Song, David Woodruff, Bo Xie
Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality?
no code implementations • NeurIPS 2014 • David Woodruff
We study low-rank approximation in the streaming model in which the rows of an $n \times d$ matrix $A$ are presented one at a time in an arbitrary order.
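The natural point of comparison in this row-update model is the Frequent Directions sketch of Liberty, which maintains a small sketch $B$ with $\|A^{T}A - B^{T}B\|_2 \le \|A\|_F^2/\ell$ in one pass; a compact numpy version of the buffer-doubling variant (parameters are illustrative):

```python
import numpy as np

def frequent_directions(rows, d, ell):
    """Frequent Directions: one pass over the rows of an n x d matrix,
    keeping a sketch B with ||A^T A - B^T B||_2 <= ||A||_F^2 / ell."""
    B = np.zeros((2 * ell, d))                   # double buffer for efficiency
    free = 0
    for row in rows:
        if free == 2 * ell:                      # buffer full: shrink
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = np.maximum(s**2 - s[ell - 1]**2, 0)   # subtract ell-th energy
            B = np.sqrt(s2)[:, None] * Vt
            B[ell:] = 0
            free = ell
        B[free] = row
        free += 1
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 30)) @ np.diag(np.linspace(1, 10, 30))
B = frequent_directions(iter(A), d=30, ell=10)

err = np.linalg.norm(A.T @ A - B.T @ B, 2)
print(err, np.linalg.norm(A, "fro") ** 2 / 10)   # error well below the guarantee
```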
no code implementations • NeurIPS 2014 • Haim Avron, Huy Nguyen, David Woodruff
Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms.
no code implementations • NeurIPS 2014 • Maria-Florina Balcan, Vandana Kanchanapally, Yingyu Liang, David Woodruff
We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems.
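The shape of such a protocol is easy to simulate: each server summarizes its local rows by $t$ scaled top right singular vectors, sends only that $t \times d$ matrix, and the coordinator runs PCA on the stacked summaries. A minimal numpy simulation (a sketch of this style of protocol, with parameters and the data model chosen by us):

```python
import numpy as np

def local_summary(X, t):
    """Server side: send only the t scaled top right singular vectors."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s[:t, None] * Vt[:t]                        # a t x d message

def distributed_pca(parts, t, k):
    """Coordinator side: PCA on the stacked t x d summaries."""
    stacked = np.vstack([local_summary(X, t) for X in parts])
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    return Vt[:k]                                      # approximate top-k subspace

rng = np.random.default_rng(0)
d, k = 50, 5
signal = rng.standard_normal((k, d))                   # shared low-dim structure
parts = [rng.standard_normal((1000, k)) @ signal
         + 0.1 * rng.standard_normal((1000, d)) for _ in range(4)]

V_dist = distributed_pca(parts, t=10, k=k)
A = np.vstack(parts)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
# variance captured: distributed protocol vs. centralized PCA
print(np.linalg.norm(A @ V_dist.T), np.linalg.norm(A @ Vt[:k].T))
```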
no code implementations • NeurIPS 2013 • Haim Avron, Vikas Sindhwani, David Woodruff
Motivated by the desire to extend fast randomized techniques to nonlinear $\ell_p$ regression, we consider a class of structured regression problems.