Search Results for author: David Woodruff

Found 28 papers, 6 papers with code

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

no code implementations 27 Feb 2024 Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model.
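
As a rough illustration of the clustering-based sensitivity-sampling pattern (a minimal sketch, assuming a textbook sensitivity proxy rather than the paper's exact scheme): cluster the data, sample each point with probability proportional to its distance from its assigned center plus a uniform per-cluster share, and reweight by inverse probability so downstream estimates stay unbiased.

```python
import numpy as np
from sklearn.cluster import KMeans

def sensitivity_sample(X, k, m, seed=0):
    """Select m weighted points from X via k-means sensitivity sampling."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    sizes = np.bincount(km.labels_, minlength=k)
    # textbook sensitivity proxy: distance share + uniform cluster share
    s = dist / max(dist.sum(), 1e-12) + 1.0 / (k * sizes[km.labels_])
    p = s / s.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    return X[idx], 1.0 / (m * p[idx])  # sampled points and unbiased weights

X = np.random.randn(1000, 16)
subset, weights = sensitivity_sample(X, k=10, m=100)
```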

Clustering

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

no code implementations 17 Jan 2024 Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, Elad Hazan

In this paper, we give query and regret optimal bandit algorithms under the strict notion of strongly adaptive regret, which measures the maximum regret over any contiguous interval $I$.
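
For reference, the strongly adaptive regret of an algorithm that plays $x_\tau$ against losses $\ell_\tau$ over a decision set $\mathcal{K}$ is the standard quantity

$$\text{SA-Regret}_T \;=\; \max_{[s,t] \subseteq [T]} \left( \sum_{\tau=s}^{t} \ell_\tau(x_\tau) \;-\; \min_{x \in \mathcal{K}} \sum_{\tau=s}^{t} \ell_\tau(x) \right),$$

so the algorithm must be near-optimal on every contiguous window simultaneously, not just on the full horizon.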

Hyperparameter Optimization · Multi-Armed Bandits

Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression

1 code implementation 31 Mar 2023 Alexander Munteanu, Simon Omlor, David Woodruff

We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in the sketch space.
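
To make the sketch-and-solve pattern concrete, here is an illustrative stand-in: a dense Cauchy sketch, which embeds $\ell_1$ up to distortion, rather than the paper's sparse oblivious construction. The regression is then solved entirely in the small sketch space.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20_000, 20, 400                  # m = sketch size << n
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_cauchy(n)

S = rng.standard_cauchy((m, n))            # Cauchy rows preserve l1 norms
SA, Sb = S @ A, S @ b

# l1 regression in the sketch space via iteratively reweighted least squares
x = np.linalg.lstsq(SA, Sb, rcond=None)[0]
for _ in range(50):
    w = 1.0 / np.sqrt(np.maximum(np.abs(SA @ x - Sb), 1e-8))
    x = np.linalg.lstsq(w[:, None] * SA, w * Sb, rcond=None)[0]
```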

regression

Few-Shot Data-Driven Algorithms for Low Rank Approximation

no code implementations NeurIPS 2021 Piotr Indyk, Tal Wagner, David Woodruff

Recently, data-driven and learning-based algorithms for low rank matrix approximation were shown to outperform classical data-oblivious algorithms by wide margins in terms of accuracy.
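
The data-oblivious baseline these learned methods compete with is the classical sketch-based pipeline: compress the rows, then do the heavy SVD work in the small space. Below is a minimal sketch with a Gaussian sketching matrix; the learned approach instead trains the sketch entries on sample matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, m = 2000, 400, 10, 40
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))   # near low-rank data

S = rng.standard_normal((m, n)) / np.sqrt(m)  # oblivious sketch
Q, _ = np.linalg.qr((S @ A).T)                # basis for rowspace(S A)
U, s, Vt = np.linalg.svd(A @ Q, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k] @ Q.T       # rank-k approximation of A
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
```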

Computational Efficiency

Linear and Kernel Classification in the Streaming Model: Improved Bounds for Heavy Hitters

no code implementations NeurIPS 2021 Arvind Mahankali, David Woodruff

For linear classification, we improve upon the algorithm of (Tai et al., 2018), which solves the $\ell_1$ point query problem on the optimal weight vector $w_* \in \mathbb{R}^d$ in sublinear space.
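
The $\ell_1$ point-query primitive itself can be sketched in a few lines with a CountMin structure (a textbook version, assuming nonnegative entries; the paper's contribution is obtaining such guarantees for the optimal weight vector, which is much harder):

```python
import numpy as np

class CountMin:
    """Point queries with additive error eps * ||w||_1, w nonnegative."""
    def __init__(self, width, depth, dim, seed=0):
        rng = np.random.default_rng(seed)
        # tabulated hashes for simplicity; real implementations use hash
        # functions so that space stays sublinear in dim
        self.h = rng.integers(0, width, size=(depth, dim))
        self.table = np.zeros((depth, width))

    def update(self, i, delta):            # w[i] += delta, delta >= 0
        for r in range(len(self.table)):
            self.table[r, self.h[r, i]] += delta

    def query(self, i):                    # overestimate of w[i]
        return min(self.table[r, self.h[r, i]] for r in range(len(self.table)))

cm = CountMin(width=200, depth=5, dim=10_000)
cm.update(42, 3.0); cm.update(42, 1.5); cm.update(7, 2.0)
print(cm.query(42))                        # >= 4.5, close with high probability
```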

Oblivious Sketching for Logistic Regression

1 code implementation 14 Jul 2021 Alexander Munteanu, Simon Omlor, David Woodruff

Our sketch can be computed in input sparsity time over a turnstile data stream and reduces the size of a $d$-dimensional data set from $n$ to only $\operatorname{poly}(\mu d\log n)$ weighted points, where $\mu$ is a useful parameter which captures the complexity of compressing the data.

regression

A framework for learned sparse sketches

no code implementations 1 Jan 2021 Simin Liu, Tianrui Liu, Ali Vakilian, Yulin Wan, David Woodruff

In this work, we consider the problem of optimizing sketches to obtain low approximation error over a data distribution.

Clustering · regression

Adaptive Single-Pass Stochastic Gradient Descent in Input Sparsity Time

no code implementations 1 Jan 2021 Sepideh Mahabadi, David Woodruff, Samson Zhou

Moreover, we show that our algorithm can be generalized to approximately sample Hessians and thus provides variance reduction for second-order methods as well.
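
A generic instance of the underlying idea is sampling rows non-uniformly and reweighting gradients for unbiasedness. The sketch below uses plain row-norm sampling for least squares, purely as an illustration; the paper's adaptive, input-sparsity-time sampler is more refined.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

p = np.linalg.norm(A, axis=1) ** 2
p /= p.sum()                               # sample rows ~ squared row norm
x = np.zeros(d)
for t in range(n):                         # a single pass worth of samples
    i = rng.choice(n, p=p)
    g = (A[i] @ x - b[i]) * A[i] / (n * p[i])   # unbiased gradient estimate
    x -= 0.1 / np.sqrt(t + 1) * g
```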

Second-order methods · Stochastic Optimization

Learning a Latent Simplex in Input Sparsity Time

no code implementations ICLR 2021 Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David Woodruff, Samson Zhou

Bhattacharyya and Kannan (SODA 2020) give an algorithm for learning such a $k$-vertex latent simplex in time roughly $O(k\cdot\text{nnz}(\mathbf{A}))$, where $\text{nnz}(\mathbf{A})$ is the number of non-zeros in $\mathbf{A}$.

Clustering · Topic Models

An Efficient Protocol for Distributed Column Subset Selection in the Entrywise $\ell_p$ Norm

no code implementations 1 Jan 2021 Shuli Jiang, Dongyu Li, Irene Mengze Li, Arvind V. Mahankali, David Woodruff

We give a distributed protocol with nearly-optimal communication and number of rounds for Column Subset Selection with respect to the entrywise {$\ell_1$} norm ($k$-CSS$_1$), and more generally, for the $\ell_p$-norm with $1 \leq p < 2$.


Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

1 code implementation NeurIPS 2019 Zhao Song, David Woodruff, Peilin Zhong

We show that if the input matrix has entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.

Efficient and Thrifty Voting by Any Means Necessary

no code implementations NeurIPS 2019 Debmalya Mandal, Ariel D. Procaccia, Nisarg Shah, David Woodruff

We take an unorthodox view of voting by expanding the design space to include both the elicitation rule, whereby voters map their (cardinal) preferences to votes, and the aggregation rule, which transforms the reported votes into collective decisions.

Regularized Weighted Low Rank Approximation

no code implementations NeurIPS 2019 Frank Ban, David Woodruff, Qiuyi Zhang

The classical low rank approximation problem is to find a rank $k$ matrix $UV$ (where $U$ has $k$ columns and $V$ has $k$ rows) that minimizes the Frobenius norm of $A - UV$.
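
The regularized weighted variant studied here augments that objective with entrywise weights and ridge terms; in standard notation (with $\circ$ the Hadamard product),

$$\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{k \times d}} \;\|W \circ (A - UV)\|_F^2 + \lambda \|U\|_F^2 + \lambda \|V\|_F^2.$$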

Oblivious Sketching of High-Degree Polynomial Kernels

1 code implementation 3 Sep 2019 Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David Woodruff, Amir Zandieh

Oblivious sketching has emerged as a powerful approach to speeding up numerical linear algebra over the past decade, but our understanding of oblivious sketching solutions for kernel matrices has remained quite limited, suffering from an exponential dependence on the input parameters.
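
For context, the classical baseline is TensorSketch, which sketches the degree-$q$ polynomial feature map by multiplying CountSketches in the Fourier domain; its sketch size is what carries the exponential dependence on $q$. A minimal numpy version (not the paper's improved construction):

```python
import numpy as np

def tensorsketch(X, q, m, seed=0):
    """Sketch rows of X so that <s(x), s(y)> ~= <x, y> ** q in expectation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    prod = np.ones((n, m), dtype=complex)
    for _ in range(q):                     # q independent CountSketches
        h = rng.integers(0, m, size=d)     # bucket for each coordinate
        sign = rng.choice([-1.0, 1.0], size=d)
        cs = np.zeros((n, m))
        for j in range(d):
            cs[:, h[j]] += sign[j] * X[:, j]
        prod *= np.fft.fft(cs, axis=1)     # convolution = product of FFTs
    return np.fft.ifft(prod, axis=1).real

X = np.random.randn(5, 32)
SX = tensorsketch(X, q=3, m=256)
print((SX @ SX.T)[0, 0], (X @ X.T)[0, 0] ** 3)  # approximately equal
```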

Data Structures and Algorithms

Faster Algorithms for High-Dimensional Robust Covariance Estimation

no code implementations 11 Jun 2019 Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted.


Sample-Optimal Low-Rank Approximation of Distance Matrices

no code implementations 2 Jun 2019 Piotr Indyk, Ali Vakilian, Tal Wagner, David Woodruff

Recent work by Bakshi and Woodruff (NeurIPS 2018) showed it is possible to compute a rank-$k$ approximation of a distance matrix in time $O((n+m)^{1+\gamma}) \cdot \mathrm{poly}(k, 1/\epsilon)$, where $\epsilon>0$ is an error parameter and $\gamma>0$ is an arbitrarily small constant.


Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski $p$-Norms

no code implementations ICML 2018 Charlie Dickens, Graham Cormode, David Woodruff

Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$.

regression

Conditional Sparse $\ell_p$-norm Regression With Optimal Probability

no code implementations 26 Jun 2018 John Hainline, Brendan Juba, Hai S. Le, David Woodruff

We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes the $\ell_p$ loss of predicting the target $z$ in the distribution of examples conditioned on $c$.
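
In symbols, this is (a direct transcription of the prose above, with the probability constraint holding only approximately):

$$\min_{c \in k\text{-DNF},\; f \text{ linear}} \;\mathbb{E}\big[\,|f(x) - z|^p \;\big|\; c(x)\,\big] \quad \text{subject to} \quad \Pr[c(x)] \ge \mu.$$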

regression

Robust Communication-Optimal Distributed Clustering Algorithms

no code implementations 2 Mar 2017 Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, David Woodruff

In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers.

Clustering

Sublinear Time Orthogonal Tensor Decomposition

1 code implementation NeurIPS 2016 Zhao Song, David Woodruff, Huan Zhang

We show that in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor.
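
The workhorse here is the tensor power method; the dense contraction below reads every entry of $T$, which is exactly the step a sublinear algorithm must approximate by sampling. A minimal dense version for a symmetric 3-tensor (not the paper's algorithm):

```python
import numpy as np

def top_component(T, iters=100, seed=0):
    """Tensor power iteration: converges to a robust eigenpair of T."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('ijk,j,k->i', T, u, u)    # contraction T(I, u, u)
        u /= np.linalg.norm(u)
    return np.einsum('ijk,i,j,k->', T, u, u, u), u

# T = 3 * v (x) v (x) v + 1.5 * w (x) w (x) w with orthonormal v, w
v, w = np.eye(3)[0], np.eye(3)[1]
T = 3.0 * np.einsum('i,j,k->ijk', v, v, v) + 1.5 * np.einsum('i,j,k->ijk', w, w, w)
lam, u = top_component(T)   # one of the pairs (3, v) or (1.5, w),
                            # depending on the random start; deflate to get both
```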

Tensor Decomposition

Communication Efficient Distributed Kernel Principal Component Analysis

no code implementations 23 Mar 2015 Maria-Florina Balcan, Yingyu Liang, Le Song, David Woodruff, Bo Xie

Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality?

Subspace Embeddings for the Polynomial Kernel

no code implementations NeurIPS 2014 Haim Avron, Huy Nguyen, David Woodruff

Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms.

Dimensionality Reduction

Low Rank Approximation Lower Bounds in Row-Update Streams

no code implementations NeurIPS 2014 David Woodruff

We study low-rank approximation in the streaming model in which the rows of an $n \times d$ matrix $A$ are presented one at a time in an arbitrary order.
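
The canonical upper bound in this row-update model is the Frequent Directions sketch, whose space usage is what lower bounds of this kind are measured against (textbook version, unrelated to this paper's proofs):

```python
import numpy as np

def frequent_directions(rows, d, ell):
    """Streaming ell x d sketch B with ||A^T A - B^T B||_2 <= ||A||_F^2 / ell."""
    B = np.zeros((ell + 1, d))
    for a in rows:                              # rows arrive one at a time
        B[ell] = a                              # fill the spare slot
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        shrink = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
        B = shrink[:, None] * Vt                # smallest direction zeroed out
    return B[:ell]

A = np.random.randn(5000, 30)
B = frequent_directions(A, d=30, ell=10)
```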

Improved Distributed Principal Component Analysis

no code implementations NeurIPS 2014 Maria-Florina Balcan, Vandana Kanchanapally, Yingyu Liang, David Woodruff

We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems.
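
The two-round pattern underlying this line of work is easy to sketch: each server sends a truncated SVD of its local data, and the coordinator runs PCA on the stacked summaries. A minimal version (the paper's contribution is the sharper analysis and communication bounds, not this skeleton):

```python
import numpy as np

def local_summary(P, t):
    """One server's t x d summary: top singular values times directions."""
    _, s, Vt = np.linalg.svd(P, full_matrices=False)
    return s[:t, None] * Vt[:t]

def distributed_pca(parts, t, k):
    stacked = np.vstack([local_summary(P, t) for P in parts])  # one round
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    return Vt[:k]                              # approximate global top-k PCs

parts = [np.random.randn(1000, 50) for _ in range(4)]          # 4 servers
V = distributed_pca(parts, t=10, k=5)          # 5 directions in R^50
```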

Clustering · Computational Efficiency +1

Sketching Structured Matrices for Faster Nonlinear Regression

no code implementations NeurIPS 2013 Haim Avron, Vikas Sindhwani, David Woodruff

Motivated by the desire to extend fast randomized techniques to nonlinear $l_p$ regression, we consider a class of structured regression problems.

regression
