no code implementations • 1 Dec 2024 • Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, Sandeep Silwal
We provide lower bounds showing that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal.
no code implementations • 30 Oct 2024 • Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal, Haike Xu
In particular, if an algorithm uses $O(n/\log^c k)$ samples for some constant $c>0$ and polynomial space, then the query time of the data structure must be at least $k^{1-O(1)/\log \log k}$, i.e., close to linear in the number of distributions $k$.
1 code implementation • 5 Jun 2024 • Haike Xu, Sandeep Silwal, Piotr Indyk
In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric.
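As a concrete illustration of the retrieve-then-rerank pattern this result supports, here is a minimal sketch: search under a cheap proxy metric, then rerank a short candidate list under the ground-truth metric. The brute-force proxy scan and the candidate-list size are placeholder choices, not the paper's data structure.

```python
import numpy as np

def proxy_then_rerank(data, query, proxy_dist, true_dist, num_candidates=10):
    """Retrieve candidates under the cheap proxy metric, then rerank
    the short list under the expensive ground-truth metric."""
    proxy_scores = np.array([proxy_dist(x, query) for x in data])
    candidates = np.argsort(proxy_scores)[:num_candidates]
    return min(candidates, key=lambda i: true_dist(data[i], query))

# Toy example: proxy = Manhattan distance, ground truth = Euclidean.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 16))
query = rng.normal(size=16)
idx = proxy_then_rerank(
    data, query,
    proxy_dist=lambda x, y: np.abs(x - y).sum(),
    true_dist=lambda x, y: np.linalg.norm(x - y),
)
print("nearest neighbor under ground-truth metric:", idx)
```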
no code implementations • 13 Mar 2024 • Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub Tarnawski
We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x, y)$ for any query $y$.
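For intuition, if $f$ takes values in $[0,1]$, then for any fixed query $y$ a single data point changes $\sum_{x \in X} f(x, y)$ by at most 1, so the Laplace mechanism gives an $\varepsilon$-DP answer for that one query. The sketch below shows only this per-query baseline; the paper's goal is a data structure that answers all queries, which this snippet does not provide.

```python
import numpy as np

def dp_similarity_sum(X, y, f, eps, rng):
    """eps-DP estimate of sum_{x in X} f(x, y) for a single query y,
    assuming f takes values in [0, 1] so removing or changing one point
    shifts the sum by at most 1 (sensitivity 1): the Laplace mechanism."""
    true_sum = sum(f(x, y) for x in X)
    return true_sum + rng.laplace(scale=1.0 / eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.normal(size=8)
gaussian_kernel = lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2)
print(dp_similarity_sum(X, y, gaussian_kernel, eps=1.0, rng=rng))
```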
no code implementations • NeurIPS 2023 • Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian
In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream.
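A minimal sketch of this learning-augmented pattern: items the oracle predicts to be heavy get exact counters, while all other items share a CountMin sketch. The class below is an illustrative stand-in, with hash salts and table sizes as placeholder choices, not the construction from the paper.

```python
import numpy as np

class LearnedCountMin:
    """CountMin sketch with a heavy-hitter oracle: predicted-heavy items
    get exact counters, everything else goes into the sketch."""

    def __init__(self, width, depth, is_heavy, seed=0):
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.width, self.depth = width, depth
        self.is_heavy = is_heavy          # the learned oracle
        self.exact = {}                   # exact counts for predicted heavy hitters
        rng = np.random.default_rng(seed)
        self.salts = rng.integers(1, 2**31, size=depth)

    def _hash(self, item, row):
        return hash((item, int(self.salts[row]))) % self.width

    def update(self, item):
        if self.is_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + 1
        else:
            for r in range(self.depth):
                self.table[r, self._hash(item, r)] += 1

    def estimate(self, item):
        if self.is_heavy(item):
            return self.exact.get(item, 0)
        # standard CountMin estimate: min over rows overestimates by
        # at most the collision mass in the lightest row
        return min(self.table[r, self._hash(item, r)] for r in range(self.depth))
```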
no code implementations • 6 Jul 2023 • Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten
For any two point sets $A, B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A, B)=\sum_{a \in A} \min_{b \in B} d_X(a, b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance).
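For reference, here is the naive $O(|A| \cdot |B| \cdot d)$ computation of the Chamfer distance under the Euclidean metric; this quadratic baseline is what faster approximation algorithms aim to beat.

```python
import numpy as np

def chamfer(A, B):
    """Exact Chamfer distance CH(A, B) = sum_{a in A} min_{b in B} ||a - b||,
    computed naively in O(|A| * |B| * d) time with d_X = Euclidean distance."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # |A| x |B|
    return dists.min(axis=1).sum()

rng = np.random.default_rng(0)
A, B = rng.normal(size=(100, 3)), rng.normal(size=(120, 3))
print(chamfer(A, B))
```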
1 code implementation • 20 Jun 2023 • Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal
We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$.
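The naive baseline makes the tradeoff concrete: scan all $k$ hypotheses and return the one closest in total variation to the empirical distribution of the samples. The sketch below implements this linear scan (an illustration of the problem, not the paper's improved algorithm).

```python
import numpy as np

def closest_distribution(V, samples, n):
    """Baseline: pick the hypothesis v_i minimizing total-variation distance
    to the empirical distribution of the samples from p, via a linear scan
    over all k hypotheses."""
    emp = np.bincount(samples, minlength=n) / len(samples)
    tv = [0.5 * np.abs(v - emp).sum() for v in V]
    return int(np.argmin(tv))

rng = np.random.default_rng(0)
n, k = 50, 10
V = rng.dirichlet(np.ones(n), size=k)           # k hypothesis distributions
samples = rng.choice(n, size=2000, p=V[3])      # sample access to p = v_3
print(closest_distribution(V, samples, n))      # expected output: 3
```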
no code implementations • 15 Apr 2023 • Nicholas Schiefer, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal, Tal Wagner
An $\varepsilon$-approximate quantile sketch over a stream of $n$ inputs approximates the rank of any query point $q$ (that is, the number of input points less than $q$) up to an additive error of $\varepsilon n$, typically with probability at least $1 - 1/\mathrm{poly}(n)$, while using $o(n)$ space.
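As a toy instance of the definition, the reservoir-sampling sketch below keeps $m$ uniform samples and reports the scaled sample rank; with $m$ on the order of $1/\varepsilon^2$ this achieves additive error $\varepsilon n$ with constant probability. It is only a baseline, far weaker than the sketches studied in the paper.

```python
import random

class SampledQuantileSketch:
    """Toy epsilon-approximate rank sketch via reservoir sampling: keep m
    uniform samples of the stream and report the scaled sample rank."""

    def __init__(self, m, seed=0):
        self.m, self.n = m, 0
        self.sample = []
        self.rng = random.Random(seed)

    def update(self, x):
        self.n += 1
        if len(self.sample) < self.m:
            self.sample.append(x)
        else:
            j = self.rng.randrange(self.n)   # classic reservoir step
            if j < self.m:
                self.sample[j] = x

    def rank(self, q):
        if not self.sample:
            return 0
        below = sum(1 for x in self.sample if x < q)
        return round(below * self.n / len(self.sample))
```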
no code implementations • 2 Mar 2023 • Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal
We give improved tradeoffs between space and regret for the online learning with expert advice problem over $T$ days with $n$ experts.
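For context, the classic multiplicative-weights algorithm stores one weight per expert ($\Theta(n)$ space) and attains $O(\sqrt{T \log n})$ regret; the paper asks what regret is achievable with $o(n)$ space. A compact sketch of the full-memory baseline:

```python
import numpy as np

def multiplicative_weights(losses, eta):
    """Full-memory experts baseline: losses is a T x n array of per-day
    expert losses in [0, 1]. With eta ~ sqrt(log(n) / T) the regret
    against the best fixed expert is O(sqrt(T log n))."""
    T, n = losses.shape
    w = np.ones(n)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()                    # distribution over experts
        total_loss += p @ losses[t]        # expected loss of the learner
        w *= np.exp(-eta * losses[t])      # penalize lossy experts
    return total_loss - losses.sum(axis=0).min()   # regret

rng = np.random.default_rng(0)
T, n = 1000, 32
losses = rng.uniform(size=(T, n))
print(multiplicative_weights(losses, eta=np.sqrt(np.log(n) / T)))
```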
no code implementations • 1 Dec 2022 • Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal, Samson Zhou
We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix.
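A simple Monte Carlo stand-in conveys the idea of estimating row sums without forming the $n \times n$ kernel matrix: sample a few columns uniformly and rescale. (The KDE-based structures referenced here use more sophisticated sampling; the uniform version below is only illustrative.)

```python
import numpy as np

def estimate_row_sums(X, num_samples, rng):
    """Unbiased Monte Carlo estimate of each row sum sum_j K(x_i, x_j)
    of a Gaussian kernel matrix, from num_samples uniformly sampled
    columns: O(n * num_samples) work instead of O(n^2)."""
    n = len(X)
    cols = rng.integers(0, n, size=num_samples)
    # n x num_samples block of the kernel matrix
    sq = ((X[:, None, :] - X[cols][None, :, :]) ** 2).sum(-1)
    return np.exp(-sq).sum(axis=1) * (n / num_samples)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
print(estimate_row_sums(X, num_samples=100, rng=rng)[:5])
```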
no code implementations • 6 Nov 2022 • Anders Aamand, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Nicholas Schiefer, Sandeep Silwal, Tal Wagner
However, those simulations involve neural networks for the 'combine' function of size polynomial or even exponential in the number of graph nodes $n$, as well as feature vectors of length linear in $n$.
no code implementations • 21 Sep 2022 • Elena Grigorescu, Young-San Lin, Sandeep Silwal, Maoyuan Song, Samson Zhou
We show that if the predictor is accurate, we can efficiently bypass these impossibility results and achieve a constant-factor approximation to the optimal solution, i.e., consistency.
no code implementations • 29 Jun 2022 • Eric Price, Sandeep Silwal, Samson Zhou
We further show fine-grained hardness of robust regression through a reduction from the minimum-weight $k$-clique conjecture.
no code implementations • ICLR 2022 • Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature.
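For reference, the exact offline count that the one-pass streaming algorithms approximate can be computed from adjacency sets, as in the sketch below (assuming a simple graph with no duplicate edges or self-loops).

```python
def count_triangles(edges):
    """Offline baseline: exact triangle count via adjacency sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # each triangle is counted once per incident edge, i.e. three times
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(count_triangles(edges))  # 1
```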
no code implementations • ICLR 2022 • Jon C. Ergun, Zhili Feng, Sandeep Silwal, David P. Woodruff, Samson Zhou
$k$-means clustering is a well-studied problem due to its wide applicability.
no code implementations • NeurIPS 2021 • Zachary Izzo, Sandeep Silwal, Samson Zhou
In order to cope with this "curse of dimensionality," we study dimensionality reduction techniques for the Wasserstein barycenter problem.
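A generic sketch of the dimensionality-reduction step: apply one shared Gaussian (Johnson-Lindenstrauss-style) projection to every input point set, so pairwise distances, and hence transport costs, are approximately preserved. The target dimension here is a placeholder; the paper's specific guarantees for barycenters are not captured by this snippet.

```python
import numpy as np

def jl_project(point_sets, target_dim, rng):
    """One shared Gaussian projection applied to every input point set,
    approximately preserving pairwise Euclidean distances."""
    d = point_sets[0].shape[1]
    G = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
    return [P @ G for P in point_sets]

rng = np.random.default_rng(0)
clouds = [rng.normal(size=(200, 512)) for _ in range(5)]
low_dim = jl_project(clouds, target_dim=32, rng=rng)
print(low_dim[0].shape)  # (200, 32)
```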
no code implementations • 29 Sep 2021 • Dan Kushnir, Sandeep Silwal
In addition, we show theoretically and empirically that ClusterTree finds partitions which are superior to those found by RP trees in preserving the cluster structure of the input dataset.
no code implementations • 5 Jul 2021 • Shyam Narayanan, Sandeep Silwal, Piotr Indyk, Or Zamir
Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems.
no code implementations • NeurIPS 2021 • Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou
In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction.
no code implementations • ICLR 2021 • Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner
We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements.
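To see why the problem is nontrivial, consider the naive baseline of counting the distinct elements that appear in the sample: on heavy-tailed data it badly underestimates the true support size, since rare elements are missed. The snippet below illustrates this gap, which is what learning-based estimators in this line of work aim to close.

```python
import numpy as np

def distinct_in_sample(data, sample_size, rng):
    """Naive baseline: count the distinct elements observed in a uniform
    sample; this generally underestimates the true support size."""
    sample = rng.choice(data, size=sample_size, replace=False)
    return len(set(sample.tolist()))

rng = np.random.default_rng(0)
data = rng.zipf(1.5, size=100_000)   # heavy-tailed data set
print(len(set(data.tolist())), distinct_in_sample(data, 10_000, rng))
```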
1 code implementation • 2 Dec 2019 • Rikhav Shah, Sandeep Silwal
t-SNE is a popular tool for embedding multi-dimensional datasets into two or three dimensions.
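For readers unfamiliar with the tool, this is the standard scikit-learn invocation of t-SNE (the reference API, not code from the paper):

```python
# Standard scikit-learn t-SNE usage: embed 50-dimensional data into 2-D.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(100, 50)) for c in (0.0, 5.0, 10.0)])
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2)
```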
no code implementations • 17 Nov 2019 • Maryam Aliakbarpour, Sandeep Silwal
We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution.