Search Results for author: Piotr Indyk

Found 34 papers, 8 papers with code

Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

1 code implementation · NeurIPS 2023 · Piotr Indyk, Haike Xu

Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees.

A Near-Linear Time Algorithm for the Chamfer Distance

no code implementations · 6 Jul 2023 · Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten

For any two point sets $A, B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A, B)=\sum_{a \in A} \min_{b \in B} d_X(a, b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance).
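The definition above translates directly into a brute-force baseline. The sketch below is a quadratic-time illustration of the definition under the Euclidean metric, not the paper's near-linear time algorithm:

```python
import math

def chamfer_distance(A, B):
    """Naive O(|A| * |B| * d) Chamfer distance from A to B under the
    Euclidean metric: sum over a in A of the distance to its nearest b in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A)
```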

Data Structures for Density Estimation

1 code implementation · 20 Jun 2023 · Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$.
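As a concrete, naive illustration of this problem setup (not the paper's data structure), one can compare each candidate distribution to the empirical distribution of the samples under total-variation distance; the function names here are hypothetical:

```python
from collections import Counter

def closest_distribution(candidates, samples, n):
    """Return the index of the candidate distribution minimizing
    total-variation distance to the empirical distribution of the samples.
    candidates: list of length-n probability vectors over domain {0..n-1}."""
    counts = Counter(samples)
    emp = [counts.get(x, 0) / len(samples) for x in range(n)]

    def tv(v):
        return 0.5 * sum(abs(v[x] - emp[x]) for x in range(n))

    return min(range(len(candidates)), key=lambda i: tv(candidates[i]))
```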

Density Estimation

Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

no code implementations · 15 Apr 2023 · Nicholas Schiefer, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal, Tal Wagner

An $\varepsilon$-approximate quantile sketch over a stream of $n$ inputs approximates the rank of any query point $q$, that is, the number of input points less than $q$, up to an additive error of $\varepsilon n$, typically with probability at least $1 - 1/\mathrm{poly}(n)$, while using $o(n)$ space.
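A toy illustration of the rank quantity being approximated; `sketch_rank` below is just a uniform-sampling stand-in for a real quantile sketch (and not the paper's learned-interpolation method):

```python
import bisect

def exact_rank(stream, q):
    """Rank of q: the number of input points strictly less than q."""
    return sum(1 for x in stream if x < q)

def sketch_rank(sorted_sample, q, n, m):
    """Rank estimate from a uniform sample of size m out of n stream items,
    rescaled to the full stream; a toy stand-in for a real sketch."""
    return bisect.bisect_left(sorted_sample, q) * n / m
```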

Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation

no code implementations · 1 Dec 2022 · Ainesh Bakshi, Piotr Indyk, Praneeth Kacham, Sandeep Silwal, Samson Zhou

We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix.

Density Estimation

Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

no code implementations · 6 Nov 2022 · Anders Aamand, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Nicholas Schiefer, Sandeep Silwal, Tal Wagner

However, those simulations involve neural networks for the 'combine' function of size polynomial or even exponential in the number of graph nodes $n$, as well as feature vectors of length linear in $n$.

Generalization Bounds for Data-Driven Numerical Linear Algebra

no code implementations · 16 Jun 2022 · Peter Bartlett, Piotr Indyk, Tal Wagner

Our techniques are general, and provide generalization bounds for many other recently proposed data-driven algorithms in numerical linear algebra, covering both sketching-based and multigrid-based methods.

Generalization Bounds · PAC learning

Triangle and Four Cycle Counting with Predictions in Graph Streams

no code implementations · ICLR 2022 · Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature.

Few-Shot Data-Driven Algorithms for Low Rank Approximation

no code implementations · NeurIPS 2021 · Piotr Indyk, Tal Wagner, David Woodruff

Recently, data-driven and learning-based algorithms for low rank matrix approximation were shown to outperform classical data-oblivious algorithms by wide margins in terms of accuracy.

Computational Efficiency

Targeted Supervised Contrastive Learning for Long-Tailed Recognition

1 code implementation · CVPR 2022 · Tianhong Li, Peng Cao, Yuan Yuan, Lijie Fan, Yuzhe Yang, Rogerio Feris, Piotr Indyk, Dina Katabi

This forces all classes, including minority classes, to maintain a uniform distribution in the feature space, improves class boundaries, and provides better generalization even in the presence of long-tail data.

Contrastive Learning · Long-tail Learning

Embeddings and labeling schemes for A*

no code implementations · 19 Nov 2021 · Talya Eden, Piotr Indyk, Haike Xu

In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node, and the running time of the A* algorithm.

(Optimal) Online Bipartite Matching with Degree Information

no code implementations · 21 Oct 2021 · Anders Aamand, Justin Y. Chen, Piotr Indyk

For the bipartite version of a stochastic graph model due to Chung, Lu, and Vu, where the expected values of the offline degrees are known and used as predictions, we show that MinPredictedDegree stochastically dominates any other online algorithm, i.e., it is optimal for graphs drawn from this model.

Learning-based Support Estimation in Sublinear Time

no code implementations · ICLR 2021 · Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements.
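For contrast with the learning-based sublinear estimators studied here, the naive approaches either scan the whole data set or simply count distinct values in the sample, which can badly underestimate the support; a minimal sketch:

```python
def distinct_elements(data):
    """Exact support size: number of distinct elements, via a full pass."""
    return len(set(data))

def naive_sample_estimate(sample):
    """Plug-in estimate from a random sample: distinct values seen so far.
    This never exceeds (and usually undershoots) the true support size,
    which is why smarter estimators are needed in the sublinear regime."""
    return len(set(sample))
```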

Faster Kernel Matrix Algebra via Density Estimation

no code implementations · 16 Feb 2021 · Arturs Backurs, Piotr Indyk, Cameron Musco, Tal Wagner

In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector.
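As a point of contrast with the density-estimation-based methods in this paper, an unbiased Monte Carlo estimate of the kernel-matrix entry sum can be obtained by sampling entries uniformly; this is a hypothetical toy, shown for the Gaussian kernel:

```python
import math
import random

def kernel_sum_estimate(points, bandwidth, trials, seed=0):
    """Unbiased Monte Carlo estimate of sum_{i,j} k(x_i, x_j) for the
    Gaussian kernel, by averaging uniformly sampled entries and
    rescaling by the number of entries n^2."""
    rng = random.Random(seed)
    n = len(points)
    total = 0.0
    for _ in range(trials):
        i, j = rng.randrange(n), rng.randrange(n)
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return (n * n) * total / trials
```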

Density Estimation

Addressing Feature Suppression in Unsupervised Visual Representations

no code implementations · 17 Dec 2020 · Tianhong Li, Lijie Fan, Yuan Yuan, Hao He, Yonglong Tian, Rogerio Feris, Piotr Indyk, Dina Katabi

However, contrastive learning is susceptible to feature suppression, i.e., it may discard important information relevant to the task of interest and learn irrelevant features.

Attribute · Contrastive Learning

Online Page Migration with ML Advice

no code implementations · 9 Jun 2020 · Piotr Indyk, Frederik Mallmann-Trenn, Slobodan Mitrović, Ronitt Rubinfeld

In contrast, we show that if the algorithm is given a prediction of the input sequence, then it can achieve a competitive ratio that tends to $1$ as the prediction error rate tends to $0$.

Space and Time Efficient Kernel Density Estimation in High Dimensions

1 code implementation · NeurIPS 2019 · Arturs Backurs, Piotr Indyk, Tal Wagner

We instantiate our framework with the Laplacian and Exponential kernels, two popular kernels which possess the aforementioned property.

Density Estimation

Estimating Entropy of Distributions in Constant Space

no code implementations · NeurIPS 2019 · Jayadev Acharya, Sourbh Bhadane, Piotr Indyk, Ziteng Sun

We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited.
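For reference, the standard plug-in estimator computes the empirical entropy from the sample counts, but it keeps a counter per symbol and so needs space growing with $k$, in contrast to the constant-space regime studied here:

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in (empirical) entropy estimate in bits from i.i.d. samples
    of a k-ary distribution. Stores one counter per observed symbol."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```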

Learning-Based Low-Rank Approximations

no code implementations · NeurIPS 2019 · Piotr Indyk, Ali Vakilian, Yang Yuan

Our experiments show that, for multiple types of data sets, a learned sketch matrix can substantially reduce the approximation loss compared to a random matrix $S$, sometimes by one order of magnitude.

Generalization Bounds

Scalable Nearest Neighbor Search for Optimal Transport

1 code implementation · ICML 2020 · Arturs Backurs, Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner

Our extensive experiments, on real-world text and image datasets, show that Flowtree improves over various baselines and existing methods in either running time or accuracy.

Data Structures and Algorithms

Neural Embeddings for Nearest Neighbor Search Under Edit Distance

no code implementations · 25 Sep 2019 · Xiyuan Zhang, Yang Yuan, Piotr Indyk

The edit distance between two sequences is an important metric with many applications.
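As background for the metric being embedded, the classic dynamic program computes edit distance exactly in $O(|s| \cdot |t|)$ time; a minimal sketch using a rolling row of the DP table:

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions transforming s into t, via the classic O(mn) DP."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))  # DP row for the empty prefix of s
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete from s
                         cur[j - 1] + 1,     # insert into s
                         prev[j - 1] + cost)  # substitute / match
        prev = cur
    return prev[n]
```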

Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm

no code implementations · 6 Jul 2019 · Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, Alireza Rezaei

In this work, we first provide a theoretical approximation guarantee of $O(C^{k^2})$ for the Greedy algorithm in the context of composable core-sets. Further, we propose a Local Search based algorithm that, while still practical, achieves a nearly optimal approximation bound of $O(k)^{2k}$. Finally, we implement all three algorithms and show the effectiveness of our proposed algorithm on standard data sets.

Fairness

Sample-Optimal Low-Rank Approximation of Distance Matrices

no code implementations · 2 Jun 2019 · Piotr Indyk, Ali Vakilian, Tal Wagner, David Woodruff

Recent work by Bakshi and Woodruff (NeurIPS 2018) showed it is possible to compute a rank-$k$ approximation of a distance matrix in time $O((n+m)^{1+\gamma}) \cdot \mathrm{poly}(k, 1/\epsilon)$, where $\epsilon>0$ is an error parameter and $\gamma>0$ is an arbitrarily small constant.

Handwriting Recognition

Learning-Based Frequency Estimation Algorithms

no code implementations · ICLR 2019 · Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian

Estimating the frequencies of elements in a data stream is a fundamental task in data analysis and machine learning.

BIG-bench Machine Learning

Scalable Fair Clustering

1 code implementation · 10 Feb 2019 · Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, Tal Wagner

In the fair variant of $k$-median, the points are colored, and the goal is to minimize the same average distance objective while ensuring that all clusters have an "approximately equal" number of points of each color.

Clustering · Fairness

Learning Space Partitions for Nearest Neighbor Search

1 code implementation · ICLR 2020 · Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner

Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms.

General Classification · graph partitioning

Composable Core-sets for Determinant Maximization Problems via Spectral Spanners

no code implementations · 31 Jul 2018 · Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, Alireza Rezaei

We show that for many objective functions one can use a spectral spanner, independent of the underlying functions, as a core-set and obtain almost optimal composable core-sets.

Approximate Nearest Neighbor Search in High Dimensions

no code implementations · 26 Jun 2018 · Alexandr Andoni, Piotr Indyk, Ilya Razenshteyn

The nearest neighbor problem is defined as follows: Given a set $P$ of $n$ points in some metric space $(X, D)$, build a data structure that, given any point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$).
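The definition admits a trivial exact solution by linear scan, the $O(nd)$-per-query baseline that approximate nearest neighbor data structures aim to beat; a minimal sketch under the Euclidean metric:

```python
import math

def nearest_neighbor(P, q):
    """Exact nearest neighbor of query q in point set P by linear scan
    under the Euclidean distance."""
    return min(P, key=lambda p: math.dist(p, q))
```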

On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

no code implementations · NeurIPS 2017 · Arturs Backurs, Piotr Indyk, Ludwig Schmidt

We also give similar hardness results for computing the gradient of the empirical loss, which is the main computational burden in many non-convex learning tasks.

Fast recovery from a union of subspaces

no code implementations · NeurIPS 2016 · Chinmay Hegde, Piotr Indyk, Ludwig Schmidt

We address the problem of recovering a high-dimensional but structured vector from linear observations in a general setting where the vector can come from an arbitrary union of subspaces.

Compressive Sensing

Practical and Optimal LSH for Angular Distance

1 code implementation · NeurIPS 2015 · Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, Ludwig Schmidt

Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.

Nearly Optimal Deterministic Algorithm for Sparse Walsh-Hadamard Transform

no code implementations · 28 Apr 2015 · Mahdi Cheraghchi, Piotr Indyk

Moreover, we design a deterministic and non-adaptive $\ell_1/\ell_1$ compressed sensing scheme based on general lossless condensers that is equipped with a fast reconstruction algorithm running in time $k^{1+\alpha} (\log N)^{O(1)}$ (for the GUV-based condenser) and is of independent interest.
