1 code implementation • 2 Apr 2024 • Andrew Draganov, David Saulpic, Chris Schwiegelshohn
We study the theoretical and practical runtime limits of $k$-means and $k$-median clustering on large datasets.
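For context, a minimal sketch of the textbook Lloyd heuristic for $k$-means, whose $O(nkd)$ cost per iteration is the kind of runtime at stake on large datasets; this is a generic baseline under assumed NumPy conventions, not the paper's algorithm:

```python
import numpy as np

def lloyd_kmeans(X, k, iters=10, seed=0):
    """Textbook Lloyd heuristic; each iteration costs O(n * k * d)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```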
no code implementations • 27 Feb 2024 • Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder
We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model.
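As a point of reference, a hedged sketch of the naive baseline: pick a uniform random subset and reweight it so the subset's weighted loss is an unbiased estimate of the full-data loss (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def uniform_data_selection(X, y, m, seed=0):
    """Naive baseline: keep m uniformly random points, each with weight
    n/m, so weighted sums over the subset match the full data in
    expectation."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)  # inverse sampling probability
    return X[idx], y[idx], weights
```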
no code implementations • 7 Jul 2023 • Max Dupré la Tour, Monika Henzinger, David Saulpic
We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points.
no code implementations • 15 Nov 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar
Given a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) of every point to its closest center is minimized.
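In symbols, for an input set $A$ and a center set $C$ with $|C| = k$, the two objectives read:

$$\mathrm{cost}_{k\text{-means}}(A, C) = \sum_{a \in A} \min_{c \in C} \|a - c\|^2, \qquad \mathrm{cost}_{k\text{-median}}(A, C) = \sum_{a \in A} \min_{c \in C} \|a - c\|.$$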
1 code implementation • 17 Jun 2022 • Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Andres Munoz, David Saulpic, Chris Schwiegelshohn, Sergei Vassilvitskii
We study the private $k$-median and $k$-means clustering problems in $d$-dimensional Euclidean space.
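To make the privacy constraint concrete, here is a minimal sketch of one standard ingredient, releasing a single cluster mean via the Gaussian mechanism; the clipping radius R and noise scale sigma are illustrative assumptions, and this is not the paper's algorithm:

```python
import numpy as np

def noisy_cluster_mean(points, R, sigma, seed=0):
    """Clip points to norm R so the sum has bounded sensitivity, then add
    Gaussian noise; for suitable sigma this yields (eps, delta)-differential
    privacy. The cluster size is treated as public here for simplicity."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    clipped = points * np.minimum(1.0, R / np.maximum(norms, 1e-12))
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * R, size=points.shape[1])
    return noisy_sum / len(points)
```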
no code implementations • 25 Feb 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn
Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points, called centers, such that the sum over all data points of the distance to the closest center, raised to the power $z$, is minimized.
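Read as code, the objective looks as follows; $z = 1$ recovers $k$-median and $z = 2$ recovers $k$-means (a sketch for the Euclidean case, with illustrative names):

```python
import numpy as np

def kz_cost(X, centers, z):
    """(k, z)-clustering cost: distance of each point to its closest
    center, raised to the power z, summed over all points."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.sum(dists.min(axis=1) ** z)
```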
no code implementations • NeurIPS 2021 • Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn
Given a set $A$ of points in $\mathbb{R}^d$ and an exponent $z$, the power mean $m$ of $A$ is the point minimizing the sum of distances to the points of $A$, each raised to the power $z$. Special cases of this problem include the well-known Fermat-Weber problem -- or geometric median problem -- where $z = 1$, the mean or centroid where $z = 2$, and the Minimum Enclosing Ball problem, where $z = \infty$. We consider these problems in the big data regime, where we are interested in sampling as few points as possible such that we can accurately estimate $m$. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have random query access to $A$, and the goal is to minimize the number of queries. Here, we show that $\tilde{O}(\varepsilon^{-z-3})$ samples are sufficient to achieve a $(1+\varepsilon)$ approximation, generalizing the results of Cohen, Lee, Miller, Pachocki, and Sidford [STOC '16] and Inaba, Katoh, and Imai [SoCG '94] to arbitrary $z$.
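A minimal sketch of the sampling idea behind such sublinear bounds: estimate the cost of a candidate point from a uniform sample, rescaled so the estimate is unbiased (the sample size is left as a parameter; the $\tilde{O}(\varepsilon^{-z-3})$ bound above is the paper's result, not something this sketch proves):

```python
import numpy as np

def estimate_power_cost(A, m_point, z, num_samples, seed=0):
    """Unbiased estimate of sum_{a in A} ||a - m_point||^z from a
    uniform sample of the rows of A, rescaled by n / num_samples."""
    rng = np.random.default_rng(seed)
    n = len(A)
    S = A[rng.choice(n, size=num_samples, replace=True)]
    return (n / num_samples) * np.sum(np.linalg.norm(S - m_point, axis=1) ** z)
```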
no code implementations • NeurIPS 2020 • Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic
A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected.
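One way to make "densely connected inside, loosely connected across" concrete is to compare edge densities within and between the sets of a given partition (a hypothetical helper, not the paper's objective):

```python
import numpy as np

def partition_densities(adj, labels):
    """Fraction of present edges among within-set pairs vs. across-set
    pairs, for an adjacency matrix adj and integer set labels."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = adj[same & off_diag].mean()  # density inside the sets
    across = adj[~same].mean()            # density between the sets
    return within, across
```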
1 code implementation • NeurIPS 2019 • Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn
This improves over the naive algorithm, which recomputes a solution at each time step and can take up to $O(n^2)$ update time and $O(n^2)$ total recourse.
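For intuition, a sketch of that naive baseline, counting recourse as the number of centers that change between consecutive recomputations (`solve` is a hypothetical black-box clustering routine):

```python
def naive_total_recourse(snapshots, solve):
    """Recompute a solution from scratch after every update (expensive)
    and count total recourse: the size of the symmetric difference
    between consecutive center sets."""
    prev, recourse = set(), 0
    for points in snapshots:           # one snapshot per insertion/deletion
        centers = set(solve(points))   # full recomputation at each step
        recourse += len(centers ^ prev)
        prev = centers
    return recourse
```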