Search Results for author: David Saulpic

Found 9 papers, 3 papers with code

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

1 code implementation • 2 Apr 2024 • Andrew Draganov, David Saulpic, Chris Schwiegelshohn

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets.

Clustering

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

no code implementations • 27 Feb 2024 • Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model.

Clustering
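The general flavor of clustering-based sensitivity sampling can be sketched as follows: sample each point with probability mixing its share of the clustering cost and a uniform term, then reweight so that weighted sums over the sample approximate sums over the full dataset. This is a simplified illustration of the idea, not the algorithm from the paper; all names and the particular probability mix are assumptions.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sensitivity_sample(points, centers, m, seed=0):
    """Draw m points with probability proportional to a simple sensitivity
    proxy (half cost share, half uniform) and return (point, weight) pairs,
    weighted so that the sample is an unbiased estimator of dataset sums."""
    rng = random.Random(seed)
    n = len(points)
    costs = [min(dist2(p, c) for c in centers) for p in points]
    total = sum(costs)
    probs = [0.5 * (c / total if total > 0 else 0.0) + 0.5 / n for c in costs]
    idx = rng.choices(range(n), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in idx]
```

The uniform term keeps every sampling probability bounded away from zero, so no single weight blows up; real sensitivity-sampling bounds make this trade-off precise.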

Differential Privacy for Clustering Under Continual Observation

no code implementations • 7 Jul 2023 • Max Dupré la Tour, Monika Henzinger, David Saulpic

We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points.

Clustering • Dimensionality Reduction

Improved Coresets for Euclidean $k$-Means

no code implementations • 15 Nov 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

Given a set of points in $\mathbb{R}^d$, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every data point to its closest center is minimized.

Towards Optimal Lower Bounds for k-median and k-means Coresets

no code implementations • 25 Feb 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points, called centers, such that the sum, over all data points, of the distance to the closest center raised to the power $z$ is minimized.

Clustering
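The $(k, z)$-clustering objective just described is straightforward to evaluate. A minimal sketch (the function name and point representation are illustrative, not from the paper):

```python
import math


def kz_clustering_cost(points, centers, z):
    """(k, z)-clustering cost: the sum over all data points of the
    distance to the closest center, raised to the power z.
    z = 1 recovers k-median; z = 2 recovers k-means."""
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)
```

For example, with points (0, 0), (1, 0), (4, 0) and centers (0, 0), (4, 0), the cost is 1 for both $z = 1$ and $z = 2$, since only (1, 0) is off-center, at distance 1.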

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

no code implementations • NeurIPS 2021 • Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Special cases of this problem include the well-known Fermat-Weber problem -- or geometric median problem -- where $z = 1$, the mean or centroid where $z = 2$, and the Minimum Enclosing Ball problem, where $z = \infty$. We consider these problems in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate the power mean $m$. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have random query access to the input $A$, and the goal is to minimize the number of queries. Here, we show that $\tilde{O}(\varepsilon^{-z-3})$ samples are sufficient to achieve a $(1+\varepsilon)$ approximation, generalizing the results of Cohen, Lee, Miller, Pachocki, and Sidford [STOC '16] and Inaba, Katoh, and Imai [SoCG '94] to arbitrary $z$.
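For the $z = 2$ case, the Inaba, Katoh, and Imai style result says that the centroid of a small uniform sample is already a good estimate of the true mean, with a sample size independent of $n$. A minimal sketch of that sampling idea (names and the choice of sample size are illustrative, not from the paper):

```python
import random


def estimate_centroid(points, sample_size, seed=0):
    """Estimate the centroid (the z = 2 power mean) of a large point
    set by averaging a small uniform sample; for a (1 + eps)
    approximation of the 1-mean cost, the required sample size does
    not grow with the number of points."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    dim = len(points[0])
    return tuple(sum(p[i] for p in sample) / len(sample) for i in range(dim))
```

The sample mean sits inside the convex hull of the sampled points, so it can never stray outside the bounding box of the data, and its expected cost converges to the optimal 1-mean cost as the sample grows.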

On the Power of Louvain in the Stochastic Block Model

no code implementations • NeurIPS 2020 • Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected.

BIG-bench Machine Learning • Stochastic Block Model
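Louvain greedily moves vertices between communities to maximize modularity, a standard score that rewards exactly this dense-inside, sparse-between structure. A minimal modularity computation for an undirected edge list (the textbook Newman definition, not code from the paper):

```python
def modularity(edges, community):
    """Newman modularity of a partition: the fraction of edges that fall
    inside communities, minus the fraction expected under a random
    degree-preserving rewiring of the graph."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    inside = sum(1 for u, v in edges if community[u] == community[v])
    # Total degree per community, for the expected-edges term.
    comm_deg = {}
    for node, d in deg.items():
        comm_deg[community[node]] = comm_deg.get(community[node], 0) + d
    expected = sum(d * d for d in comm_deg.values()) / (4 * m * m)
    return inside / m - expected
```

For two triangles joined by a single bridge edge, splitting along the bridge gives modularity $6/7 - 1/2 \approx 0.357$, while any partition mixing the triangles scores lower.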

Fully Dynamic Consistent Facility Location

1 code implementation • NeurIPS 2019 • Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn

This improves over the naive algorithm, which recomputes a solution at each time step and can take up to $O(n^2)$ update time and $O(n^2)$ total recourse.

Clustering
