no code implementations • 11 Jan 2025 • Vincent Cohen-Addad, Andrew Draganov, Matteo Russo, David Saulpic, Chris Schwiegelshohn
We consider coresets for $k$-clustering problems, where the goal is to assign points to centers minimizing powers of distances.
no code implementations • 4 Dec 2024 • Karthik C. S., Euiwoong Lee, Yuval Rabani, Chris Schwiegelshohn, Samson Zhou
In this paper, we give the first hardness-of-approximation result for the $\ell_2^2$ min-sum $k$-clustering problem.
no code implementations • 30 Nov 2024 • Jakob Burkhardt, Hannah Keller, Claudio Orlandi, Chris Schwiegelshohn
We explore the use of distributed differentially private computations across multiple servers, balancing the tradeoff between the error introduced by the differentially private mechanism and the computational efficiency of the resulting distributed algorithm.
1 code implementation • 2 Apr 2024 • Andrew Draganov, David Saulpic, Chris Schwiegelshohn
We study the theoretical and practical runtime limits of $k$-means and $k$-median clustering on large datasets.
no code implementations • 5 Apr 2023 • Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff
For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression.
no code implementations • 13 Feb 2023 • Mikael Møller Høgsgaard, Lion Kamma, Kasper Green Larsen, Jelani Nelson, Chris Schwiegelshohn
In this work, we revisit sparse embeddings and identify a loophole in the lower bound.
no code implementations • 15 Nov 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar
the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp.
1 code implementation • 3 Jul 2022 • Chris Schwiegelshohn, Omar Ali Sheikh-Omar
Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
1 code implementation • 17 Jun 2022 • Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Andres Munoz, David Saulpic, Chris Schwiegelshohn, Sergei Vassilvitskii
We study the private $k$-median and $k$-means clustering problem in $d$-dimensional Euclidean space.
no code implementations • 25 Feb 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn
Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points, called centers, such that the sum over all data points of the distance to the closest center, raised to the power $z$, is minimized.
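As an illustration of the objective defined above, the following minimal sketch evaluates the $(k, z)$-clustering cost of a fixed set of centers. It assumes Euclidean distance as the metric; the function name `clustering_cost` and the toy data are hypothetical and not taken from the paper. It only evaluates the cost of given centers and does not search for optimal ones.

```python
import math


def clustering_cost(points, centers, z):
    """Sum over all points of (distance to the closest center) ** z.

    Assumption: the metric is Euclidean distance (math.dist).
    """
    total = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)
        total += nearest ** z
    return total


# Hypothetical toy instance: four points, k = 2 centers.
points = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 2.0), (10.0, 2.0)]

k_median_cost = clustering_cost(points, centers, z=1)  # z = 1: k-median objective
k_means_cost = clustering_cost(points, centers, z=2)   # z = 2: k-means objective
```

Every point here lies at distance 2 from its closest center, so the same centers cost 8 under $z = 1$ and 16 under $z = 2$.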
no code implementations • NeurIPS 2021 • Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn
Special cases of the problem include the well-known Fermat-Weber problem -- or geometric median problem -- where $z = 1$, the mean or centroid where $z=2$, and the Minimum Enclosing Ball problem, where $z = \infty$. We consider these problems in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate $m$. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have random query access to $A$, and the goal is to minimize the number of queries. Here, we show that $\tilde{O}(\varepsilon^{-z-3})$ samples are sufficient to achieve a $(1+\varepsilon)$ approximation, generalizing the results from Cohen, Lee, Miller, Pachocki, and Sidford [STOC '16] and Inaba, Katoh, and Imai [SoCG '94] to arbitrary $z$.
no code implementations • 14 Feb 2020 • Giorgio Barnabò, Adriano Fazzone, Stefano Leonardi, Chris Schwiegelshohn
In this short paper, we define the Fair Team Formation problem in the following way: given an online labour marketplace where each worker possesses one or more skills, and where all workers are divided into two or more non-overlapping classes (for example, men and women), we want to design an algorithm that is able to find a team with all the skills needed to complete a given task, and that has the same number of people from all classes.
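The two constraints in this definition (skill coverage and equal class counts) can be made concrete with a small sketch. This is an exhaustive search for illustration only, not the algorithm from the paper; the worker records, the field names `name`, `cls`, and `skills`, and the choice to return a smallest feasible team are all assumptions.

```python
from itertools import combinations


def is_fair_team(team, required_skills, classes):
    """True if the team covers every required skill and has the same
    number of members from each class."""
    covered = set().union(*(w["skills"] for w in team))
    counts = [sum(1 for w in team if w["cls"] == c) for c in classes]
    return required_skills <= covered and len(set(counts)) == 1


def smallest_fair_team(workers, required_skills, classes):
    """Brute-force search (exponential time) over team sizes that are
    multiples of the number of classes. Returns None if no team exists."""
    step = len(classes)
    for size in range(step, len(workers) + 1, step):
        for team in combinations(workers, size):
            if is_fair_team(team, required_skills, classes):
                return list(team)
    return None


# Hypothetical marketplace with two classes, "x" and "y".
workers = [
    {"name": "ana", "cls": "x", "skills": {"python"}},
    {"name": "bo", "cls": "y", "skills": {"design"}},
    {"name": "cy", "cls": "x", "skills": {"design"}},
    {"name": "di", "cls": "y", "skills": {"python", "design"}},
]
team = smallest_fair_team(workers, {"python", "design"}, ["x", "y"])
team_names = [w["name"] for w in team]
```

On this toy input the first feasible team found pairs one worker from each class whose skills jointly cover the task.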
1 code implementation • NeurIPS 2019 • Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn
This improves over the naive algorithm, which recomputes a solution at each time step and can take up to $O(n^2)$ update time and $O(n^2)$ total recourse.
no code implementations • 31 May 2019 • Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Cristina Menghini, Chris Schwiegelshohn
Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis have recently received significant attention.
no code implementations • NeurIPS 2018 • Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff
For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.
no code implementations • 29 Jan 2017 • Vincent Cohen-Addad, Chris Schwiegelshohn
We study the classic $k$-median and $k$-means clustering objectives in the beyond-worst-case scenario.