Search Results for author: Vincent Cohen-Addad

Found 26 papers, 2 papers with code

Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees

no code implementations31 Jan 2023 Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni

Then, we exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/\epsilon)$-additive error, and an exponential-time algorithm that meets the lower bound.

Stochastic Block Model

Private estimation algorithms for stochastic block models and mixture models

no code implementations11 Jan 2023 Hongjie Chen, Vincent Cohen-Addad, Tommaso d'Orsi, Alessandro Epasto, Jacob Imola, David Steurer, Stefan Tiegel

For the latter, we design an $(\epsilon, \delta)$-differentially private algorithm that recovers the centers of the $k$-mixture when the minimum separation is at least $O(k^{1/t}\sqrt{t})$.

Improved Coresets for Euclidean $k$-Means

no code implementations15 Nov 2022 Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

The Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. the sum of distances) of every data point to its closest center is minimized.

Beyond Impossibility: Balancing Sufficiency, Separation and Accuracy

no code implementations24 May 2022 Limor Gultchin, Vincent Cohen-Addad, Sophie Giffard-Roisin, Varun Kanade, Frederik Mallmann-Trenn

Among the various aspects of algorithmic fairness studied in recent years, the tension between satisfying both \textit{sufficiency} and \textit{separation} -- e.g. the ratios of positive or negative predictive values, and false positive or false negative rates across groups -- has received much attention.
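The group-wise metrics named above can be made concrete with a short sketch. This is a generic illustration of how sufficiency-style (positive predictive value) and separation-style (false positive rate) quantities are computed per group; the function name and structure are my own, not from the paper.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group positive predictive value (a sufficiency-style metric)
    and false positive rate (a separation-style metric)."""
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 1)
        fp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 0)
        tn = sum(1 for i in idx if y_pred[i] == 0 and y_true[i] == 0)
        out[g] = {
            "ppv": tp / (tp + fp) if tp + fp else None,  # sufficiency
            "fpr": fp / (fp + tn) if fp + tn else None,  # separation
        }
    return out

rates = group_rates(
    y_true=[1, 0, 1, 0, 1, 0],
    y_pred=[1, 1, 1, 0, 0, 0],
    groups=["a", "a", "a", "b", "b", "b"],
)
print(rates)
```

The impossibility results alluded to in the title say that, outside degenerate cases, these per-group quantities cannot all be equalized simultaneously.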


Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets

no code implementations11 Apr 2022 Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, Shyam Narayanan

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems.

Near-Optimal Correlation Clustering with Privacy

no code implementations2 Mar 2022 Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labelling and many more.

Community Detection

Towards Optimal Lower Bounds for k-median and k-means Coresets

no code implementations25 Feb 2022 Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized.
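The $(k, z)$-clustering objective described above is easy to state in code. The following is a minimal sketch of the cost function for Euclidean inputs ($z=1$ gives $k$-median, $z=2$ gives $k$-means); the function name is mine, for illustration only.

```python
import math

def kz_cost(points, centers, z):
    """(k, z)-clustering cost: for each point, take the distance to its
    closest center raised to the power z, and sum over all points."""
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
centers = [(0.5, 0.0), (10.0, 0.0)]
print(kz_cost(points, centers, 1))  # k-median cost: 0.5 + 0.5 + 0.0 = 1.0
print(kz_cost(points, centers, 2))  # k-means cost: 0.25 + 0.25 + 0.0 = 0.5
```

A coreset, in this terminology, is a small weighted point set on which this cost function approximates the cost on the full input for every candidate set of centers.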

On Complexity of 1-Center in Various Metrics

no code implementations6 Dec 2021 Amir Abboud, Mohammadhossein Bateni, Vincent Cohen-Addad, Karthik C. S., Saeed Seddighin

Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of n strings each of length n, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

no code implementations NeurIPS 2021 Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Special cases of the problem include the well-known Fermat-Weber problem -- or geometric median problem -- where $z = 1$, the mean or centroid where $z=2$, and the Minimum Enclosing Ball problem, where $z = \infty$. We consider these problems in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate $m$. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have random query access to $A$ and the goal is to minimize the number of queries. Here, we show that $\tilde{O}(\varepsilon^{-z-3})$ samples are sufficient to achieve a $(1+\varepsilon)$ approximation, generalizing the results from Cohen, Lee, Miller, Pachocki, and Sidford [STOC '16] and Inaba, Katoh, and Imai [SoCG '94] to arbitrary $z$.
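The sampling idea behind such sublinear algorithms can be sketched as follows: estimate the power-mean cost from a uniform subsample, rescaled to the full input size. This is a toy illustration of the principle only, with hypothetical function names; the actual algorithms and sample-size bounds are those stated in the abstract.

```python
import math
import random

def power_mean_cost(points, center, z):
    """Sum over all points of the distance to the center, raised to z."""
    return sum(math.dist(p, center) ** z for p in points)

def sampled_cost_estimate(points, center, z, sample_size, seed=0):
    """Estimate the cost from a uniform subsample, rescaled by
    n / sample_size -- an unbiased estimator of the full cost."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(sample_size)]
    return power_mean_cost(sample, center, z) * len(points) / sample_size

# Two points at distance 1 from the center: exact cost is 2.0 for z=2,
# and any uniform sample gives the same rescaled estimate.
print(power_mean_cost([(0.0,), (2.0,)], (1.0,), 2))
print(sampled_cost_estimate([(0.0,), (2.0,)], (1.0,), 2, sample_size=5))
```

The technical content of the paper is how few samples suffice for a $(1+\varepsilon)$-approximation uniformly over candidate centers, which this naive estimator does not address.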

Parallel and Efficient Hierarchical k-Median Clustering

no code implementations NeurIPS 2021 Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution.

Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in L_p metrics

no code implementations21 Nov 2021 Vincent Cohen-Addad, Karthik C. S., Euiwoong Lee

We then show that together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in $\ell_p$-metrics for factors which are close to the ones obtained for general metrics.

Correlation Clustering in Constant Many Parallel Rounds

no code implementations15 Jun 2021 Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining.

Fast and Accurate $k$-means++ via Rejection Sampling

no code implementations NeurIPS 2020 Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance.
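Since the abstract highlights that $k$-means++ is easy to implement, a minimal sketch of its seeding step may help: the first center is chosen uniformly, and each subsequent center is chosen with probability proportional to its squared distance to the nearest center picked so far. This is the standard seeding of Arthur and Vassilvitskii, not the rejection-sampling speedup the paper contributes.

```python
import math
import random

def kmeans_pp_seed(points, k, seed=0):
    """Standard k-means++ seeding (D^2 sampling)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Sample the next center proportionally to d2.
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(kmeans_pp_seed(points, k=2))
```

Each seeding round above scans all $n$ points; the paper's contribution is precisely to avoid this linear per-round cost via rejection sampling.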

On the Power of Louvain in the Stochastic Block Model

no code implementations NeurIPS 2020 Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected.

BIG-bench Machine Learning Stochastic Block Model

On Approximability of Clustering Problems Without Candidate Centers

no code implementations30 Sep 2020 Vincent Cohen-Addad, Karthik C. S., Euiwoong Lee

In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space.

On Efficient Low Distortion Ultrametric Embedding

no code implementations ICML 2020 Vincent Cohen-Addad, Karthik C. S., Guillaume Lagarde

In this paper, we provide a new algorithm which takes as input a set of points $P$ in $\mathbb{R}^d$, and for every $c\ge 1$, runs in time $n^{1+\frac{\rho}{c^2}}$ (for some universal constant $\rho>1$) to output an ultrametric $\Delta$ such that for any two points $u, v$ in $P$, $\Delta(u, v)$ is within a multiplicative factor of $5c$ of the distance between $u$ and $v$ in the "best" ultrametric representation of $P$.

Fully Dynamic Consistent Facility Location

1 code implementation NeurIPS 2019 Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn

This improves over the naive algorithm, which consists of recomputing a solution at each time step and can take up to $O(n^2)$ update time and $O(n^2)$ total recourse.

Subquadratic High-Dimensional Hierarchical Clustering

no code implementations NeurIPS 2019 Amir Abboud, Vincent Cohen-Addad, Hussein Houdrouge

We consider the widely-used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs.

Online k-means Clustering

no code implementations15 Sep 2019 Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

The specific formulation we use is the $k$-means objective: At each time step the algorithm has to maintain a set of k candidate centers and the loss incurred is the squared distance between the new point and the closest center.
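The online loss described above can be written as a short function: at each step the algorithm holds $k$ candidate centers and pays the squared distance from the arriving point to the closest one. This is a generic illustration of the objective, with names of my choosing, not code from the paper.

```python
import math

def online_kmeans_loss(stream, centers_over_time):
    """Total online k-means loss: at step t, pay the squared distance
    from the arriving point to its closest current candidate center."""
    return sum(
        min(math.dist(point, c) ** 2 for c in centers)
        for point, centers in zip(stream, centers_over_time)
    )

stream = [(0.0, 0.0), (2.0, 0.0)]
centers_over_time = [[(1.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)]]
print(online_kmeans_loss(stream, centers_over_time))  # 1.0 + 0.0 = 1.0
```

Note the online aspect: the centers at step $t$ must be fixed before the point at step $t$ arrives.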

Online Clustering

Clustering Redemption–Beyond the Impossibility of Kleinberg’s Axioms

no code implementations NeurIPS 2018 Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn

In this work, we take a different approach, based on the observation that the consistency axiom fails to be satisfied when the “correct” number of clusters changes.

Instance-Optimality in the Noisy Value- and Comparison-Model --- Accept, Accept, Strong Accept: Which Papers get in?

no code implementations21 Jun 2018 Vincent Cohen-Addad, Frederik Mallmann-Trenn, Claire Mathieu

In this paper, we show optimal worst-case query complexity for the \textsc{max},\textsc{threshold-$v$} and \textsc{Top}-$k$ problems.

Recommendation Systems

Hierarchical Clustering Beyond the Worst-Case

no code implementations NeurIPS 2017 Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn

Hierarchical clustering, that is, computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity, is a fundamental problem in data analysis.

General Classification Multi-class Classification

Hierarchical Clustering: Objective Functions and Algorithms

no code implementations7 Apr 2017 Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn, Claire Mathieu

For similarity-based hierarchical clustering, Dasgupta showed that the divisive sparsest-cut approach achieves an $O(\log^{3/2} n)$-approximation.

Combinatorial Optimization Stochastic Block Model

On the Local Structure of Stable Clustering Instances

no code implementations29 Jan 2017 Vincent Cohen-Addad, Chris Schwiegelshohn

We study the classic $k$-median and $k$-means clustering objectives in the beyond-worst-case scenario.

Online Optimization of Smoothed Piecewise Constant Functions

no code implementations7 Apr 2016 Vincent Cohen-Addad, Varun Kanade

We study online optimization of smoothed piecewise constant functions over the domain [0, 1).
