Search Results for author: Vincent Cohen-Addad

Found 29 papers, 4 papers with code

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

no code implementations 27 Feb 2024 Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model.

Clustering

A Scalable Algorithm for Individually Fair K-means Clustering

1 code implementation 9 Feb 2024 Mohammadhossein Bateni, Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi

We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.

Clustering

A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies

no code implementations 29 Nov 2023 Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, Rajesh Jayaram, Silvio Lattanzi

Multi-dimensional Scaling (MDS) is a family of methods for embedding pair-wise dissimilarities between $n$ objects into low-dimensional space.

Data Visualization

Multi-Swap $k$-Means++

no code implementations 28 Sep 2023 Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis

The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation.

Clustering
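For reference, the baseline seeding step that the abstract above refers to can be sketched as follows. This is a minimal illustration of the standard $D^2$-sampling rule of $k$-means++, not the multi-swap variant proposed in the paper; the function names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are hypothetical and chosen only for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (standard k-means++ seeding)."""
    rng = rng or random.Random()
    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

On two well-separated groups of identical points, the second center necessarily lands in the group not yet covered, since covered points have sampling weight zero.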

Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees

1 code implementation 31 Jan 2023 Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni

Then, we exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/\epsilon)$-additive error, and an exponential-time algorithm that meets the lower bound.

Clustering Stochastic Block Model

Improved Coresets for Euclidean $k$-Means

no code implementations 15 Nov 2022 Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp.

Beyond Impossibility: Balancing Sufficiency, Separation and Accuracy

no code implementations 24 May 2022 Limor Gultchin, Vincent Cohen-Addad, Sophie Giffard-Roisin, Varun Kanade, Frederik Mallmann-Trenn

Among the various aspects of algorithmic fairness studied in recent years, the tension between satisfying both sufficiency and separation -- e.g., the ratios of positive or negative predictive values, and false positive or false negative rates across groups -- has received much attention.

Fairness

Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets

no code implementations 11 Apr 2022 Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, Shyam Narayanan

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems.

Near-Optimal Correlation Clustering with Privacy

no code implementations 2 Mar 2022 Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labelling and many more.

Clustering Community Detection

Towards Optimal Lower Bounds for k-median and k-means Coresets

no code implementations 25 Feb 2022 Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized.

Clustering
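The objective described in the abstract above can be written out explicitly. The symbols $P$ (input points), $C$ (centers), and $\mathrm{dist}$ (the metric) are notation assumed here for illustration; the listing itself does not fix them.

```latex
\mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \, \min_{c \in C} \, \mathrm{dist}(p, c)^{z},
\qquad |C| = k .
```

Setting $z = 1$ gives the $k$-median objective and $z = 2$ gives the $k$-means objective.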

On Complexity of 1-Center in Various Metrics

no code implementations 6 Dec 2021 Amir Abboud, Mohammad Hossein Bateni, Vincent Cohen-Addad, Karthik C. S., Saeed Seddighin

Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
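To make the "subquartic" threshold concrete, the brute-force baseline for 1-median in the edit metric can be sketched as below. This is an illustrative sketch, not a construction from the paper; the names `edit_distance` and `one_median` are hypothetical. With $n$ strings of length $n$, it evaluates on the order of $n^2$ pairs, each by the textbook $O(n^2)$ dynamic program, for roughly $n^4$ time overall.

```python
def edit_distance(s, t):
    """Levenshtein distance via the textbook row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,             # delete a from s
                cur[j - 1] + 1,          # insert b into s
                prev[j - 1] + (a != b),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def one_median(strings):
    """Return the input string minimizing the sum of edit distances to the others."""
    def total(i):
        return sum(
            edit_distance(strings[i], strings[j])
            for j in range(len(strings)) if j != i
        )
    best = min(range(len(strings)), key=total)
    return strings[best]
```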

Parallel and Efficient Hierarchical k-Median Clustering

no code implementations NeurIPS 2021 Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution.

Clustering

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

no code implementations NeurIPS 2021 Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Special cases of the problem include the well-known Fermat-Weber problem -- or geometric median problem -- where $z = 1$, the mean or centroid where $z=2$, and the Minimum Enclosing Ball problem, where $z = \infty$. We consider these problems in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate $m$. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have random query access to $A$, and the goal is to minimize the number of queries. Here, we show that $\tilde{O}(\varepsilon^{-z-3})$ samples are sufficient to achieve a $(1+\varepsilon)$ approximation, generalizing the results from Cohen, Lee, Miller, Pachocki, and Sidford [STOC '16] and Inaba, Katoh, and Imai [SoCG '94] to arbitrary $z$.

Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in L_p metrics

no code implementations 21 Nov 2021 Vincent Cohen-Addad, Karthik C. S., Euiwoong Lee

We then show that together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in $\ell_p$-metrics for factors which are close to the ones obtained for general metrics.

Correlation Clustering in Constant Many Parallel Rounds

no code implementations 15 Jun 2021 Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining.

Clustering

Fast and Accurate $k$-means++ via Rejection Sampling

no code implementations NeurIPS 2020 Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

$k$-means++ (Arthur and Vassilvitskii, 2007) is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance.

Clustering

On the Power of Louvain in the Stochastic Block Model

no code implementations NeurIPS 2020 Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic

A classic problem in machine learning and data analysis is to partition the vertices of a network in such a way that vertices in the same set are densely connected and vertices in different sets are loosely connected.

BIG-bench Machine Learning Stochastic Block Model

On Approximability of Clustering Problems Without Candidate Centers

no code implementations 30 Sep 2020 Vincent Cohen-Addad, C. S. Karthik, Euiwoong Lee

In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space.

Clustering

On Efficient Low Distortion Ultrametric Embedding

no code implementations ICML 2020 Vincent Cohen-Addad, Karthik C. S., Guillaume Lagarde

In this paper, we provide a new algorithm which takes as input a set of points $P$ in $\mathbb{R}^d$, and for every $c\ge 1$, runs in time $n^{1+\frac{\rho}{c^2}}$ (for some universal constant $\rho>1$) to output an ultrametric $\Delta$ such that for any two points $u, v$ in $P$, $\Delta(u, v)$ is within a multiplicative factor of $5c$ of the distance between $u$ and $v$ in the "best" ultrametric representation of $P$.

Fully Dynamic Consistent Facility Location

1 code implementation NeurIPS 2019 Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn

This improves over the naive algorithm, which recomputes a solution at each time step and can take up to $O(n^2)$ update time and $O(n^2)$ total recourse.

Clustering

Subquadratic High-Dimensional Hierarchical Clustering

no code implementations NeurIPS 2019 Amir Abboud, Vincent Cohen-Addad, Hussein Houdrouge

We consider the widely-used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs.

Clustering Vocal Bursts Intensity Prediction

Online k-means Clustering

no code implementations 15 Sep 2019 Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

The specific formulation we use is the $k$-means objective: At each time step the algorithm has to maintain a set of k candidate centers and the loss incurred is the squared distance between the new point and the closest center.

Clustering Online Clustering
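The loss model in the abstract above can be illustrated with a small sketch: the learner commits to $k$ centers, a point arrives, and the squared distance to the closest committed center is charged before any update. The update rule below is a simple running-mean heuristic used purely as a placeholder; it is an assumption for illustration and not the algorithm analyzed in the paper. The names `RunningMeans` and `total_loss` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


class RunningMeans:
    """Placeholder learner: moves the closest center toward each new point."""

    def __init__(self, init_centers):
        self.centers = [tuple(c) for c in init_centers]
        # Each initial center counts as one observation.
        self.counts = [1] * len(self.centers)

    def loss(self, x):
        # Squared distance from x to the closest center currently maintained.
        return min(sq_dist(x, c) for c in self.centers)

    def update(self, x):
        i = min(range(len(self.centers)),
                key=lambda j: sq_dist(x, self.centers[j]))
        self.counts[i] += 1
        n = self.counts[i]
        # Incremental mean: c <- c + (x - c) / n.
        self.centers[i] = tuple(
            c + (xi - c) / n for c, xi in zip(self.centers[i], x)
        )


def total_loss(learner, stream):
    """Charge the loss first, then let the learner see the point."""
    total = 0.0
    for x in stream:
        total += learner.loss(x)
        learner.update(x)
    return total
```

Charging the loss before the update is the key design point: the centers used at each step depend only on points seen earlier.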

Clustering Redemption–Beyond the Impossibility of Kleinberg’s Axioms

no code implementations NeurIPS 2018 Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn

In this work, we take a different approach, based on the observation that the consistency axiom fails to be satisfied when the “correct” number of clusters changes.

Clustering

Instance-Optimality in the Noisy Value-and Comparison-Model --- Accept, Accept, Strong Accept: Which Papers get in?

no code implementations 21 Jun 2018 Vincent Cohen-Addad, Frederik Mallmann-Trenn, Claire Mathieu

In this paper, we show optimal worst-case query complexity for the Max, Threshold-$v$, and Top-$k$ problems.

Recommendation Systems

Hierarchical Clustering Beyond the Worst-Case

no code implementations NeurIPS 2017 Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn

Hierarchical clustering, that is, computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity, is a fundamental problem in data analysis.

Clustering General Classification +1

Hierarchical Clustering: Objective Functions and Algorithms

no code implementations 7 Apr 2017 Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn, Claire Mathieu

For similarity-based hierarchical clustering, Dasgupta showed that the divisive sparsest-cut approach achieves an $O(\log^{3/2} n)$-approximation.

Clustering Combinatorial Optimization +1

On the Local Structure of Stable Clustering Instances

no code implementations 29 Jan 2017 Vincent Cohen-Addad, Chris Schwiegelshohn

We study the classic $k$-median and $k$-means clustering objectives in the beyond-worst-case scenario.

Clustering

Online Optimization of Smoothed Piecewise Constant Functions

no code implementations 7 Apr 2016 Vincent Cohen-Addad, Varun Kanade

We study online optimization of smoothed piecewise constant functions over the domain [0, 1).
