no code implementations • 29 Dec 2022 • Jakub Łącki, Vahab Mirrokni, Christian Sohler
We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined by the ratio between the number of edges in the cluster and the total weight of its vertices.
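Such a ratio objective can be sketched as follows; the function name and input conventions are hypothetical, not taken from the paper:

```python
def ratio_objective(edges, weights, cluster):
    """Hypothetical helper: (# edges inside the cluster) / (total vertex weight).

    `edges` is a list of (u, v) pairs, `weights` maps each vertex to its weight.
    """
    members = set(cluster)
    internal_edges = sum(1 for u, v in edges if u in members and v in members)
    total_weight = sum(weights[v] for v in members)
    return internal_edges / total_weight
```

A dense, lightweight cluster scores high; a sparse or heavy one scores low.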
no code implementations • NeurIPS 2021 • Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson
In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution.
no code implementations • 14 Jan 2021 • Grzegorz Gluch, Michael Kapralov, Silvio Lattanzi, Aida Mousavifar, Christian Sohler
The main technical contribution is a sublinear time oracle that provides dot product access to the spectral embedding of $G$ by estimating distributions of short random walks from vertices in $G$.
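One way to read this: endpoint distributions of short random walks act as proxies for the spectral embedding, so inner products of those distributions can stand in for embedding dot products. A toy empirical sketch of that idea (not the paper's sublinear oracle; walk length and sample counts are hypothetical):

```python
import random
from collections import Counter

def walk_distribution(adj, start, length, num_walks, rng):
    """Empirically estimate the endpoint distribution of short random walks."""
    counts = Counter()
    for _ in range(num_walks):
        v = start
        for _ in range(length):
            v = rng.choice(adj[v])
        counts[v] += 1
    return {v: c / num_walks for v, c in counts.items()}

def estimate_dot(adj, u, v, length=3, num_walks=5000, seed=0):
    """Dot product of the two estimated walk distributions."""
    pu = walk_distribution(adj, u, length, num_walks, random.Random(seed))
    pv = walk_distribution(adj, v, length, num_walks, random.Random(seed + 1))
    return sum(p * pv.get(x, 0.0) for x, p in pu.items())
```

Vertices in the same well-connected piece of the graph should yield a larger estimate than vertices in different pieces.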
Data Structures and Algorithms
no code implementations • NeurIPS 2020 • Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson
$k$-means++ (Arthur & Vassilvitskii, 2007) is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance.
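For reference, $k$-means++ seeding itself is short; a minimal 2-D sketch (assuming points given as coordinate pairs):

```python
import random

def kmeans_pp_seed(points, k, rng=None):
    """Sketch of k-means++ seeding: the first center is uniform at random;
    each further center is sampled with probability proportional to its
    squared distance to the nearest center chosen so far (D^2 sampling)."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min((px - cx) ** 2 + (py - cy) ** 2 for (cx, cy) in centers)
              for (px, py) in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:  # guard against floating-point rounding in the prefix sums
            centers.append(points[-1])
    return centers
```

Because far-away points are sampled with much higher probability, well-separated clusters almost always each receive a seed.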
no code implementations • NeurIPS 2018 • Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff
For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.
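The generic shape of sensitivity sampling can be sketched as follows; this is an illustration of importance sampling with given sensitivity upper bounds, not the paper's scheme for bounded $\mu(X)$-complexity:

```python
import random

def sensitivity_sample(points, sens, m, rng=None):
    """Sketch of sensitivity (importance) sampling: draw m points with
    probability proportional to their sensitivity upper bounds, and
    reweight each draw by 1/(m * p_i) so weighted sums are unbiased."""
    rng = rng or random.Random(0)
    total = sum(sens)
    probs = [s / total for s in sens]
    coreset = []
    for _ in range(m):
        r = rng.random()
        acc = 0.0
        for p, pr in zip(points, probs):
            acc += pr
            if acc >= r:
                coreset.append((p, 1.0 / (m * pr)))
                break
        else:  # guard against floating-point rounding in the prefix sums
            coreset.append((points[-1], 1.0 / (m * probs[-1])))
    return coreset
```

With good sensitivity bounds, the weighted coreset cost concentrates around the full-data cost for every candidate solution.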
no code implementations • 26 Feb 2016 • Johannes Blömer, Christiane Lammersen, Melanie Schmidt, Christian Sohler
The $k$-means algorithm is one of the most widely used clustering heuristics.
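One iteration of that heuristic (Lloyd's algorithm) can be written in a few lines; a minimal 1-D sketch:

```python
def lloyd_step(points, centers):
    """One Lloyd iteration (1-D sketch): assign each point to its nearest
    center, then move every center to the mean of its assigned points."""
    assignments = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
        assignments[nearest].append(p)
    return [sum(a) / len(a) if a else centers[j]
            for j, a in enumerate(assignments)]
```

Iterating until the centers stop moving yields a local optimum of the $k$-means cost.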
no code implementations • 16 Dec 2010 • Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze, Christian Sohler
Assuming that the dimension $d$ is a constant, we show that for any $k$ the solution computed by this algorithm is an $O(\log k)$-approximation to the diameter $k$-clustering problem.
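The agglomerative complete-linkage heuristic analyzed here can be sketched in 1-D (an illustrative $O(n^3)$-ish version, not an efficient implementation):

```python
def complete_linkage(points, k):
    """Agglomerative complete-linkage sketch (1-D): start from singleton
    clusters and repeatedly merge the pair whose union has the smallest
    diameter, until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                diam = max(merged) - min(merged)
                if best is None or diam < best[0]:
                    best = (diam, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Each merge greedily minimizes the diameter of the resulting cluster, which is the quantity the $O(\log k)$ approximation bound controls.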