Search Results for author: Chris Schwiegelshohn

Found 13 papers, 4 papers with code

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

1 code implementation • 2 Apr 2024 • Andrew Draganov, David Saulpic, Chris Schwiegelshohn

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets.

Clustering

Optimal Sketching Bounds for Sparse Linear Regression

no code implementations • 5 Apr 2023 • Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff

For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression.

regression

Improved Coresets for Euclidean $k$-Means

no code implementations • 15 Nov 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp.

An Empirical Evaluation of $k$-Means Coresets

1 code implementation • 3 Jul 2022 • Chris Schwiegelshohn, Omar Ali Sheikh-Omar

Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.

Clustering

Towards Optimal Lower Bounds for k-median and k-means Coresets

no code implementations • 25 Feb 2022 • Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized.

Clustering
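The $(k, z)$-clustering objective described in the entry above can be illustrated with a minimal Python sketch. It assumes Euclidean distance and uses made-up toy points. The function name `clustering_cost` is hypothetical and is not taken from the paper. The sketch only evaluates the cost of a given set of centers and does not search for the optimal ones.

```python
import math


def clustering_cost(points, centers, z):
    """Sum, over all points, of the distance to the closest center raised to the power z."""
    return sum(
        min(math.dist(p, c) for c in centers) ** z
        for p in points
    )


# Toy data (an assumption for illustration only).
points = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0)]
centers = [(0.0, 2.0), (10.0, 0.0)]

k_median_cost = clustering_cost(points, centers, z=1)  # z = 1: k-median objective
k_means_cost = clustering_cost(points, centers, z=2)   # z = 2: k-means objective
```

With these toy values, the first two points are each at distance 2 from their closest center and the third sits on a center, so the $z = 1$ cost is 4 and the $z = 2$ cost is 8.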

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

no code implementations • NeurIPS 2021 • Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Special cases of this problem include the well-known Fermat-Weber problem -- or geometric median problem -- where $z = 1$, the mean or centroid where $z = 2$, and the Minimum Enclosing Ball problem, where $z = \infty$. We consider these problems in the big data regime. Here, we are interested in sampling as few points as possible such that we can accurately estimate $m$. More specifically, we consider sublinear algorithms as well as coresets for these problems. Sublinear algorithms have random query access to $A$, and the goal is to minimize the number of queries. Here, we show that $\tilde{O}(\varepsilon^{-z-3})$ samples are sufficient to achieve a $(1+\varepsilon)$ approximation, generalizing the results from Cohen, Lee, Miller, Pachocki, and Sidford [STOC '16] and Inaba, Katoh, and Imai [SoCG '94] to arbitrary $z$.
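The snippet above omits the definition of the objective. As a hedged reconstruction, assuming $A \subset \mathbb{R}^d$ is the input point set and $m$ is the power mean being estimated (the standard formulation, not quoted from this listing), the problem can be written as:

$$m = \arg\min_{c \in \mathbb{R}^d} \sum_{p \in A} \|p - c\|_2^{z}$$

Under this assumption, $z = 1$ gives the geometric median, $z = 2$ gives the centroid $\frac{1}{|A|}\sum_{p \in A} p$, and the limit $z \to \infty$ corresponds to minimizing $\max_{p \in A} \|p - c\|_2$, the Minimum Enclosing Ball center.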

Algorithms for Fair Team Formation in Online Labour Marketplaces

no code implementations • 14 Feb 2020 • Giorgio Barnabò, Adriano Fazzone, Stefano Leonardi, Chris Schwiegelshohn

In this short paper, we define the Fair Team Formation problem as follows: given an online labour marketplace where each worker possesses one or more skills, and where all workers are divided into two or more non-overlapping classes (for example, men and women), we want to design an algorithm that finds a team with all the skills needed to complete a given task and with the same number of people from each class.

Fairness

Fully Dynamic Consistent Facility Location

1 code implementation • NeurIPS 2019 • Vincent Cohen-Addad, Niklas Oskar D. Hjuler, Nikos Parotsidis, David Saulpic, Chris Schwiegelshohn

This improves over the naive algorithm, which recomputes a solution at each time step and can take up to $O(n^2)$ update time and $O(n^2)$ total recourse.

Clustering

Principal Fairness: Removing Bias via Projections

no code implementations • 31 May 2019 • Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Cristina Menghini, Chris Schwiegelshohn

Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis have recently received significant attention.

Clustering, Fairness

On Coresets for Logistic Regression

no code implementations • NeurIPS 2018 • Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff

For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.

regression

On the Local Structure of Stable Clustering Instances

no code implementations • 29 Jan 2017 • Vincent Cohen-Addad, Chris Schwiegelshohn

We study the classic $k$-median and $k$-means clustering objectives in the beyond-worst-case scenario.

Clustering
