Search Results for author: Kasper Green Larsen

Found 24 papers, 3 papers with code

Heavy hitters via cluster-preserving clustering

no code implementations5 Apr 2016 Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, Mikkel Thorup

Our main innovation is an efficient reduction from the heavy hitters problem to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster.

Clustering

Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

1 code implementation19 Sep 2017 Charalampos E. Tsourakakis, Michael Mitzenmacher, Kasper Green Larsen, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos

The edge sign prediction problem aims to predict whether an interaction between a pair of nodes will be positive or negative.

Clustering
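
For intuition about the noisy-query setting above, here is a toy simulation in which each queried edge sign is flipped independently with probability q, and repeating the query and taking a majority vote drives the error down. The flip probability and repetition count are made-up illustrative values, not algorithms or parameters from the paper.

```python
import random

def noisy_sign(true_sign, q):
    """Return the true edge sign, flipped with probability q."""
    return true_sign if random.random() > q else -true_sign

def majority_query(true_sign, q, repeats):
    """Query the same edge several times and take a majority vote."""
    votes = sum(noisy_sign(true_sign, q) for _ in range(repeats))
    return 1 if votes > 0 else -1

random.seed(0)
q, repeats, trials = 0.3, 15, 10_000
errors = sum(majority_query(+1, q, repeats) != +1 for _ in range(trials))
print(f"error rate with {repeats} repeated queries: {errors / trials:.4f}")
```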

Fully Understanding the Hashing Trick

no code implementations NeurIPS 2018 Casper Benjamin Freksen, Lior Kamma, Kasper Green Larsen

We settle this question by giving tight asymptotic bounds on the exact tradeoff between the central parameters, thus providing a complete understanding of the performance of feature hashing.

Open-Ended Question Answering
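
The tradeoff studied in this paper concerns the feature hashing (hashing trick) map: send each coordinate to a uniformly random bucket with a random sign and sum. The snippet below is a minimal NumPy illustration of that map only, not the paper's analysis; the dimensions and seeds are arbitrary placeholder choices.

```python
import numpy as np

def feature_hash(x, m, seed=0):
    """Project x into m buckets: each coordinate goes to a random bucket,
    multiplied by a random sign, and colliding coordinates are summed."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    buckets = rng.integers(0, m, size=d)       # h: [d] -> [m]
    signs = rng.choice([-1.0, 1.0], size=d)    # sigma: [d] -> {-1, +1}
    y = np.zeros(m)
    np.add.at(y, buckets, signs * x)
    return y

# The embedding preserves the l2 norm in expectation.
x = np.random.default_rng(1).normal(size=10_000)
y = feature_hash(x, m=512)
print(np.linalg.norm(x), np.linalg.norm(y))
```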

Optimal Minimal Margin Maximization with Boosting

no code implementations30 Jan 2019 Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen

A common goal in a long line of research is to maximize the smallest margin using as few base hypotheses as possible, culminating in the AdaBoostV algorithm of Rätsch and Warmuth [JMLR'04].
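
As a reminder of the quantity being optimized, the snippet below computes the minimal normalized margin of a voting classifier over a labeled sample. The base-hypothesis predictions and weights are made-up placeholders; this is only the margin definition, not AdaBoostV or the paper's algorithm.

```python
import numpy as np

def minimal_margin(H, alpha, y):
    """Minimal normalized margin of a voting classifier.

    H:     (n_samples, n_hypotheses) matrix of base predictions in {-1, +1}
    alpha: non-negative weights of the base hypotheses
    y:     true labels in {-1, +1}
    The margin of an example is y * sum_j alpha_j h_j(x) / sum_j alpha_j.
    """
    alpha = np.asarray(alpha, dtype=float)
    margins = y * (H @ alpha) / alpha.sum()
    return margins.min()

# Toy example: three base hypotheses on four points (placeholder values).
H = np.array([[+1, +1, -1],
              [+1, -1, +1],
              [-1, +1, +1],
              [+1, +1, +1]])
y = np.array([+1, +1, +1, +1])
print(minimal_margin(H, alpha=[0.4, 0.3, 0.3], y=y))
```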

Optimal Learning of Joint Alignments with a Faulty Oracle

no code implementations21 Sep 2019 Kasper Green Larsen, Michael Mitzenmacher, Charalampos E. Tsourakakis

The goal is to recover $n$ discrete variables $g_i \in \{0, \ldots, k-1\}$ (up to some global offset) given noisy observations of a set of their pairwise differences $\{(g_i - g_j) \bmod k\}$; specifically, with probability $\frac{1}{k}+\delta$ for some $\delta > 0$ one obtains the correct answer, and with the remaining probability one obtains a uniformly random incorrect answer.
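
To make the observation model concrete, here is a small simulation of the faulty oracle: with probability 1/k + delta it returns the true difference (g_i - g_j) mod k, otherwise a uniformly random incorrect answer. The recovery rule shown (anchor every variable to g_0 by a plurality vote over repeated queries) is only a naive illustration, not the paper's optimal algorithm, and all parameters are placeholders.

```python
import random
from collections import Counter

def query(g, i, j, k, delta):
    """Faulty oracle for (g_i - g_j) mod k: correct w.p. 1/k + delta,
    otherwise a uniformly random *incorrect* answer."""
    truth = (g[i] - g[j]) % k
    if random.random() < 1.0 / k + delta:
        return truth
    return random.choice([v for v in range(k) if v != truth])

random.seed(0)
n, k, delta, repeats = 50, 5, 0.2, 200
g = [random.randrange(k) for _ in range(n)]

# Naive recovery: estimate each g_i - g_0 by a plurality vote of repeated queries.
recovered = [0] * n
for i in range(1, n):
    votes = Counter(query(g, i, 0, k, delta) for _ in range(repeats))
    recovered[i] = votes.most_common(1)[0][0]

# Compare up to the global offset g_0.
truth_offsets = [(g[i] - g[0]) % k for i in range(n)]
print("fraction recovered:", sum(a == b for a, b in zip(recovered, truth_offsets)) / n)
```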

Margins are Insufficient for Explaining Gradient Boosting

no code implementations NeurIPS 2020 Allan Grønlund, Lior Kamma, Kasper Green Larsen

We then explain the shortcomings of the $k$'th margin bound and prove a stronger and more refined margin-based generalization bound for boosted classifiers that indeed succeeds in explaining the performance of modern gradient boosters.

CountSketches, Feature Hashing and the Median of Three

no code implementations3 Feb 2021 Kasper Green Larsen, Rasmus Pagh, Jakub Tětek

For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2 \|v\|_2/\sqrt{s}$ is exponentially small in $t$.
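
The estimator discussed above is the classic CountSketch point query: hash coordinates into $s$ buckets with random signs, estimate $v_i$ as $\sigma(i) \cdot C[h(i)]$, and take the median of $2t-1$ independent repetitions. The sketch below is a bare-bones illustration with made-up parameters (s, t, vector size), not the paper's refined analysis of the median-of-three case.

```python
import numpy as np

def countsketch_estimates(v, s, reps, seed=0):
    """Return a (reps, len(v)) array: one CountSketch estimate of every
    coordinate of v per repetition, using s buckets per repetition."""
    rng = np.random.default_rng(seed)
    d = len(v)
    est = np.empty((reps, d))
    for r in range(reps):
        h = rng.integers(0, s, size=d)            # bucket of each coordinate
        sigma = rng.choice([-1.0, 1.0], size=d)   # random sign of each coordinate
        C = np.zeros(s)
        np.add.at(C, h, sigma * v)                # build the sketch
        est[r] = sigma * C[h]                     # point-query estimates
    return est

rng = np.random.default_rng(1)
v = rng.normal(size=5_000)
t, s = 2, 256
est = countsketch_estimates(v, s=s, reps=2 * t - 1)   # median of three for t = 2
median_est = np.median(est, axis=0)
threshold = 2 * np.linalg.norm(v) / np.sqrt(s)
print("fraction of coordinates off by more than 2||v||_2/sqrt(s):",
      np.mean(np.abs(median_est - v) > threshold))
```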

Compression Implies Generalization

no code implementations15 Jun 2021 Allan Grønlund, Mikael Høgsgaard, Lior Kamma, Kasper Green Larsen

The framework is simple and powerful enough to extend the generalization bounds by Arora et al. to also hold for the original network.

BIG-bench Machine Learning, Generalization Bounds

Optimality of the Johnson-Lindenstrauss Dimensionality Reduction for Practical Measures

no code implementations14 Jul 2021 Yair Bartal, Ora Nova Fandina, Kasper Green Larsen

They provided upper bounds on its quality for a wide range of practical measures and showed that indeed these are best possible in many cases.

Dimensionality Reduction

Towards Optimal Lower Bounds for k-median and k-means Coresets

no code implementations25 Feb 2022 Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

Given a set of points in a metric space, the $(k, z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized.

Clustering
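
The objective defined in the abstract is easy to write down directly; the snippet below just evaluates the $(k, z)$-clustering cost of a candidate set of centers (z = 2 gives k-means, z = 1 gives k-median). It is a plain cost evaluation on synthetic data, not a coreset construction from the paper.

```python
import numpy as np

def kz_clustering_cost(points, centers, z):
    """Sum over all points of (distance to the closest center) ** z."""
    # Pairwise Euclidean distances: shape (n_points, n_centers).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.sum(dists.min(axis=1) ** z)

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
C = X[rng.choice(len(X), size=10, replace=False)]   # 10 arbitrary candidate centers
print("k-median cost (z=1):", kz_clustering_cost(X, C, z=1))
print("k-means  cost (z=2):", kz_clustering_cost(X, C, z=2))
```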

The Fast Johnson-Lindenstrauss Transform is Even Faster

1 code implementation4 Apr 2022 Ora Nova Fandina, Mikael Møller Høgsgaard, Kasper Green Larsen

In this work, we give a surprising new analysis of the Fast JL transform, showing that the $k \ln^2 n$ term in the embedding time can be improved to $(k \ln^2 n)/\alpha$ for an $\alpha = \Omega(\min\{\varepsilon^{-1}\ln(1/\varepsilon), \ln n\})$.

Dimensionality Reduction
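
For reference, the Fast JL transform analyzed here follows the usual recipe: randomly flip signs, apply a Walsh-Hadamard transform, then sample and rescale k coordinates. The sketch below implements that pipeline with a dense scipy Hadamard matrix purely for illustration (a real implementation would use an O(d log d) fast transform), and the dimensions are arbitrary placeholder choices rather than anything from the paper's analysis.

```python
import numpy as np
from scipy.linalg import hadamard

def fast_jl(x, k, seed=0):
    """Embed x (dimension d, a power of two) into k dimensions via
    random sign flips, a Walsh-Hadamard transform, and coordinate sampling."""
    rng = np.random.default_rng(seed)
    d = len(x)
    D = rng.choice([-1.0, 1.0], size=d)         # random diagonal sign matrix
    H = hadamard(d) / np.sqrt(d)                # orthonormal Hadamard transform
    mixed = H @ (D * x)                         # spreads the mass of x evenly
    idx = rng.choice(d, size=k, replace=False)  # sample k coordinates
    return np.sqrt(d / k) * mixed[idx]          # rescale to preserve the norm

x = np.random.default_rng(1).normal(size=1024)
y = fast_jl(x, k=64)
print(np.linalg.norm(x), np.linalg.norm(y))     # norms should be comparable
```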

Optimal Weak to Strong Learning

no code implementations3 Jun 2022 Kasper Green Larsen, Martin Ritzert

The classic AdaBoost algorithm converts a weak learner, that is, an algorithm producing a hypothesis which is slightly better than chance, into a strong learner that achieves arbitrarily high accuracy when given enough training data.

Generalization Bounds
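
Since several of the results listed here concern AdaBoost as a weak-to-strong learner, a compact textbook AdaBoost loop is included for reference. The decision-stump weak learner and the toy data are placeholder assumptions; this is standard AdaBoost, not the new optimal algorithm from the paper.

```python
import numpy as np

def best_stump(X, y, w):
    """Weak learner: the single-feature threshold stump with the smallest
    weighted error (a deliberately simple placeholder weak learner)."""
    best = (None, None, None, np.inf)            # (feature, threshold, sign, error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, thr, sign, err)
    return best

def adaboost(X, y, rounds=20):
    """Standard AdaBoost: reweight the sample, call the weak learner,
    and combine the stumps into a weighted majority vote."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        j, thr, sign, err = best_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] <= thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        score += alpha * np.where(X[:, j] <= thr, sign, -sign)
    return np.sign(score)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)       # toy linearly separable labels
model = adaboost(X, y, rounds=20)
print("training accuracy:", np.mean(predict(model, X) == y))
```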

Improved Coresets for Euclidean $k$-Means

no code implementations15 Nov 2022 Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

The Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) of every data point to its closest center is minimized.

Bagging is an Optimal PAC Learner

no code implementations5 Dec 2022 Kasper Green Larsen

Finally, the seminal work by Hanneke (2016) gave an algorithm with a provably optimal sample complexity.

Learning Theory, PAC learning
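
The procedure in question is Breiman's classical bagging heuristic: train base learners on bootstrap samples and aggregate them by a majority vote. The snippet below is just that heuristic, using scikit-learn decision trees as an illustrative base learner on toy data; it is the procedure the paper is about, not its optimality proof.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, n_learners=25, seed=0):
    """Train one base learner per bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)          # sample n points with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def vote(learners, X):
    """Aggregate by a plain (unweighted) majority vote over +/-1 predictions."""
    votes = np.sum([clf.predict(X) for clf in learners], axis=0)
    return np.where(votes >= 0, 1, -1)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1, 1, -1)   # toy non-linear labels
learners = bagging(X, y)
print("training accuracy:", np.mean(vote(learners, X) == y))
```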

The Impossibility of Parallelizing Boosting

no code implementations23 Jan 2023 Amin Karbasi, Kasper Green Larsen

The aim of boosting is to convert a sequence of weak learners into a strong learner.

AdaBoost is not an Optimal Weak to Strong Learner

no code implementations27 Jan 2023 Mikael Møller Høgsgaard, Kasper Green Larsen, Martin Ritzert

AdaBoost is a classic boosting algorithm that combines multiple inaccurate classifiers produced by a weak learner into a strong learner with arbitrarily high accuracy when given enough training data.

Boosting, Voting Classifiers and Randomized Sample Compression Schemes

no code implementations5 Feb 2024 Arthur da Cunha, Kasper Green Larsen, Martin Ritzert

At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners.
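
The voting classifier mentioned above is simply a weighted majority vote over the weak learners' predictions; a minimal illustration follows, with the predictions and weights as placeholder values (this is only the combination rule, not the paper's randomized sample compression construction).

```python
import numpy as np

def weighted_majority_vote(predictions, weights):
    """Combine weak-learner predictions (each row: one learner's +/-1
    predictions on all points) into a single +/-1 prediction per point."""
    score = np.asarray(weights) @ np.asarray(predictions, dtype=float)
    return np.where(score >= 0, 1, -1)

# Three weak learners voting on four points (placeholder values).
predictions = [[+1, -1, +1, +1],
               [+1, +1, -1, +1],
               [-1, +1, +1, +1]]
weights = [0.5, 0.3, 0.2]
print(weighted_majority_vote(predictions, weights))
```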

Replicable Learning of Large-Margin Halfspaces

no code implementations21 Feb 2024 Alkis Kalavasis, Amin Karbasi, Kasper Green Larsen, Grigoris Velegkas, Felix Zhou

Departing from the requirement of polynomial time algorithms, using the DP-to-Replicability reduction of Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sorrell, and Sivakumar [STOC, 2023], we show how to obtain a replicable algorithm for large-margin halfspaces with improved sample complexity with respect to the margin parameter $\tau$, but running time doubly exponential in $1/\tau^2$ and worse sample complexity dependence on $\epsilon$ than one of our previous algorithms.
