2 code implementations • 7 Sep 2023 • Lars Lenssen, Erich Schubert
We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss its use for choosing the optimal number of clusters.
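As a rough illustration of the medoid-based variant: the Silhouette of a point simplifies to 1 - d1/d2, where d1 and d2 are the distances to its nearest and second-nearest medoid. A minimal sketch (the function name, API, and toy data are hypothetical, not the paper's implementation):

```python
import numpy as np

def medoid_silhouette(X, medoids):
    """Average medoid Silhouette: per point, 1 - d1/d2, where d1 and d2
    are the distances to the nearest and second-nearest medoid.
    (Illustrative sketch; name and API are assumptions.)"""
    # Distance from every point to every medoid
    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    d.sort(axis=1)  # d[:, 0] = nearest medoid, d[:, 1] = second nearest
    with np.errstate(divide="ignore", invalid="ignore"):
        s = np.where(d[:, 1] > 0, 1.0 - d[:, 0] / d[:, 1], 0.0)
    return s.mean()

# Two tight, well-separated pairs with medoids at indices 0 and 2
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(medoid_silhouette(X, [0, 2]))  # close to 1: well-separated clusters
```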
no code implementations • 5 Sep 2023 • Lars Lenssen, Erich Schubert
FastPAM recently introduced a speedup for large k that makes it applicable to larger problems, but the method's runtime is still quadratic in N. In this chapter, we discuss a sparse and asymmetric variant of this problem, to be used, for example, on graph data such as road networks.
1 code implementation • 5 Sep 2023 • Erich Schubert, Andreas Lang
Hierarchical Agglomerative Clustering (HAC) is likely the earliest and most flexible clustering method: it can be used with many distances and similarities, and with various linkage strategies.
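To illustrate this flexibility, a minimal sketch using SciPy's standard HAC implementation: the same precomputed distances can be combined with several linkage strategies (the toy data and parameters here are invented for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Two well-separated Gaussian blobs (illustrative toy data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])
d = pdist(X)  # condensed pairwise distances; other metrics could be used

# The same distance matrix works with different linkage strategies
for method in ("single", "complete", "average", "ward"):
    Z = linkage(d, method=method)                    # (n-1) x 4 merge steps
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
    print(method, np.unique(labels).size)
```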
no code implementations • 27 Dec 2022 • Daniel Boiar, Thomas Liebig, Erich Schubert
Support Vector Machines have been successfully used for one-class classification (OCSVM, SVDD) when trained on clean data, but they work much worse on dirty data: outliers present in the training data tend to become support vectors, and are hence considered "normal".
no code implementations • 23 Dec 2022 • Erich Schubert
A major challenge when using k-means clustering is choosing the parameter k, the number of clusters.
2 code implementations • 26 Sep 2022 • Lars Lenssen, Erich Schubert
One of the versions guarantees results equal to the original variant and provides a runtime speedup of $O(k^2)$.
1 code implementation • 26 Sep 2022 • Erik Thordsen, Erich Schubert
The merit of projecting data onto linear subspaces is well known from, e.g., dimension reduction.
1 code implementation • Lernen, Wissen, Daten, Analysen 2021 • Erich Schubert
Unfortunately, we also show that the requirement to produce a hierarchical result is a limiting factor to the cluster quality, as the optimum result for a particular number of clusters k does not have to be consistent with the optimum result for k+1 clusters.
no code implementations • 14 Jul 2021 • Erik Thordsen, Erich Schubert
Many approaches in the field of machine learning and data analysis rely on the assumption that the observed data lies on lower-dimensional manifolds.
1 code implementation • 8 Jul 2021 • Erich Schubert, Andreas Lang, Gloria Feher
Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors.
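As background, a minimal spherical k-means sketch: both points and centers are kept on the unit sphere, and assignment maximizes the dot product (cosine similarity). All names and data here are illustrative assumptions, not the paper's code:

```python
import numpy as np

def spherical_kmeans(X, k, iters=50, seed=0):
    """Illustrative sketch of spherical k-means: points and centers live
    on the unit sphere, and assignment maximizes cosine similarity."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project to unit sphere
    C = X[rng.choice(len(X), k, replace=False)]       # random initial centers
    for _ in range(iters):
        labels = (X @ C.T).argmax(axis=1)             # most similar center
        for j in range(k):
            m = X[labels == j].sum(axis=0)
            n = np.linalg.norm(m)
            if n > 0:
                C[j] = m / n                          # re-normalized mean
    return labels, C

# Tiny document-vector-like example (illustrative)
docs = np.array([[2.0, 0.0, 0.0],
                 [1.0, 0.1, 0.0],
                 [0.0, 0.0, 3.0],
                 [0.0, 0.1, 1.0]])
labels, C = spherical_kmeans(docs, k=2)
print(labels)
```

The re-normalization step is the key difference from ordinary k-means: the center is the mean direction, not the mean point.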
1 code implementation • 8 Jul 2021 • Erich Schubert
In this paper, we derive a triangle inequality for cosine similarity that is suitable for efficient similarity search with many standard search structures (such as the VP-tree, Cover-tree, and M-tree); we show that this bound is tight and discuss fast approximations for it.
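The angular reasoning behind such a bound can be sketched as follows: writing sim = cos θ, the triangle inequality on angles gives θ(A,C) ≤ θ(A,B) + θ(B,C), and the identity cos(α+β) = cos α cos β − sin α sin β then yields a lower bound on sim(A,C). A small numeric check of this simplified form (not necessarily the paper's exact formulation):

```python
import numpy as np

def cos_lower_bound(s_ab, s_bc):
    """Lower bound on sim(A, C) given sim(A, B) and sim(B, C), via
    cos(a + b) = cos(a)cos(b) - sin(a)sin(b) applied to the angular
    triangle inequality (sketch of the angular argument)."""
    return s_ab * s_bc - np.sqrt((1 - s_ab**2) * (1 - s_bc**2))

rng = np.random.default_rng(1)
for _ in range(1000):
    A, B, C = rng.normal(size=(3, 5))
    A, B, C = [v / np.linalg.norm(v) for v in (A, B, C)]  # unit vectors
    s_ab, s_bc, s_ac = A @ B, B @ C, A @ C
    assert s_ac >= cos_lower_bound(s_ab, s_bc) - 1e-12
print("lower bound held on 1000 random unit-vector triples")
```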
3 code implementations • 12 Aug 2020 • Erich Schubert, Peter J. Rousseeuw
While we do not study the parallelization of our approach in this work, it can easily be combined with earlier approaches for using PAM and CLARA on big data (some of which use PAM as a subroutine and hence benefit immediately from these improvements), where performance with high k becomes increasingly important.
1 code implementation • 23 Jun 2020 • Erik Thordsen, Erich Schubert
In this paper, we introduce an orthogonal concept which does not use any distances: we use the distribution of angles between neighbor points.
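A simplified sketch in this spirit (not the paper's exact estimator): for isotropic directions in R^d, the expected squared cosine between two random unit vectors is 1/d, so the reciprocal of the mean squared cosine over neighbor directions estimates the local dimensionality. All names and data below are illustrative assumptions:

```python
import numpy as np

def angle_based_id(X, point, k=20):
    """Illustrative sketch: estimate intrinsic dimensionality at one point
    from the angles between the directions to its k nearest neighbors.
    For isotropic d-dimensional directions, E[cos^2] = 1/d, so
    1 / mean(cos^2) estimates d. (Simplified, not the paper's estimator.)"""
    diffs = X - X[point]
    dist = np.linalg.norm(diffs, axis=1)
    nn = np.argsort(dist)[1:k + 1]          # skip the point itself
    V = diffs[nn] / dist[nn, None]          # unit direction vectors
    cos = V @ V.T
    cos2 = cos[np.triu_indices(k, 1)] ** 2  # pairwise squared cosines
    return 1.0 / cos2.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))               # truly 3-dimensional data
print(round(angle_based_id(X, 0, k=50), 1))
```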
1 code implementation • 23 Jun 2020 • Andreas Lang, Erich Schubert
We introduce a replacement cluster feature that does not have this numeric problem, that is not much more expensive to maintain, and which makes many computations simpler and hence more efficient.
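One way such a numerically stable cluster feature can be sketched: store the count, mean, and sum of squared deviations rather than the classic (N, LS, SS) triple, updating them incrementally in Welford style. The class and API below are illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

class StableCF:
    """Sketch of a numerically stable cluster feature: count, mean, and
    sum of squared deviations instead of linear and squared sums.
    (Welford-style updates; illustrative, not the paper's exact API.)"""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.ssd = np.zeros(dim)  # per-dimension sum of squared deviations

    def add(self, x):
        self.n += 1
        delta = x - self.mean           # deviation from the old mean
        self.mean += delta / self.n
        self.ssd += delta * (x - self.mean)  # uses old and new mean

    def variance(self):
        return self.ssd / self.n

# The naive SS - LS^2/N suffers catastrophic cancellation for large
# offsets; the deviation-based feature does not:
cf = StableCF(1)
for v in (1e8 + 1, 1e8 + 2, 1e8 + 3):
    cf.add(np.array([v]))
print(cf.variance())  # variance of {1, 2, 3}, unaffected by the 1e8 offset
```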
1 code implementation • 10 Feb 2019 • Erich Schubert, Arthur Zimek
We will first outline the motivation for this release and the plans for the future, and then give a brief overview of the new functionality in this version.
4 code implementations • 12 Oct 2018 • Erich Schubert, Peter J. Rousseeuw
It can easily be combined with earlier approaches for using PAM and CLARA on big data (some of which use PAM as a subroutine and hence benefit immediately from these improvements), where performance with high k becomes increasingly important.
3 code implementations • Lernen, Wissen, Daten, Analysen 2018 • Erich Schubert, Michael Gertz
Density-based clustering is closely associated with the two algorithms DBSCAN and OPTICS.
1 code implementation • 11 Aug 2017 • Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, Michael Gertz
We then select keywords based on their significance and construct the word cloud based on the derived affinity.
1 code implementation • VLDB 2015 • Erich Schubert, Alexander Koos, Tobias Emrich, Andreas Züfle, Klaus Arthur Schmid, Arthur Zimek
The challenges associated with handling uncertain data, in particular with querying and mining, are attracting increasing attention in the research community.