Search Results for author: Erich Schubert

Found 19 papers, 15 papers with code

Medoid Silhouette clustering with automatic cluster number selection

2 code implementations 7 Sep 2023 Lars Lenssen, Erich Schubert

We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters.

Clustering

Sparse Partitioning Around Medoids

no code implementations 5 Sep 2023 Lars Lenssen, Erich Schubert

FastPAM recently introduced a speedup for large k to make it applicable for larger problems, but the method still has a runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric variant of this problem, to be used for example on graph data such as road networks.

Electrical Engineering

Data Aggregation for Hierarchical Clustering

1 code implementation 5 Sep 2023 Erich Schubert, Andreas Lang

Hierarchical Agglomerative Clustering (HAC) is likely the earliest and most flexible clustering method, because it can be used with many distances, similarities, and various linkage strategies.

Clustering Vector Quantization (k-means problem)

LOSDD: Leave-Out Support Vector Data Description for Outlier Detection

no code implementations 27 Dec 2022 Daniel Boiar, Thomas Liebig, Erich Schubert

Support Vector Machines have been successfully used for one-class classification (OCSVM, SVDD) when trained on clean data, but they work much worse on dirty data: outliers present in the training data tend to become support vectors, and are hence considered "normal".

One-Class Classification Outlier Detection

Stop using the elbow criterion for k-means and how to choose the number of clusters instead

no code implementations 23 Dec 2022 Erich Schubert

A major challenge when using k-means clustering often is how to choose the parameter k, the number of clusters.

Clustering

Clustering by Direct Optimization of the Medoid Silhouette

2 code implementations 26 Sep 2022 Lars Lenssen, Erich Schubert

One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$.

Clustering

On Projections to Linear Subspaces

1 code implementation 26 Sep 2022 Erik Thordsen, Erich Schubert

The merit of projecting data onto linear subspaces is well known from, e.g., dimension reduction.

Dimensionality Reduction

HACAM: Hierarchical Agglomerative Clustering Around Medoids - and its Limitations

1 code implementation Lernen, Wissen, Daten, Analysen 2021 Erich Schubert

Unfortunately, we also show that the requirement to produce a hierarchical result is a limiting factor to the cluster quality, as the optimum result for a particular number of clusters 𝑘 does not have to be consistent with the optimum result with 𝑘+1 clusters.

Clustering

MESS: Manifold Embedding Motivated Super Sampling

no code implementations 14 Jul 2021 Erik Thordsen, Erich Schubert

Many approaches in the field of machine learning and data analysis rely on the assumption that the observed data lies on lower-dimensional manifolds.

Accelerating Spherical k-Means

1 code implementation 8 Jul 2021 Erich Schubert, Andreas Lang, Gloria Feher

Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors.

Clustering Computational Efficiency

A Triangle Inequality for Cosine Similarity

1 code implementation 8 Jul 2021 Erich Schubert

In this paper, we derive a triangle inequality for Cosine similarity that is suitable for efficient similarity search with many standard search structures (such as the VP-tree, Cover-tree, and M-tree); show that this bound is tight and discuss fast approximations for it.

Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms

3 code implementations 12 Aug 2020 Erich Schubert, Peter J. Rousseeuw

While we do not study the parallelization of our approach in this work, it can easily be combined with earlier approaches to use PAM and CLARA on big data (some of which use PAM as a subroutine, hence can immediately benefit from these improvements), where the performance with high k becomes increasingly important.

Clustering

ABID: Angle Based Intrinsic Dimensionality

1 code implementation 23 Jun 2020 Erik Thordsen, Erich Schubert

In this paper we introduce an orthogonal concept, which does not use any distances: we use the distribution of angles between neighbor points.

Dimensionality Reduction

BETULA: Numerically Stable CF-Trees for BIRCH Clustering

1 code implementation 23 Jun 2020 Andreas Lang, Erich Schubert

We introduce a replacement cluster feature that does not have this numeric problem, that is not much more expensive to maintain, and which makes many computations simpler and hence more efficient.

Clustering Data Compression

ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

1 code implementation 10 Feb 2019 Erich Schubert, Arthur Zimek

We will first outline the motivation for this release, the plans for the future, and then give a brief overview over the new functionality in this version.

Benchmarking Clustering

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

4 code implementations 12 Oct 2018 Erich Schubert, Peter J. Rousseeuw

It can easily be combined with earlier approaches to use PAM and CLARA on big data (some of which use PAM as a subroutine, hence can immediately benefit from these improvements), where the performance with high k becomes increasingly important.

Clustering

Improving the Cluster Structure Extracted from OPTICS Plots

3 code implementations Lernen, Wissen, Daten, Analysen 2018 Erich Schubert, Michael Gertz

Density-based clustering is closely associated with the two algorithms DBSCAN and OPTICS.

Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding

1 code implementation 11 Aug 2017 Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, Michael Gertz

We then select keywords based on their significance and construct the word cloud based on the derived affinity.

A Framework for Clustering Uncertain Data

1 code implementation VLDB 2015 Erich Schubert, Alexander Koos, Tobias Emrich, Andreas Zufle, Klaus Arthur Schmid, Arthur Zimek

The challenges associated with handling uncertain data, in particular with querying and mining, are finding increasing attention in the research community.

Clustering Outlier Detection
