Clustering
2809 papers with code • 0 benchmarks • 5 datasets
Clustering is the task of grouping unlabeled data points into disjoint subsets, with each data point assigned to a single cluster. The number of clusters is typically not known a priori. The grouping criterion is usually based on the similarity of data points to one another.
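As a concrete illustration of this task, here is a minimal NumPy sketch of k-means (Lloyd's algorithm), one of the most common similarity-based clustering methods. The farthest-point initialization and the toy two-blob data are illustrative choices, not part of any specific paper on this page.

```python
import numpy as np

def init_centroids(X, k):
    """Farthest-point initialization: start from the first point, then
    repeatedly add the point farthest from all chosen centroids."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2),
            axis=1,
        )
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and recomputing centroids as cluster means."""
    centroids = init_centroids(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # distance from every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Toy data: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

On data this well separated, the two blobs are recovered as the two clusters regardless of which cluster index each blob receives.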
Benchmarks
These leaderboards are used to track progress in Clustering
Libraries
Use these libraries to find Clustering models and implementations
Most implemented papers
FaceNet: A Unified Embedding for Face Recognition and Clustering
On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
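The overhead comes from the quadratic number of pairs: n(n-1)/2 cross-encoder forward passes (about 50 million for n = 10,000). With fixed per-sentence embeddings of the kind Sentence-BERT produces, only n encoder passes are needed, and the most similar pair can then be found with cheap dot products. A sketch with random stand-in embeddings (the count, dimension, and data here are toy assumptions, not real sentence vectors):

```python
import numpy as np

n = 10_000
# Pairwise cross-encoder cost quoted in the paper: one pass per pair
pair_passes = n * (n - 1) // 2  # ~50 million

# Toy stand-in for n precomputed sentence embeddings (one encoder pass each)
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 384))

# L2-normalize so that dot products equal cosine similarities
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# All-pairs cosine similarity in a single matrix product
sim = emb @ emb.T
np.fill_diagonal(sim, -np.inf)  # exclude self-similarity

# Indices of the most similar pair
i, j = np.unravel_index(sim.argmax(), sim.shape)
```

The matrix product replaces millions of network forward passes with a single dense linear-algebra operation over the precomputed embeddings.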
Adversarial Autoencoders
In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution.
XGBoost: A Scalable Tree Boosting System
In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges.
SOLO: Segmenting Objects by Locations
We present a new, embarrassingly simple approach to instance segmentation in images.
Unsupervised Deep Embedding for Clustering Analysis
Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.
Deep Speaker: an End-to-End Neural Speaker Embedding System
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering
To recover the `clustering-friendly' latent representations and to better cluster the data, we propose a joint DR and K-means clustering approach in which DR is accomplished via learning a deep neural network (DNN).
Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering
In this paper, we propose Variational Deep Embedding (VaDE), a novel unsupervised generative clustering approach within the framework of Variational Auto-Encoder (VAE).
CatBoost: unbiased boosting with categorical features
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit.