Text Clustering

32 papers with code • 3 benchmarks • 5 datasets

Text clustering is the task of grouping a set of texts so that texts in the same group (called a cluster) are more similar to each other, in some sense, than to texts in other clusters. (Source: Adapted from Wikipedia)
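As a rough illustration of this definition, here is a minimal sketch that groups texts by word overlap. It is not taken from any paper listed on this page. The similarity measure (Jaccard overlap of token sets), the single-pass "leader" assignment, and the names `tokens`, `jaccard`, `leader_cluster`, and the `threshold` value are all assumptions chosen for brevity.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(texts, threshold=0.2):
    """Assign each text to the first cluster whose leader is similar enough;
    otherwise start a new cluster with this text as its leader."""
    leaders = []  # token set of the first text seen in each cluster
    labels = []
    for text in texts:
        toks = tokens(text)
        for idx, leader in enumerate(leaders):
            if jaccard(toks, leader) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing leader was close enough: open a new cluster.
            leaders.append(toks)
            labels.append(len(leaders) - 1)
    return labels


texts = [
    "the cat sat on the mat",
    "a cat lay on the mat",
    "stock prices fell sharply today",
    "stock markets fell again today",
]
labels = leader_cluster(texts)
print(labels)
```

The two cat sentences share most of their words, so they land in one cluster, and the two stock sentences land in another. The methods listed below typically replace the word-overlap measure with learned sentence embeddings and a stronger clustering algorithm.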

More Discriminative Sentence Embeddings via Semantic Graph Smoothing

chakib401/smoothing_sentence_embeddings 20 Feb 2024

This paper explores an empirical approach to learn more discriminative sentence representations in an unsupervised fashion.

0

Elastic deep autoencoder for text embedding clustering by an improved graph regularization

safinal/text-embedding-clustering Expert Systems with Applications 2023

This joint end-to-end deep learning model achieves better representations and more accurate text clustering results on common datasets than existing methods.

2
23 Sep 2023

Large Language Models Enable Few-Shot Clustering

viswavi/few-shot-clustering 2 Jul 2023

In this paper, we ask whether a large language model can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering.

41

ClusterLLM: Large Language Models as a Guide for Text Clustering

zhang-yu-wei/clusterllm 24 May 2023

First, we prompt ChatGPT for insights on clustering perspective by constructing hard triplet questions <does A better correspond to B than C>, where A, B and C are similar data points that belong to different clusters according to a small embedder.

32

Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering

hmllmh/rstc 23 May 2023

To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data.

7

Influence of various text embeddings on clustering performance in NLP

simpleparadox/cmput_697_project 4 May 2023

For example, a three-star rating (out of five) may be incongruous with the review text, which may be more suitable for a five-star review.

1

DeepLens: Interactive Out-of-distribution Data Detection in NLP Models

momentum-lab-workspace/deeplens 2 Mar 2023

In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora.

8

Very Large Language Model as a Unified Methodology of Text Mining

jonjoncardoso/jonjoncardoso 19 Dec 2022

Text data mining is the process of deriving essential information from language text.

0

MTEB: Massive Text Embedding Benchmark

embeddings-benchmark/mteb 13 Oct 2022

MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.

1,377

Training Effective Neural Sentence Encoders from Automatically Mined Paraphrases

sdadas/polish-sentence-evaluation 26 Jul 2022

Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.

20