Topic Models

210 papers with code • 6 benchmarks • 12 datasets

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for the discovery of hidden semantic structures in a text body.
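
The sketch below illustrates the definition with latent Dirichlet allocation (LDA), one classic topic model, fitted by collapsed Gibbs sampling in plain Python. LDA is only an example here. The function name `lda_gibbs`, the default hyperparameters `alpha` and `beta`, and the iteration count are illustrative assumptions, not values from any paper on this page.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling for LDA.

    docs is a list of token lists. Returns (topics, n_dk), where each topic
    is a dict word -> probability and n_dk holds document-topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Smoothed point estimate of each topic's distribution over the vocabulary.
    topics = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return topics, n_dk
```

Try it on a toy corpus such as `[["goal", "match", "team"], ["vote", "party", "election"]] * 5` with `num_topics=2`. Sorting each returned dictionary by value then lists the most probable words per topic. Collapsed sampling integrates out the document-topic and topic-word distributions, so only integer counts have to be tracked.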

Benchmarks

These leaderboards are used to track progress in Topic Models.

Dataset	Best Model
AG News	DeTiME
20NewsGroups	vONTSS
20 Newsgroups	Bayesian SMM
Arxiv HEP-TH citation graph	JoSH
NYT	JoSH
AgNews	vONTSS

Libraries

Use these libraries to find topic model implementations.

mind-Lab/octis (3 papers, 683 GitHub stars)
YongfeiYan/Neural-Document-Modeling (3 papers)
ahoho/topics (3 papers)
d2klab/tomodapi (3 papers)

4 of 5 libraries shown.

Most implemented papers

Topic Modeling in Embedding Spaces

adjidieng/ETM • TACL 2020

To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings.
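
In ETM, each topic k has its own embedding α_k in the same space as the word embeddings ρ. The topic's distribution over the vocabulary is then β_k = softmax(ρᵀ α_k). The sketch below computes only that step, in plain Python. The function names and toy vectors are hypothetical, and the sketch leaves out the variational inference that learns ρ and α.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def etm_topic_word_dist(word_embeddings, topic_embedding):
    """Hypothetical helper: beta_k = softmax(rho^T alpha_k).

    word_embeddings: V vectors of length L (the columns of rho).
    topic_embedding: alpha_k, a vector of length L.
    Returns one probability per vocabulary word.
    """
    # Inner product of every word embedding with the topic embedding.
    scores = [sum(r * a for r, a in zip(rho_v, topic_embedding)) for rho_v in word_embeddings]
    return softmax(scores)
```

Words whose embeddings point in the same direction as α_k get high probability under topic k. This is how ETM ties topics to the geometry of the embedding space.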

Neural Variational Inference for Text Processing

carpedm20/variational-text-tensorflow • 19 Nov 2015

We validate this framework on two very different text modelling applications, generative document modelling and supervised question answering.
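
Neural variational inference trains an inference network by maximizing an evidence lower bound (ELBO). It typically uses the reparameterization trick for a Gaussian latent variable. The sketch below shows two standard pieces for a diagonal Gaussian posterior: drawing h = μ + σ·ε, and the closed-form KL divergence to a standard normal prior that appears in the ELBO. These are generic VAE components written for illustration, not code from the linked repository, and the function names are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw h = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps h a deterministic, differentiable
    function of mu and log_var given the noise eps.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

The KL term is zero exactly when the posterior equals the prior (μ = 0, log σ² = 0). It grows as the encoder moves the posterior away from the prior.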

Autoencoding Variational Inference For Topic Models

mind-Lab/octis • 4 Mar 2017

A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice.
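
A central obstacle is the Dirichlet prior, which has no convenient reparameterization. The AVITM paper works around this by replacing Dirichlet(α) with a logistic-normal prior obtained from a Laplace approximation in the softmax basis. That gives mean μ_k = log α_k − (1/K) Σ_i log α_i and diagonal variance σ²_k = (1/α_k)(1 − 2/K) + (1/K²) Σ_i 1/α_i. The sketch below evaluates these two expressions. The function name is hypothetical.

```python
import math

def dirichlet_laplace_prior(alpha):
    """Hypothetical helper: logistic-normal approximation to Dirichlet(alpha).

    Returns (mu, var): per-topic means and diagonal variances of the
    Gaussian prior placed on the pre-softmax topic vector.
    """
    K = len(alpha)
    mean_log_alpha = sum(math.log(a) for a in alpha) / K
    inv_alpha_sum = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log_alpha for a in alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / K) + inv_alpha_sum / K**2 for a in alpha]
    return mu, var
```

With a symmetric prior, every μ_k is zero. For α_k = 1 and K = 50, each variance is 0.96 + 0.02 = 0.98.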

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

cemoody/lda2vec • 6 May 2016

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents.
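
This is a minimal sketch, assuming the formulation in the lda2vec paper. A document vector is a mixture of topic vectors, with mixture weights given by a softmax over per-document weights. The context vector used to predict nearby words is the sum of the pivot word vector and that document vector. The function names and toy vectors are hypothetical, and training is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def lda2vec_context(word_vec, doc_weights, topic_vecs):
    """Hypothetical helper: context = word vector + mixture of topic vectors.

    doc_weights are unnormalized per-document topic weights. The softmax
    turns them into topic proportions that sum to 1, which keeps the
    document representation interpretable like LDA's.
    """
    proportions = softmax(doc_weights)
    dim = len(word_vec)
    doc_vec = [sum(p * t[i] for p, t in zip(proportions, topic_vecs)) for i in range(dim)]
    context = [w + d for w, d in zip(word_vec, doc_vec)]
    return context, proportions
```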

Adapting Text Embeddings for Causal Inference

blei-lab/causal-text-embeddings • 29 May 2019

To address this challenge, we develop causally sufficient embeddings, low-dimensional document representations that preserve sufficient information for causal identification and allow for efficient estimation of causal effects.

Neural Models for Documents with Metadata

dallascard/scholar • ACL 2018

Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly used approaches to modeling text corpora ignore this information.

An Unsupervised Neural Attention Model for Aspect Extraction

ruidan/Unsupervised-Aspect-Extraction • ACL 2017

Unlike topic models which typically assume independently generated words, word embedding models encourage words that appear in similar contexts to be located close to each other in the embedding space.
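
The attention step can be sketched as follows, assuming the formulation of the aspect-extraction paper as we understand it. The sentence's average word embedding y_s serves as a query. Each word i gets the score d_i = e_iᵀ M y_s for a learned matrix M, the scores are normalized with a softmax, and the sentence embedding z_s is the attention-weighted sum of word embeddings. The function name and toy inputs are hypothetical, and the reconstruction from aspect embeddings is not shown.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_sentence_embedding(word_embs, M):
    """Hypothetical helper: attention-weighted sentence embedding.

    word_embs: list of n word vectors of length dim.
    M: dim x dim matrix (list of rows).
    Returns (z, a): sentence embedding and attention weights.
    """
    dim = len(word_embs[0])
    n = len(word_embs)
    # y_s: average of the word embeddings in the sentence.
    y = [sum(e[j] for e in word_embs) / n for j in range(dim)]
    # M y_s, computed once and shared by all words.
    My = [sum(M[r][c] * y[c] for c in range(dim)) for r in range(dim)]
    # d_i = e_i^T M y_s for every word i.
    scores = [sum(e[j] * My[j] for j in range(dim)) for e in word_embs]
    a = softmax(scores)
    # z_s = sum_i a_i e_i.
    z = [sum(ai * e[j] for ai, e in zip(a, word_embs)) for j in range(dim)]
    return z, a
```

Because the query is the sentence's own average embedding, words that lie close to the sentence's dominant direction in embedding space receive more weight.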

Topic Discovery in Massive Text Corpora Based on Min-Hashing

gibranfp/Sampled-MinHashing • 3 Jul 2018

This paper describes an alternative approach to discovering topics based on Min-Hashing, which can handle massive text corpora and large vocabularies using modest computer hardware and does not require fixing the number of topics in advance.
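
Min-Hashing rests on one property. For a random hash function h, the probability that min h(A) = min h(B) equals the Jaccard similarity |A ∩ B| / |A ∪ B|. The sketch below estimates Jaccard similarity between two non-empty sets, for example the sets of documents in which two words occur. Salted `hashlib.blake2b` digests stand in for independent random hash functions. This is a generic MinHash sketch, not the paper's Sampled Min-Hashing pipeline, and the function names are hypothetical.

```python
import hashlib

def minhash_signature(items, num_hashes=128):
    """Hypothetical helper: MinHash signature of a non-empty set of items."""
    signature = []
    for seed in range(num_hashes):
        # Prefixing each item with the seed simulates a separate hash function.
        signature.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(), "big")
            for x in items
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of hash functions whose minima agree approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

The sets `range(100)` and `range(50, 150)` share 50 of 150 distinct elements, so their true Jaccard similarity is 1/3. The estimate tightens as `num_hashes` grows. Comparing short fixed-length signatures rather than full sets is what keeps this approach cheap on large vocabularies.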

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

MilaNLProc/contextualized-topic-models • ACL 2021

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data.

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

MaartenGr/BERTopic • 11 Mar 2022

BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
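
BERTopic first clusters document embeddings. It then describes each cluster with a class-based TF-IDF that treats all documents in a cluster as a single document. The paper weights term t in class c as W_{t,c} = tf_{t,c} · log(1 + A / tf_t). Here tf_{t,c} is the term's frequency in the class, tf_t is its frequency across all classes, and A is the average number of words per class. The sketch below applies that formula to pre-tokenized clusters. It is an illustration, not the library's own implementation, and the function name is hypothetical.

```python
import math
from collections import Counter

def c_tf_idf(docs_per_topic):
    """Hypothetical helper: class-based TF-IDF weights per topic.

    docs_per_topic maps a topic id to a list of tokenized documents.
    Returns {topic_id: {term: weight}}.
    """
    # Concatenate each cluster into one "class document" and count its terms.
    class_counts = {
        c: Counter(tok for doc in docs for tok in doc) for c, docs in docs_per_topic.items()
    }
    # tf_t: frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class.
    avg_words = sum(sum(c.values()) for c in class_counts.values()) / len(class_counts)
    return {
        c: {t: tf * math.log(1 + avg_words / total_counts[t]) for t, tf in counts.items()}
        for c, counts in class_counts.items()
    }
```

A term that is frequent in one cluster but rare elsewhere gets a high weight. A term spread across all clusters, such as "goal" appearing in both a sports cluster and a politics cluster, is down-weighted in the cluster where it is incidental.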
