1 code implementation • 6 Mar 2024 • Arik Reuter, Anton Thielmann, Christoph Weisser, Benjamin Säfken, Thomas Kneib
With the rise of transformers in Natural Language Processing, several successful models that rely on straightforward clustering approaches in transformer-based embedding spaces have emerged and consolidated the notion of topics as clusters of embedding vectors.
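The idea of topics as clusters of embedding vectors can be illustrated with a minimal sketch. It assumes hypothetical 2-D toy word embeddings (real systems use transformer embeddings with hundreds of dimensions). The helpers `kmeans` and `topics_from_clusters` and the farthest-first initialisation are illustrative choices, not the method of the paper above: each k-means cluster is read as a topic, and the words closest to its centroid serve as the topic's top words.

```python
import math

# Hypothetical toy embeddings; real systems use high-dimensional transformer embeddings.
TOY_EMBEDDINGS = {
    "goal": (0.9, 0.1), "match": (0.85, 0.2), "striker": (0.95, 0.05),
    "vote": (0.1, 0.9), "election": (0.15, 0.95), "senate": (0.05, 0.85),
}

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next centroid: the point farthest from all centroids chosen so far
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids

def topics_from_clusters(embeddings, k, top_n=3):
    """Treat each cluster as a topic; rank its words by distance to the centroid."""
    words = list(embeddings)
    labels, centroids = kmeans([embeddings[w] for w in words], k)
    topics = []
    for c in range(k):
        members = [w for w, lab in zip(words, labels) if lab == c]
        members.sort(key=lambda w: math.dist(embeddings[w], centroids[c]))
        topics.append(members[:top_n])
    return topics

if __name__ == "__main__":
    for i, topic in enumerate(topics_from_clusters(TOY_EMBEDDINGS, k=2)):
        print(f"topic {i}: {topic}")
```

Ranking words by centroid distance is only one way to turn a cluster into a word list; other approaches score words by class-based term frequencies instead.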
no code implementations • 6 Mar 2024 • Arik Reuter, Anton Thielmann, Christoph Weisser, Sebastian Fischer, Benjamin Säfken
Topic modeling seems to be almost synonymous with generating lists of top words to represent topics within large text corpora.
no code implementations • 19 Dec 2022 • Anton Thielmann, Christoph Weisser, Benjamin Säfken
Few-shot methods for accurate modeling in sparse-label settings have improved significantly.
1 code implementation • 8 Jul 2022 • Andreas Buchmüller, Gillian Kant, Christoph Weisser, Benjamin Säfken, Krisztina Kis-Katos, Thomas Kneib
We present Twitmo, a package that provides a broad range of methods to collect, pre-process, analyze and visualize geo-tagged Twitter data.
no code implementations • 17 Nov 2021 • Mattias Luber, Anton Thielmann, Christoph Weisser, Benjamin Säfken
Extracting topics from large collections of unstructured text documents has become a central task in current NLP applications, and algorithms such as NMF and LDA, as well as their generalizations, are the well-established state of the art.
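As a rough illustration of NMF-based topic extraction, the sketch below factors a small document-term count matrix `V` into non-negative `W` (document-topic) and `H` (topic-term) matrices using the Lee-Seung multiplicative updates for the Frobenius loss, then reads each row of `H` as a topic. The corpus, vocabulary, random initialisation, and helper names (`nmf`, `top_words`, `frobenius_error`) are hypothetical and chosen only for illustration; the snippet is not taken from the paper above.

```python
import random

def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x k) times H (k x n) with
    Lee-Seung multiplicative updates; eps guards against division by zero."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

def top_words(H, vocab, n=3):
    """Each row of H is a topic; its top words are the highest-weighted terms."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]

VOCAB = ["goal", "match", "striker", "vote", "election", "senate"]
V = [  # hypothetical term counts, one row per document
    [2, 1, 1, 0, 0, 0], [1, 2, 0, 0, 0, 0], [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1], [0, 0, 0, 1, 2, 0], [0, 0, 0, 0, 1, 2],
]

if __name__ == "__main__":
    W, H = nmf(V, k=2)
    print(top_words(H, VOCAB))
```

Multiplicative updates keep every entry non-negative as long as the initialisation is positive, which is why they are a common textbook choice for NMF; library implementations typically offer faster solvers as well.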
no code implementations • 29 Sep 2021 • Anton Frederik Thielmann, Christoph Weisser, Thomas Kneib, Benjamin Saefken
While these algorithms differ in their modeling approaches, they have in common that hyperparameter optimization is difficult and is mainly done by maximizing the coherence scores of the extracted topics via grid search.
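Grid search over a hyperparameter such as the number of topics, scored by topic coherence, can be sketched as follows. The coherence measure is assumed to be UMass, chosen here only because it needs nothing but document co-occurrence counts; the abstract does not say which measure is meant. The corpus and `CANDIDATE_TOPICS` are hypothetical: the dictionary stands in for a real topic model by returning fixed topics for each candidate `k`.

```python
import math
from statistics import mean

# Hypothetical toy corpus: each document is the set of words it contains.
DOCS = [
    {"goal", "match", "striker"}, {"goal", "match"}, {"match", "striker"},
    {"vote", "election", "senate"}, {"vote", "election"}, {"election", "senate"},
]

# Hypothetical stand-in for a topic model: the topics it "returns" for each k.
CANDIDATE_TOPICS = {
    1: [["match", "election", "goal"]],
    2: [["match", "goal", "striker"], ["election", "vote", "senate"]],
    3: [["match", "goal"], ["election", "vote"], ["striker", "senate"]],
}

def umass_coherence(topic, docs):
    """UMass coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over ranked word pairs j < i."""
    def doc_freq(*words):
        return sum(1 for d in docs if all(w in d for w in words))
    # UMass ranks a topic's words by decreasing document frequency
    ranked = sorted(topic, key=doc_freq, reverse=True)
    return sum(
        math.log((doc_freq(ranked[i], ranked[j]) + 1) / doc_freq(ranked[j]))
        for i in range(1, len(ranked)) for j in range(i)
    )

def grid_search(fit, ks, docs):
    """Fit one model per k and keep the k with the highest mean topic coherence."""
    scores = {k: mean(umass_coherence(t, docs) for t in fit(k)) for k in ks}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    best_k, scores = grid_search(CANDIDATE_TOPICS.get, [1, 2, 3], DOCS)
    print(best_k, scores)
```

Each grid point requires a full model fit, which is what makes this style of hyperparameter optimization expensive for larger grids.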