2 code implementations • 19 Jun 2024 • Dan Saattrup Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp
This paper explores the performance of encoder and decoder language models on multilingual Natural Language Understanding (NLU) tasks, with a broad focus on Germanic languages.
1 code implementation • 13 Jun 2024 • Márton Kardos, Jan Kostkan, Arnault-Quentin Vermillet, Kristoffer Nielbo, Kenneth Enevoldsen, Roberta Rocca
Topic models are useful tools for discovering latent semantic structures in large textual corpora.
2 code implementations • 4 Jun 2024 • Kenneth Enevoldsen, Márton Kardos, Niklas Muennighoff, Kristoffer Laigaard Nielbo
The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB.
no code implementations • 28 Feb 2024 • Kenneth Enevoldsen, Emil Trenckner Jessen, Rebekah Baglini
Named entity recognition is one of the cornerstones of Danish NLP, essential for language technology applications within both industry and research.
no code implementations • 9 Dec 2023 • Kenneth Enevoldsen
Augmenty is a Python library for structured text augmentation.
no code implementations • 13 Nov 2023 • Kenneth Enevoldsen, Lasse Hansen, Dan S. Nielsen, Rasmus A. F. Egebæk, Søren V. Holm, Martin C. Nielsen, Martin Bernstorff, Rasmus Larsen, Peter B. Jørgensen, Malte Højmark-Bertelsen, Peter B. Vahlstrup, Per Møldrup-Dalum, Kristoffer Nielbo
Large language models, sometimes referred to as foundation models, have transformed multiple fields of research.
1 code implementation • 5 Jan 2023 • Lasse Hansen, Ludvig Renbo Olsen, Kenneth Enevoldsen
TextDescriptives is a Python package for calculating a large variety of metrics from text.
no code implementations • 12 Jul 2021 • Kenneth Enevoldsen, Lasse Hansen, Kristoffer Nielbo
We also conduct a series of tests of the bias and robustness of Danish NLP pipelines by augmenting the DaNE test set.
Ranked #1 on Dependency Parsing on DaNE