GermEval

The GermEval dataset (GermEval 2014) is a German-language resource for named entity recognition (NER), a token classification task in natural language processing (NLP). Key details:

Task: Token Classification (specifically, named entity recognition)
Language: German
Size: 100K < n < 1M tokens (size category).
Source: Sampled from German Wikipedia and news corpora as a collection of citations.
Annotations: Created through crowdsourcing.
License: CC BY 4.0 (cc-by-4.0).
Content: More than 31,000 sentences, corresponding to more than 590,000 tokens.
Purpose: Researchers and practitioners can use this dataset to train and evaluate NER models for German text.
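
Because the task is token classification, every token in a sentence carries one label, and entity mentions have to be reassembled from those per-token labels. The following is a minimal sketch of that step, assuming the widely used BIO tagging scheme (B- opens an entity, I- continues it, O marks tokens outside any entity). The function name bio_to_spans, the example sentence, and the PER and LOC labels are illustrative assumptions, not taken from the dataset documentation.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, label) pairs.

    Hypothetical helper for illustration; it assumes flat (non-nested) BIO tags.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current_label is not None:
            spans.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity,
            # closes the open entity and is itself treated as outside.
            flush()
            current_tokens = []
            current_label = None

    flush()
    return spans


# Made-up example sentence (not drawn from the dataset).
example_tokens = ["Angela", "Merkel", "besuchte", "Berlin", "."]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_spans(example_tokens, example_tags))
```

A helper of this kind is typically applied to both gold labels and model predictions, so that evaluation compares whole entity mentions rather than individual tokens.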

More information about the dataset, along with a data viewer, is available on its Hugging Face Datasets page (1).

(1) germeval_14 · Datasets at Hugging Face. https://huggingface.co/datasets/germeval_14
(2) GermEval-2018 Corpus (DE) - Empirical Linguistics and ... - heiDATA. https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/0B5VML
(3) GermEval 2014 Named Entity Recognition Shared Task - Data and Task Setup. https://sites.google.com/site/germeval2014ner/data
(4) 6 Best German Language Datasets of 2022 | Twine - Twine Blog. https://www.twine.net/blog/best-german-language-datasets/
(5) germeval_14 | TensorFlow Datasets. https://www.tensorflow.org/datasets/community_catalog/huggingface/germeval_14
(6) http://www.stern.de/sport/fussball/krawalle-in-der-fussball-bundesliga-dfb-setzt-auf-falsche-konzepte-1553657.html
(7) http://www.fr-online.de/in_und_ausland/sport/aktuell/1618625_Frings-schaut-finster-in-die-Zukunft.html
