SB10k is a corpus of German tweets annotated for sentiment analysis. Key details:

  • Corpus Size: It contains approximately 10,000 German tweets¹.
  • Language: German.
  • Task: Text classification, specifically sentiment analysis.
  • Multilinguality: Monolingual (German only).
  • Size Category: Falls within the range of 1K to 10K examples.
  • Tags: Sentiment analysis.
  • License: CC-BY-4.0.

The dataset was created by annotating German tweets, with each tweet labeled by three annotators. SB10k has been used to benchmark sentiment classifiers, including convolutional neural networks (CNNs) and feature-based support vector machines (SVMs)²³.
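
When each tweet carries three annotator labels, one common way to derive a single label is a majority vote. The sketch below illustrates that idea only: the text above does not say how the SB10k labels were actually combined, so the aggregation rule, the `majority_label` helper, the tweet IDs, and the label names are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None.

    Hypothetical helper: majority voting is an assumed aggregation rule,
    not necessarily the one used to build SB10k.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require more than half of the votes, so a 1-1-1 split yields None.
    return label if count > len(votes) / 2 else None


# Illustrative annotations (invented IDs and labels, three votes per tweet).
annotations = {
    "tweet_1": ["positive", "positive", "neutral"],
    "tweet_2": ["negative", "neutral", "positive"],
    "tweet_3": ["neutral", "neutral", "neutral"],
}

resolved = {tweet_id: majority_label(votes) for tweet_id, votes in annotations.items()}
```

Tweets without a strict majority resolve to `None` here, which leaves the decision to drop or re-annotate them to the caller.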

(1) Alienmaster/SB10k · Datasets at Hugging Face. https://huggingface.co/datasets/Alienmaster/SB10k.
(2) A Twitter Corpus and Benchmark Resources for German Sentiment Analysis. https://aclanthology.org/W17-1106/.
(3) A Twitter Corpus and Benchmark Resources for German Sentiment Analysis (PDF). https://aclanthology.org/W17-1106.pdf.
