TweetSentBR is a manually annotated corpus for sentiment analysis in Brazilian Portuguese. Its main characteristics are:

  1. Description:
    • The dataset consists of 15,000 manually annotated sentences extracted from tweets in Brazilian Portuguese.
    • The sentences come from the TV show domain.
    • Each sentence is labeled with one of three sentiment classes: positive, neutral, or negative.
    • The annotation process followed guidelines from the literature to ensure reliability.

  2. Purpose:
    • Researchers and practitioners in Natural Language Processing (NLP) use this dataset for sentiment analysis tasks.
    • It serves as a benchmark for developing and evaluating new sentiment classification methods.

  3. Performance:
    • Baseline experiments on polarity classification using three machine learning methods achieved the following results:
      • Binary classification (positive vs. negative): 80.99% F-Measure and 82.06% accuracy.
      • Three-point classification (positive, neutral, negative): 59.85% F-Measure and 64.62% accuracy.
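The two reported metrics can be illustrated with a minimal Python sketch. It assumes macro-averaged F-Measure (the averaging scheme is not stated here), and the function names, the `to_binary` helper, and the toy labels are hypothetical illustrations, not part of the dataset or its baselines.

```python
# Minimal sketch of accuracy and F-Measure for the two evaluation settings.
# ASSUMPTION: F-Measure is macro-averaged; the source does not say which
# averaging scheme the baselines used.

def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores over `labels`."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def to_binary(gold, pred):
    """Hypothetical helper: drop items whose gold label is neutral."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g != "neutral"]
    return [g for g, _ in pairs], [p for _, p in pairs]


# Toy labels for illustration only (not taken from TweetSentBR).
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "neutral", "negative", "neutral"]

three_way = ("positive", "neutral", "negative")
print("3-class accuracy:", accuracy(gold, pred))
print("3-class macro F1:", macro_f1(gold, pred, three_way))

gold_b, pred_b = to_binary(gold, pred)
print("binary accuracy:", accuracy(gold_b, pred_b))
print("binary macro F1:", macro_f1(gold_b, pred_b, ("positive", "negative")))
```

In a real binary experiment the classifier would typically be trained on the two polar classes only; filtering on the gold label here is a simplification to keep the sketch short.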

Source: Conversation with Bing, 3/16/2024
(1) Building a Sentiment Corpus of Tweets in Brazilian Portuguese. https://arxiv.org/abs/1712.08917
(2) 7 Best Portuguese Language Speech Datasets of 2022 | Twine. https://www.twine.net/blog/portuguese-language-speech-datasets/
(3) A survey and study impact of tweet sentiment analysis via ... - Springer. https://link.springer.com/article/10.1007/s10579-023-09687-8
(4) Top 25 Twitter Datasets for NLP and Machine Learning | iMerit. https://imerit.net/blog/top-25-twitter-datasets-for-natural-language-processing-and-machine-learning-all-pbm/
(5) Building a Sentiment Corpus of Tweets in Brazilian Portuguese - arXiv.org. https://arxiv.org/pdf/1712.08917v1.pdf
(6) https://doi.org/10.48550/arXiv.1712.08917

License


  • Unknown
