PT Hate Speech Dataset | Papers With Code

The **PT Hate Speech** dataset is a resource for studying hate speech in Portuguese. Its key details are:

1. **Composition**:
   - The dataset consists of **5,668 tweets** written in Portuguese.
   - The tweets were labeled under two schemes, one applied by non-expert annotators and one by expert annotators.

2. **Annotation Schemes**:
   - **Non-experts** first annotated the tweets with **binary labels**: **'hate'** or **'no-hate'**.
   - **Expert annotators** then classified the tweets using a **fine-grained, hierarchical multi-label scheme** with **81 hate speech categories** in total.

3. **Hierarchical Annotation Scheme**:
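   - Because a tweet can carry several labels at different levels of the hierarchy, the scheme can identify different types of hate speech and their intersections.
   - Inter-annotator agreement varied from category to category, which suggests that some types of hate speech are subtler than others and that detecting them depends on personal perception.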

4. **Usefulness and Baseline Experiment**:
   - To demonstrate the dataset's usefulness, a **baseline classification experiment** was run on the binary-labeled data using pre-trained word embeddings and an LSTM.
   - The authors report **state-of-the-art results** for this baseline.
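
The hierarchical multi-label idea from item 3 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the category names, the `PARENT` map, and the helper functions are hypothetical and do not reproduce the dataset's actual 81-category taxonomy or its file format. The sketch only shows how a tweet's fine-grained labels might be expanded to their ancestor categories, and how a tweet that falls under more than one top-level branch could be flagged as an intersection.

```python
# Hypothetical child -> parent map. These names are illustrative only and
# are NOT the dataset's real taxonomy.
ROOT = "hate speech"
PARENT = {
    "sexism": ROOT,
    "racism": ROOT,
    "xenophobia": ROOT,
    "women": "sexism",
    "migrants": "xenophobia",
}


def with_ancestors(labels):
    """Return the labels together with every ancestor up to the root."""
    expanded = set()
    for label in labels:
        node = label
        # Walk upward; stop early if this node (and so its ancestors)
        # has already been added.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = PARENT.get(node)
    return expanded


def top_level_branches(labels):
    """Return the direct children of the root that the labels fall under."""
    return {l for l in with_ancestors(labels) if PARENT.get(l) == ROOT}


def is_intersection(labels):
    """True when the labels span more than one top-level branch."""
    return len(top_level_branches(labels)) > 1


# Hypothetical expert annotations keyed by a made-up tweet id.
annotations = {
    "tweet_1": {"women"},
    "tweet_2": {"women", "migrants"},
    "tweet_3": set(),
}

for tweet_id, labels in annotations.items():
    print(tweet_id, sorted(with_ancestors(labels)), is_intersection(labels))
```

Storing only the most specific labels and deriving ancestors on demand keeps each annotation compact while still letting a classifier be trained at any level of the hierarchy.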

Source:
(1) A Hierarchically-Labeled Portuguese Hate Speech Dataset. ACL Anthology. https://aclanthology.org/W19-3510/
(2) A Hierarchically-Labeled Portuguese Hate Speech Dataset (PDF). ACL Anthology. https://aclanthology.org/W19-3510.pdf
(3) A Hierarchically-Labeled Portuguese Hate Speech Dataset. Papers With Code. https://paperswithcode.com/paper/a-hierarchically-labeled-portuguese-hate
