3 dataset results for Word Embeddings AND Italian

WikiANN

WikiANN, also known as PAN-X, is a multilingual named entity recognition dataset. It consists of Wikipedia articles annotated with LOC (location), PER (person), and ORG (organization) tags in the IOB2 format. It is used to train and evaluate named entity recognition models across many languages.
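As a rough illustration of the IOB2 format mentioned above: each token carries a tag, where `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The sketch below groups such tags into entity spans. The function name `iob2_to_spans` and the Italian example sentence are hypothetical, chosen for illustration only; they are not taken from WikiANN or its tooling.

```python
from typing import List, Tuple


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity_type, entity_text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type = None   # type of the entity currently being built, if any
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the open entity (if there is one) and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type = None
        current_tokens = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # B- always starts a new entity, closing any open one first.
            flush()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # I- continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()

    flush()
    return spans


# Hypothetical example sentence, not drawn from the dataset.
tokens = ["Dante", "Alighieri", "nacque", "a", "Firenze", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(iob2_to_spans(tokens, tags))
# → [('PER', 'Dante Alighieri'), ('LOC', 'Firenze')]
```

In this sketch an `I-` tag that does not follow a matching `B-` is simply dropped; stricter or more lenient handling is an equally reasonable choice.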

58 PAPERS • 3 BENCHMARKS

WikiNEuRal

WikiNEuRal is a high-quality, automatically generated dataset for multilingual named entity recognition.

5 PAPERS • NO BENCHMARKS YET

DICE: a Dataset of Italian Crime Event news

DICE: a Dataset of Italian Crime Event news (from Gazzetta di Modena [2011-2021])

The dataset contains the main components of the news articles published online by the newspaper Gazzetta di Modena (https://gazzettadimodena.gelocal.it/modena): the URL of the web page, title, subtitle, text, date of publication, and the crime category assigned to each article by its author.

3 PAPERS • NO BENCHMARKS YET
