3 dataset results for Named Entity Recognition (NER) AND Texts AND Hindi

Naamapadam

Naamapadam is a Named Entity Recognition (NER) dataset covering 11 major Indian languages from two language families. For 9 of the 11 languages, it contains more than 400k sentences per language, annotated with at least 100k entities from three standard entity categories (Person, Location, and Organization). The training set was created automatically from the Samanantar parallel corpus by projecting automatically tagged entities from each English sentence onto the corresponding Indian-language sentence.

3 PAPERS • NO BENCHMARKS YET
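
The entity projection mentioned in the Naamapadam description can be illustrated with a minimal sketch. It assumes BIO tags on the English side and a word alignment given as (English index, target index) pairs; the function names, the alignment format, and the rule of taking the smallest contiguous target span are assumptions for illustration, not the dataset authors' actual pipeline.

```python
from typing import List, Tuple


def extract_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) entity spans from BIO tags."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def project_tags(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO tags from source tokens to target tokens.

    Hypothetical rule: each source entity is mapped to the smallest
    contiguous target span covering all of its aligned target tokens.
    Entities with no aligned tokens, or whose span would overlap an
    already projected entity, are skipped.
    """
    tgt_tags = ["O"] * tgt_len
    for start, end, label in extract_spans(src_tags):
        aligned = sorted({t for s, t in alignment if start <= s < end})
        if not aligned:
            continue
        lo, hi = aligned[0], aligned[-1]
        if any(tag != "O" for tag in tgt_tags[lo:hi + 1]):
            continue
        tgt_tags[lo] = "B-" + label
        for j in range(lo + 1, hi + 1):
            tgt_tags[j] = "I-" + label
    return tgt_tags


# English: Sachin Tendulkar lives in Mumbai
en_tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
# Hindi:   सचिन तेंदुलकर मुंबई में रहते हैं  (6 tokens)
# Hand-written alignment for this example only.
align = [(0, 0), (1, 1), (4, 2), (3, 3), (2, 4)]
hi_tags = project_tags(en_tags, align, tgt_len=6)
print(hi_tags)
```

In practice the alignment would come from an automatic word aligner, and noisy alignments are the main source of projection errors, so a real pipeline would likely add filtering beyond the overlap check shown here.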

HiNER-collapsed

HiNER-collapsed (HiNER: A Large Hindi Named Entity Recognition Dataset)

HiNER is a large, standards-compliant Hindi NER dataset. This release contains 109,146 sentences and 2,220,856 tokens, annotated with 3 collapsed tags (PER, LOC, ORG).

1 PAPER • 1 BENCHMARK
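
As a rough illustration of how a fine-grained tag set can be collapsed to the three tags listed for HiNER-collapsed, the sketch below maps BIO-prefixed labels to PER, LOC, and ORG and sends everything else to O. The fine-grained label names, the mapping table, and the choice to map other categories to O are assumptions for illustration; the listing itself does not specify how the collapsed tags were derived.

```python
from typing import Dict, List

# Hypothetical mapping from fine-grained labels to the three collapsed tags.
# The label names on the left are illustrative, not taken from the dataset files.
COLLAPSE_MAP: Dict[str, str] = {
    "PERSON": "PER",
    "LOCATION": "LOC",
    "ORGANIZATION": "ORG",
}


def collapse_tag(tag: str) -> str:
    """Collapse one BIO tag; labels outside COLLAPSE_MAP become 'O'."""
    if tag == "O":
        return "O"
    prefix, _, label = tag.partition("-")
    coarse = COLLAPSE_MAP.get(label)
    return f"{prefix}-{coarse}" if coarse else "O"


def collapse_tags(tags: List[str]) -> List[str]:
    """Collapse a whole sentence of BIO tags."""
    return [collapse_tag(t) for t in tags]


fine = ["B-PERSON", "I-PERSON", "O", "B-FESTIVAL", "B-LOCATION"]
collapsed = collapse_tags(fine)
print(collapsed)
```

Because the mapping is applied tag by tag, the B-/I- prefixes of kept entities are preserved and no span boundaries need to be recomputed.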

HiNER-original

HiNER-original (HiNER: A Large Hindi Named Entity Recognition Dataset)

HiNER is a large, standards-compliant Hindi NER dataset. This release contains 109,146 sentences and 2,220,856 tokens, annotated with the original 11 tags.

1 PAPER • 1 BENCHMARK
