5 dataset results for Word Embeddings AND Bengali

WikiANN

WikiANN, also known as PAN-X, is a multilingual named entity recognition dataset. It consists of Wikipedia articles annotated with LOC (location), PER (person), and ORG (organization) tags in the IOB2 format. It is used to train and evaluate named entity recognition models across many languages.

58 PAPERS • 3 BENCHMARKS
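
The IOB2 format mentioned in the WikiANN entry marks the first token of each entity with `B-` and each following token of the same entity with `I-`, while `O` marks tokens outside any entity. The sketch below shows one way to turn such tags into entity spans. The function name `iob2_to_spans` and the example sentence are illustrative assumptions and are not taken from the dataset or its tooling.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # Only an I- tag of the same type continues the currently open span.
        continues = tag.startswith("I-") and current == tag[2:]
        if not continues and current is not None:
            spans.append((current, start, i))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
        # A stray I- tag with no open span is ignored (a lenient choice).
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Hypothetical example sentence, not a record from WikiANN.
tokens = ["Rabindranath", "Tagore", "was", "born", "in", "Kolkata"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
entities = [(label, " ".join(tokens[s:e])) for label, s, e in iob2_to_spans(tags)]
print(entities)
```

Because every entity starts with an explicit `B-` tag, two adjacent entities of the same type stay separate, which is the main practical difference from the older IOB1 scheme.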

Bengali Hate Speech

Introduces three datasets, covering hateful expressions, commonly discussed topics, and opinions, for hate speech detection, document classification, and sentiment analysis, respectively.

6 PAPERS • NO BENCHMARKS YET

IndicNLP Corpus

The IndicNLP corpus is a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families.

3 PAPERS • NO BENCHMARKS YET

Bangla Word Analogy

We provide a Mikolov-style word-analogy evaluation set built specifically for Bangla, containing 16,678 samples, as well as a translated and curated version of the Mikolov dataset containing 10,594 samples for cross-lingual research.

1 PAPER • NO BENCHMARKS YET
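
A Mikolov-style analogy set, as described in the Bangla Word Analogy entry, is typically scored by asking which word best completes "a is to b as c is to ?" and counting exact matches. The sketch below uses the common vector-offset approach with cosine similarity. The helper names, the two-dimensional toy vectors, and the skipping of out-of-vocabulary questions are all assumptions made for illustration; this is not the dataset authors' evaluation script.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb: Dict[str, Vector],
                     questions: List[Tuple[str, str, str, str]]) -> float:
    """Share of correctly answered questions; out-of-vocabulary ones are skipped."""
    answered = [q for q in questions if all(w in emb for w in q)]
    if not answered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answered)
    return correct / len(answered)


# Made-up toy vectors; real embeddings have hundreds of dimensions.
toy_emb = {
    "পুরুষ": [1.0, 0.0],  # man
    "নারী": [1.0, 1.0],   # woman
    "রাজা": [3.0, 0.0],   # king
    "রানী": [3.0, 1.0],   # queen
    "আপেল": [0.0, 3.0],   # apple (distractor)
}
toy_questions = [
    ("পুরুষ", "নারী", "রাজা", "রানী"),
    ("রাজা", "রানী", "পুরুষ", "নারী"),
    ("পুরুষ", "নারী", "রাজা", "অজানা"),  # last word is not in toy_emb, so it is skipped
]
print(analogy_accuracy(toy_emb, toy_questions))
```

Whether out-of-vocabulary questions are skipped or counted as errors changes the reported accuracy, so that choice should be stated alongside any score.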

Word Analogy Bangla

0 PAPER • NO BENCHMARKS YET
