mC4 is a multilingual variant of the C4 dataset. It comprises natural text in 101 languages drawn from the public Common Crawl web scrape.

Source: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
