mC4 is a multilingual variant of the C4 dataset. It comprises natural text in 101 languages drawn from the public Common Crawl web scrape.
Source: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer