A new multilingual language model benchmark covering 40+ languages across several scripts and linguistic families, containing around 40 billion characters and intended to accelerate research on multilingual modeling.
Source: Wiki-40B: Multilingual Language Model Dataset