The Liu et al. Corpus is a pretraining dataset for large language models. It contains roughly 160 GB of text drawn from news articles, books, stories, and web pages.

License


  • Unknown