The Standardized Project Gutenberg Corpus (SPGC) is an open science approach to a curated version of the complete Project Gutenberg (PG) data, containing more than 50,000 books and more than 3×10^9 word-tokens.

Source: https://arxiv.org/abs/1812.08092

License


  • Unknown
