The Standardized Project Gutenberg Corpus (SPGC) is an open science approach to building a curated version of the complete Project Gutenberg (PG) data, containing more than 50,000 books and more than 3×10⁹ word-tokens.
Source: https://arxiv.org/abs/1812.08092