One Billion Word Benchmark

Introduced by Chelba et al. in "One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling"

A text corpus with almost one billion words of training data for benchmarking statistical language models. The scale of approximately one billion words is intended to balance the relevance of the benchmark in a world of abundant data against the ease with which researchers can evaluate their modeling approaches. The monolingual English data was obtained from the WMT11 website and prepared using a variety of best practices for machine learning dataset preparation.
