A text corpus of almost one billion words of training data for benchmarking statistical language models. The scale of roughly one billion words aims to balance the benchmark's relevance in a world of abundant data against the ease with which researchers can evaluate their modeling approaches. Monolingual English data was obtained from the WMT11 website and prepared using a variety of best practices for machine-learning dataset preparation.
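The preparation steps can be illustrated with a minimal sketch. This is a hypothetical pipeline, not the benchmark's actual scripts; it assumes common steps such as deduplicating sentences, shuffling with a fixed seed, and carving out a held-out evaluation split (the function name `prepare_corpus` and the 1% split size are illustrative assumptions):

```python
import random

def prepare_corpus(sentences, seed=1234):
    """Hypothetical sketch of common corpus-preparation steps:
    deduplicate sentences, shuffle them, and hold out a split
    for evaluation."""
    # Drop empty lines and duplicates while preserving first-seen order
    unique = list(dict.fromkeys(s.strip() for s in sentences if s.strip()))
    # Shuffle deterministically so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out roughly 1% of sentences (at least one) for evaluation
    split = max(1, len(unique) // 100)
    return unique[split:], unique[:split]

train, heldout = prepare_corpus(["a b", "a b", "c d", "e f", ""])
```

A fixed seed matters here: it makes the train/held-out partition reproducible across runs, which is essential for a shared benchmark.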