BabySLM is a language-acquisition-friendly benchmark to probe speech-based LMs at the lexical and syntactic levels, both of which are compatible with the vocabulary typical of children's language experiences.
1 PAPER • NO BENCHMARKS YET
This repository contains gzipped files containing more than 2 million tokens (words) from answers submitted by more than 6,000 students over the course of their first 30 days of using Duolingo. It also contains baseline starter code written in Python. There are three data sets, corresponding to three different language courses. More details on the data set and task are available at: http://sharedtask.duolingo.com. (2018-01-10)
3 PAPERS • NO BENCHMARKS YET
This is a gzipped CSV file containing the 13 million Duolingo student learning traces used in experiments by Settles & Meeder (2016). For more details and replication source code, visit: https://github.com/duolingo/halflife-regression (2016-06-07)