no code implementations • 30 Oct 2022 • Alexander Maloney, Daniel A. Roberts, James Sully
Large language models with a huge number of parameters, when trained on a near-internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource.
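To make the power-law-until-bottlenecked behavior concrete, here is a minimal sketch assuming a Chinchilla-style parametric form L(N, D) = E + A·N^(-α) + B·D^(-β); the function name `scaling_law_loss` and all constants are illustrative placeholders, not values or a model taken from this paper.

```python
# Illustrative sketch of a neural scaling law in the Chinchilla-style form
#   L(N, D) = E + A * N**(-alpha) + B * D**(-beta)
# All constants are hypothetical placeholders, not fitted values from the paper.

def scaling_law_loss(n_params: float, n_tokens: float,
                     E: float = 1.7, A: float = 400.0, B: float = 410.0,
                     alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted loss as a function of parameter count N and dataset size D (tokens)."""
    return E + A * n_params ** (-alpha) + B * n_tokens ** (-beta)

# Growing parameters at a fixed dataset size: loss falls as a power law in N
# until the data term B * D**(-beta) dominates, i.e. until the model is
# bottlenecked by the other resource (data).
fixed_tokens = 1e9
for n in [1e6, 1e7, 1e8, 1e9, 1e10]:
    print(f"N={n:.0e}, D={fixed_tokens:.0e} -> loss={scaling_law_loss(n, fixed_tokens):.3f}")
```

Running the loop shows the loss improvements shrinking as N grows while D is held fixed, which is the "bottlenecked by the other resource" regime described in the abstract.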