1 code implementation • 26 May 2023 • Vijeta Deshpande, Dan Pechi, Shree Thatte, Vladislav Lialin, Anna Rumshisky
Most recent scaling-law studies have focused on high-compute, high-parameter-count settings, leaving the question of when these abilities begin to emerge largely unanswered.