24 May 2024 • Adam S. Shai, Sarah E. Marzen, Lucas Teixeira, Alexander Gietelink Oldenziel, Paul M. Riechers
What computational structure are we building into large language models when we train them on next-token prediction?