1 code implementation • 16 Nov 2022 • Adam S. Jermyn, Nicholas Schiefer, Evan Hubinger
In this work, we report preliminary attempts to engineer monosemanticity in toy models.
no code implementations • 4 Oct 2022 • Adam Scherlis, Kshitij Sachan, Adam S. Jermyn, Joe Benton, Buck Shlegeris
We show that, in a toy model, the optimal capacity allocation tends to monosemantically represent the most important features, polysemantically represent less important features (in proportion to their impact on the loss), and entirely ignore the least important features.
1 code implementation • 15 Jan 2020 • Frank Schindler, Adam S. Jermyn
We compare the obtained contraction sequences and identify signs of highly non-local optimization, with the more sophisticated algorithms sacrificing run-time early in the contraction for better overall performance.