no code implementations • 6 Dec 2023 • Ole Jorgensen, Dylan Cope, Nandi Schoots, Murray Shanahan
Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it requires first finding steering vectors.
1 code implementation • 20 Oct 2023 • Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau
Using this test, we find that despite increases in self-consistency, models usually place significant weight on alternative, inconsistent answers.