no code implementations • 26 Feb 2024 • Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
Approaches to aligning large language models (LLMs) with human values have focused on correcting misalignment that emerges from pretraining.
no code implementations • 7 Feb 2024 • Jan Wehner, Frans Oliehoek, Luciano Cavalcante Siebert
Finally, we measure how informative the generated explanations are to a proxy-human model by training it on CTEs.