no code implementations • 12 Oct 2023 • Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner
To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares.
1 code implementation • 20 Aug 2023 • Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid
We demonstrate ActAdd on GPT-2 on OpenWebText and ConceptNet, and replicate the effect on Llama-13B and GPT-J-6B.