Search Results for author: Ulisse Mini

Found 2 papers, 1 paper with code

Understanding and Controlling a Maze-Solving Policy Network

no code implementations • 12 Oct 2023 • Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner

To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares.

Activation Addition: Steering Language Models Without Optimization

1 code implementation • 20 Aug 2023 • Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid

We demonstrate ActAdd on GPT-2 on OpenWebText and ConceptNet, and replicate the effect on Llama-13B and GPT-J-6B.

Prompt Engineering
