no code implementations • 15 Mar 2024 • Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen
To prepare for these decisions, we investigate how developers could make a 'safety case': a structured rationale that an AI system is unlikely to cause a catastrophe.
3 code implementations • 9 Dec 2023 • Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner
We introduce Contrastive Activation Addition (CAA), an innovative method for steering language models by modifying their activations during forward passes.
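The core idea behind this kind of activation steering can be sketched briefly: a steering vector is derived from the difference in activations between contrastive (positive vs. negative) example prompts, and is then added, scaled by a multiplier, to the model's activations during the forward pass. A minimal sketch with toy activation vectors (the function names, toy values, and `multiplier` parameter here are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def compute_steering_vector(pos_acts, neg_acts):
    # Mean difference of activations over contrastive (positive, negative)
    # prompt pairs at a chosen layer.
    return np.mean(np.asarray(pos_acts) - np.asarray(neg_acts), axis=0)

def steer(activation, steering_vector, multiplier=1.0):
    # Add the scaled steering vector to an activation during inference;
    # positive multipliers push toward the behavior, negative away from it.
    return activation + multiplier * steering_vector

# Toy example: 4-dimensional activations from two contrastive prompt pairs.
pos = [[1.0, 0.0, 2.0, 0.0], [1.0, 1.0, 2.0, 0.0]]
neg = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
v = compute_steering_vector(pos, neg)             # → array([1., 0., 1., 0.])
h = steer(np.array([0.5, 0.5, 0.5, 0.5]), v, multiplier=2.0)
```

In practice the activations would come from a transformer's residual stream at a fixed layer, and `steer` would run inside a forward hook; the toy vectors above only illustrate the arithmetic.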