Search Results for author: Nahema Marchal

Found 3 papers, 0 papers with code

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

no code implementations • 23 Apr 2024 • Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt, Sasha Brown

This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion.

Prompt Engineering

Sociotechnical Safety Evaluation of Generative AI Systems

no code implementations • 18 Oct 2023 • Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac

First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.

Model evaluation for extreme risks

no code implementations • 24 May 2023 • Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities.
