Search Results for author: Chris van Merwijk

Found 2 papers, 0 papers with code

A Complete Criterion for Value of Information in Soluble Influence Diagrams

No code implementations · 23 Feb 2022 · Chris van Merwijk, Ryan Carey, Tom Everitt

Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems.

Task: Fairness

Risks from Learned Optimization in Advanced Machine Learning Systems

No code implementations · 5 Jun 2019 · Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer, a situation we refer to as mesa-optimization, a neologism we introduce in this paper.

Task: BIG-bench Machine Learning
