Search Results for author: Ryan Carey

Found 8 papers, 1 paper with code

Path-Specific Objectives for Safer Agent Incentives

no code implementations • 21 Apr 2022 • Sebastian Farquhar, Ryan Carey, Tom Everitt

We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis.

Too Big to Fail? Active Few-Shot Learning Guided Logic Synthesis

1 code implementation • 5 Apr 2022 • Animesh Basak Chowdhury, Benjamin Tan, Ryan Carey, Tushit Jain, Ramesh Karri, Siddharth Garg

Generating sub-optimal synthesis transformation sequences ("synthesis recipe") is an important problem in logic synthesis.

Few-Shot Learning

A Complete Criterion for Value of Information in Soluble Influence Diagrams

no code implementations • 23 Feb 2022 • Chris van Merwijk, Ryan Carey, Tom Everitt

Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems.


Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness

no code implementations • 22 Feb 2022 • Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt

In addition to reproducing discriminatory relationships in the training data, machine learning systems can also introduce or amplify discriminatory effects.

Agent Incentives: A Causal Perspective

no code implementations • 2 Feb 2021 • Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

We propose a new graphical criterion for value of control, establishing its soundness and completeness.


The Incentives that Shape Behaviour

no code implementations • 20 Jan 2020 • Ryan Carey, Eric Langlois, Tom Everitt, Shane Legg

Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to?


(When) Is Truth-telling Favored in AI Debate?

no code implementations • 11 Nov 2019 • Vojtěch Kovařík, Ryan Carey

For some problems, humans may not be able to accurately judge the goodness of AI-proposed solutions.

Incorrigibility in the CIRL Framework

no code implementations • 19 Sep 2017 • Ryan Carey

We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands.
