Search Results for author: Andrew Critch

Found 22 papers, 8 papers with code

TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI

no code implementations • 12 Jun 2023 • Andrew Critch, Stuart Russell

While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks.

WordSig: QR streams enabling platform-independent self-identification that's impossible to deepfake

no code implementations • 15 Jul 2022 • Andrew Critch

Deepfakes can degrade the fabric of society by limiting our ability to trust video content from leaders, authorities, and even friends.

Face Swapping

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

1 code implementation • 7 Jul 2022 • Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.

Human irrationality: both bad and good for reward inference

no code implementations • 12 Nov 2021 • Lawrence Chan, Andrew Critch, Anca Dragan

More importantly, we show that an irrational human, when correctly modelled, can communicate more information about the reward than a perfectly rational human can.

Quantifying Local Specialization in Deep Neural Networks

3 code implementations • 13 Oct 2021 • Shlomi Hod, Daniel Filan, Stephen Casper, Andrew Critch, Stuart Russell

These results suggest that graph-based partitioning can reveal local specialization and that statistical methods can be used to automatedly screen for sets of neurons that can be understood abstractly.

Detecting Modularity in Deep Neural Networks

no code implementations • 29 Sep 2021 • Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell

These results suggest that graph-based partitioning can reveal modularity and help us understand how deep neural networks function.

Clusterability in Neural Networks

2 code implementations • 4 Mar 2021 • Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy.

Accumulating Risk Capital Through Investing in Cooperation

no code implementations • 25 Jan 2021 • Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell

Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully promote cooperation at the cost of becoming more vulnerable to exploitation by malicious actors.

The impacts of known and unknown demonstrator irrationality on reward inference

no code implementations • 1 Jan 2021 • Lawrence Chan, Andrew Critch, Anca Dragan

Surprisingly, we find that if we give the learner access to the correct model of the demonstrator's irrationality, these irrationalities can actually help reward inference.

Importance and Coherence: Methods for Evaluating Modularity in Neural Networks

no code implementations • 1 Jan 2021 • Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell

We apply these methods on partitionings generated by a spectral clustering algorithm which uses a graph representation of the network's neurons and weights.

Clustering

Multi-Principal Assistance Games: Definition and Collegial Mechanisms

no code implementations • 29 Dec 2020 • Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell

We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism.

The MAGICAL Benchmark for Robust Imitation

1 code implementation • NeurIPS 2020 • Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell

This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings.

Imitation Learning

AI Research Considerations for Human Existential Safety (ARCHES)

no code implementations • 30 May 2020 • Andrew Critch, David Krueger

Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species.

Pruned Neural Networks are Surprisingly Modular

1 code implementation • 10 Mar 2020 • Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images.

Clustering, Graph Clustering

Optimal Policies Tend to Seek Power

1 code implementation • NeurIPS 2021 • Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives.

Reinforcement Learning (RL)

Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making

no code implementations • NeurIPS 2018 • Nishant Desai, Andrew Critch, Stuart J. Russell

To gain insight into the dynamics of this new framework, we implement a simple NRL agent and empirically examine its behavior in a simple environment.

Decision Making, reinforcement-learning +1

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

no code implementations • 31 Oct 2017 • Andrew Critch, Stuart Russell

It is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto-optimal policy, i.e., a policy that cannot be improved upon for one agent without making sacrifices for another.

Decision Making

Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making

no code implementations • 5 Jan 2017 • Andrew Critch

Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi's utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.

Decision Making, Multi-Objective Reinforcement Learning +1

Logical Induction

no code implementations • 12 Sep 2016 • Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, Jessica Taylor

For instance, if the language is Peano arithmetic, it assigns probabilities to all arithmetical statements, including claims about the twin prime conjecture, the outputs of long-running computations, and its own probabilities.

Sentence
