Search Results for author: Stuart Armstrong

Found 12 papers, 0 papers with code

CoinRun: Solving Goal Misgeneralisation

no code implementations • 28 Sep 2023 Stuart Armstrong, Alexandre Maranhão, Oliver Daniels-Koch, Patrick Leask, Rebecca Gorman

Goal misgeneralisation is a key challenge in AI alignment -- the task of getting powerful Artificial Intelligences to align their goals with human intentions and human morality.

Concept Extrapolation: A Conceptual Primer

no code implementations • 19 Jun 2023 Matija Franklin, Rebecca Gorman, Hal Ashton, Stuart Armstrong

This article is a primer on concept extrapolation -- the ability to take a concept, a feature, or a goal that is defined in one context and extrapolate it safely to a more general context.

Recognising the importance of preference change: A call for a coordinated multidisciplinary research effort in the age of AI

no code implementations • 20 Mar 2022 Matija Franklin, Hal Ashton, Rebecca Gorman, Stuart Armstrong

We operationalize preference to incorporate concepts from various disciplines, outlining the importance of meta-preferences and preference-change preferences, and proposing a preliminary framework for how preferences change.

The dangers in algorithms learning humans' values and irrationalities

no code implementations • 28 Feb 2022 Rebecca Gorman, Stuart Armstrong

For an artificial intelligence (AI) to be aligned with human values (or human preferences), it must first learn those values.

Chess as a Testing Grounds for the Oracle Approach to AI Safety

no code implementations • 6 Oct 2020 James D. Miller, Roman Yampolskiy, Olle Häggström, Stuart Armstrong

To reduce the danger of powerful super-intelligent AIs, we might make the first such AIs oracles that can only send and receive messages.

Pitfalls of learning a reward function online

no code implementations • 28 Apr 2020 Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

We formally introduce two desirable properties: the first is `unriggability', which prevents the agent from steering the learning process towards a reward function that is easier to optimise; the second is `uninfluenceability', which requires that the learning process be equivalent to learning facts about the environment rather than something the agent's actions can shape.

Counterfactual equivalence for POMDPs, and underlying deterministic environments

no code implementations • 11 Jan 2018 Stuart Armstrong

Partially Observable Markov Decision Processes (POMDPs) are rich environments often used in machine learning.

'Indifference' methods for managing agent rewards

no code implementations • 18 Dec 2017 Stuart Armstrong, Xavier O'Rourke

`Indifference' refers to a class of methods used to control reward-based agents.

Good and safe uses of AI Oracles

no code implementations • 15 Nov 2017 Stuart Armstrong, Xavier O'Rorke

It is possible that powerful and potentially dangerous artificial intelligence (AI) might be developed in the future.

Low Impact Artificial Intelligences

no code implementations • 30 May 2017 Stuart Armstrong, Benjamin Levinstein

This paper looks at an alternative approach: defining a general concept of `low impact'.

Anthropic decision theory

no code implementations • 28 Oct 2011 Stuart Armstrong

This paper sets out to resolve how agents ought to act in the Sleeping Beauty problem and various related anthropic (self-locating belief) problems, not through the calculation of anthropic probabilities, but through finding the correct decision to make.
