no code implementations • 28 Sep 2023 • Stuart Armstrong, Alexandre Maranhão, Oliver Daniels-Koch, Patrick Leask, Rebecca Gorman
Goal misgeneralisation is a key challenge in AI alignment -- the task of getting powerful artificial intelligences to align their goals with human intentions and human morality.
no code implementations • 19 Jun 2023 • Matija Franklin, Rebecca Gorman, Hal Ashton, Stuart Armstrong
This article is a primer on concept extrapolation: the ability to take a concept, feature, or goal defined in one context and extrapolate it safely to a more general context.
no code implementations • 20 Mar 2022 • Matija Franklin, Hal Ashton, Rebecca Gorman, Stuart Armstrong
We operationalize preference using concepts from various disciplines, outline the importance of meta-preferences and preference-change preferences, and propose a preliminary framework for how preferences change.
no code implementations • 28 Feb 2022 • Rebecca Gorman, Stuart Armstrong
For an artificial intelligence (AI) to be aligned with human values (or human preferences), it must first learn those values.
no code implementations • 6 Oct 2020 • James D. Miller, Roman Yampolskiy, Olle Haggstrom, Stuart Armstrong
To reduce the danger of powerful super-intelligent AIs, we might make the first such AIs oracles that can only send and receive messages.
no code implementations • 28 Apr 2020 • Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg
We formally introduce two desirable properties: the first is 'unriggability', which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise.
no code implementations • 11 Jan 2018 • Stuart Armstrong
Partially Observable Markov Decision Processes (POMDPs) are rich environments often used in machine learning.
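For reference, a POMDP is standardly specified as a tuple (standard textbook notation, not taken from this paper):

```latex
\[
\langle S, A, T, R, \Omega, O, \gamma \rangle
\]
% S: states; A: actions; \Omega: observations
% T(s' \mid s, a): transition probabilities
% R(s, a): reward function; \gamma \in [0,1): discount factor
% O(o \mid s', a): probability of observing o after action a leads to state s'
```

The agent never sees the state $s$ directly, only observations $o$, so it must act on a belief distribution over $S$.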
no code implementations • 18 Dec 2017 • Stuart Armstrong, Xavier O'Rourke
'Indifference' refers to a class of methods used to control reward-based agents.
no code implementations • NeurIPS 2018 • Stuart Armstrong, Sören Mindermann
Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behavior.
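In the standard formulation (generic IRL notation, not specific to this paper), given an expert policy $\pi_E$ the task is to recover a reward $\hat{R}$ under which the observed behaviour is (near-)optimal:

```latex
\[
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \hat{R}(s_t, a_t) \,\middle|\, \pi_E\right]
\;\ge\;
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \hat{R}(s_t, a_t) \,\middle|\, \pi\right]
\quad \text{for all policies } \pi .
\]
```

This problem is underdetermined -- many rewards (including the zero reward) satisfy the constraint -- which is the ambiguity the paper's "no free lunch" result builds on.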
no code implementations • 15 Nov 2017 • Stuart Armstrong, Xavier O'Rorke
It is possible that powerful and potentially dangerous artificial intelligence (AI) might be developed in the future.
no code implementations • 30 May 2017 • Stuart Armstrong, Benjamin Levinstein
This paper looks at an alternative approach: defining a general concept of `low impact'.
no code implementations • 28 Oct 2011 • Stuart Armstrong
This paper sets out to resolve how agents ought to act in the Sleeping Beauty problem and various related anthropic (self-locating belief) problems, not through the calculation of anthropic probabilities, but through finding the correct decision to make.