no code implementations • 28 Sep 2023 • Stuart Armstrong, Alexandre Maranhão, Oliver Daniels-Koch, Patrick Leask, Rebecca Gorman
Goal misgeneralisation is a key challenge in AI alignment -- the task of getting powerful Artificial Intelligences to align their goals with human intentions and human morality.
no code implementations • 12 Nov 2022 • Oliver Daniels-Koch, Rachel Freedman
RLHF algorithms that learn from multiple teachers face an expertise problem: the reliability of a given piece of feedback depends both on the teacher it comes from and on how specialized that teacher is in the relevant components of the task.
1 code implementation • 17 Jul 2022 • Oliver Daniels-Koch
We introduce CULT (Continual Unsupervised Representation Learning with Typicality-Based Environment Detection), a new algorithm for continual unsupervised learning with variational auto-encoders.