One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation.
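To make the phenomenon concrete, here is a minimal sketch of a grokking-style experiment on modular addition (a + b mod p), the task family common in this literature. The architecture, hyperparameters, and train/validation split below are illustrative assumptions, not the setup of any specific paper.

```python
# Minimal grokking-style sketch: train far past perfect training accuracy on
# modular addition and watch validation accuracy. Hyperparameters are
# placeholders; weight decay is included because grokking is typically
# reported with regularisation.
import torch
import torch.nn as nn

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, val_idx = perm[:split], perm[split:]

embed = nn.Embedding(p, 64)
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, p))
params = list(embed.parameters()) + list(model.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        x = embed(pairs[idx]).flatten(1)  # (N, 2, 64) -> (N, 128)
        return (model(x).argmax(-1) == labels[idx]).float().mean().item()

for step in range(50_000):  # long after train accuracy hits 1.0
    x = embed(pairs[train_idx]).flatten(1)
    loss = loss_fn(model(x), labels[train_idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        print(step, f"train={accuracy(train_idx):.3f}", f"val={accuracy(val_idx):.3f}")
```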
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty- or coverage-seeking behaviour falls short.
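For concreteness, the sketch below shows a count-based novelty bonus, a representative coverage-seeking signal. In a vast state space almost every state is visited at most once, so the bonus saturates and stops distinguishing useful novelty from noise. The function name and coefficient are illustrative assumptions.

```python
# Count-based novelty bonus: reward ~ 1 / sqrt(N(s)), decaying as a state is
# revisited. In huge domains N(s) is nearly always 0 or 1, so every state
# looks equally "novel" and the signal carries little information.
from collections import defaultdict
import math

counts = defaultdict(int)

def novelty_bonus(state, beta=0.1):
    """Illustrative coverage-seeking bonus added to the task reward."""
    counts[state] += 1
    return beta / math.sqrt(counts[state])
```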
Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks.
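As an illustration, a few-shot prompt in this style interleaves a worked reasoning chain with each question before asking the new one; the exemplar below follows the widely cited format from this line of work.

```python
# Illustrative chain-of-thought prompt: the model is shown reasoning steps in
# the exemplar so that it produces reasoning steps for the new question.
prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: A juggler has 16 balls. Half of the balls are golf balls, and half of
the golf balls are blue. How many blue golf balls are there?
A:"""
```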
However, an AI system may pursue an undesired goal even when its specification is correct, a failure mode known as goal misgeneralization.
In this paper we answer this question in the affirmative, using ReQueST to train an agent to perform a 3D first-person object collection task using data entirely from human contractors.
Formal Methods for the Informal Engineer (FMIE) was a workshop held at the Broad Institute of MIT and Harvard in 2021 to explore the potential role of verified software in the biomedical software ecosystem.
How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?
Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e.g., rewards).
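The toy loop below makes that assumption explicit: the reward arrives through a channel the agent cannot modify, and the agent influences it only via its actions. The Env interface is a hypothetical stand-in, not a specific library's API.

```python
# Standard agent-environment loop under the MDP abstraction. The reward is
# treated as ground truth: nothing the agent does can rewrite the feedback
# channel itself. The papers above ask what happens when this fails.
class Env:
    """Toy environment whose reward channel is assumed tamper-proof."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # externally supplied feedback
        return self.t, reward, self.t >= 10

def policy(state):
    return 1  # placeholder policy

env = Env()
state = env.reset()
while True:
    action = policy(state)
    # The agent affects reward only through (state, action).
    state, reward, done = env.step(action)
    if done:
        break
```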
Formal verification of machine learning models has attracted attention recently, and significant progress has been made on proving simple properties like robustness to small perturbations of the input features.
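One standard technique for proving such robustness properties is interval bound propagation (IBP), sketched below for a two-layer ReLU network. The weights and function names are illustrative, and IBP is one certification method among several, not the definitive approach.

```python
# Interval bound propagation: push an input box [x - eps, x + eps] through the
# network and check that the true class's worst-case logit still wins.
import numpy as np

def ibp_linear(lo, hi, W, b):
    """Propagate an interval [lo, hi] through x -> Wx + b."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def verify_robust(x, eps, W1, b1, W2, b2, label):
    """True if every input within eps (in L-infinity) keeps the label."""
    lo, hi = x - eps, x + eps
    lo, hi = ibp_linear(lo, hi, W1, b1)
    lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)  # ReLU is monotone
    lo, hi = ibp_linear(lo, hi, W2, b2)
    # Certified iff the lower bound of the true logit exceeds every
    # other logit's upper bound.
    others = np.delete(hi, label)
    return lo[label] > others.max()
```

Because the bounds are sound but not tight, `verify_robust` can return False for inputs that are actually robust; it never certifies an input that is not.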
Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding?
Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other.
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment?
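One family of answers augments the task reward with a penalty on deviation from an inaction baseline, in the spirit of published side-effect penalties such as relative reachability. The sketch below is a schematic version; the distance function and coefficient are placeholders, not a specific published method.

```python
# Schematic side-effect penalty: reward the task, but penalise how far the
# current state has drifted from what the world would look like had the
# agent done nothing (the "inaction baseline").
def shaped_reward(task_reward, state, baseline_state, distance, lam=0.1):
    """Task reward minus a penalty for disturbing the environment."""
    return task_reward - lam * distance(state, baseline_state)
```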