no code implementations • 17 Dec 2024 • Salim I. Amoukou, Tom Bewley, Saumitra Mishra, Freddy Lecue, Daniele Magazzeni, Manuela Veloso
We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments, requiring no access to ground-truth data labels.
no code implementations • 25 Nov 2024 • Junqi Jiang, Tom Bewley, Saumitra Mishra, Freddy Lecue, Manuela Veloso
We see our method as a flexible framework for RM explanation, providing a basis for more interpretable and trustworthy LLM alignment.
no code implementations • 29 May 2024 • Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso
We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules.
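To make the idea of recourse "in the form of human-readable rules" concrete, here is a hedged toy sketch of a rule-style counterfactual explanation for a made-up loan decision. The model, threshold, and `recourse_rule` helper are all hypothetical illustrations of the general concept, not the T-CREx algorithm.

```python
# Hypothetical toy: recourse expressed as a human-readable rule.
# The model, the 30,000 threshold, and recourse_rule are invented for
# illustration only; this is NOT the T-CREx method.

def loan_model(income, debt):
    # toy black-box decision: approve if income minus debt clears a threshold
    return "approved" if income - debt >= 30_000 else "denied"

def recourse_rule(income, debt):
    """Return a rule describing how a denied applicant could flip the outcome."""
    gap = 30_000 - (income - debt)
    if gap <= 0:
        return "already approved"
    return f"IF income increased by at least {gap} THEN outcome becomes approved"

rule = recourse_rule(income=50_000, debt=35_000)
```

A group-level summary would aggregate such rules over many individuals rather than emitting one per query point.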
2 code implementations • 26 Sep 2023 • Scott Jeen, Tom Bewley, Jonathan M. Cullen
Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline, reward-free pre-training phase.
no code implementations • 26 May 2023 • Tom Bewley, Jonathan Lawry, Arthur Richards
We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback.
no code implementations • 3 Oct 2022 • Tom Bewley, Jonathan Lawry, Arthur Richards, Rachel Craddock, Ian Henderson
Recent efforts to learn reward functions from human feedback have tended to use deep neural networks, whose lack of transparency hampers our ability to explain agent behaviour or verify alignment.
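As a generic illustration of learning a reward function from human feedback with a transparent model instead of a deep network, the sketch below fits a linear reward to pairwise trajectory preferences via a Bradley-Terry likelihood. All names and the toy data are assumptions for illustration; this is not the paper's tree-based method.

```python
import math

# Generic sketch: fit a transparent (linear) reward model to pairwise
# trajectory preferences using a Bradley-Terry model. Toy data and all
# function names are hypothetical; this is NOT the paper's method.

def traj_features(traj):
    # sum per-step feature vectors over a trajectory
    return [sum(step[i] for step in traj) for i in range(len(traj[0]))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_reward(prefs, dim, lr=0.1, epochs=200):
    """prefs: list of (traj_a, traj_b) pairs where traj_a was preferred."""
    w = [0.0] * dim
    for _ in range(epochs):
        for ta, tb in prefs:
            diff = [a - b for a, b in zip(traj_features(ta), traj_features(tb))]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # gradient ascent on the log-likelihood of the observed preference
            for i in range(dim):
                w[i] += lr * (1.0 - p) * diff[i]
    return w

# toy data: feature 0 is "progress"; preferred trajectories have more of it
good = [[1.0, 0.0], [1.0, 0.0]]
bad = [[0.0, 1.0], [0.0, 1.0]]
w = fit_reward([(good, bad)] * 10, dim=2)
```

Because the learned weights are directly inspectable, alignment claims can be checked by reading them, which is the kind of transparency a deep reward network lacks.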
1 code implementation • 30 May 2022 • Joseph Early, Tom Bewley, Christine Evers, Sarvapali Ramchurn
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Markovian rewards.
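A minimal sketch of what a non-Markovian reward means: the payoff at a state depends on the whole history rather than the current state alone. The gridworld-with-key setup below is a hypothetical illustration, not the paper's benchmark.

```python
# Toy non-Markovian reward: reaching "goal" pays off only if a "key" was
# visited earlier, so the reward is a function of the full history, not
# just the final state. Hypothetical example for illustration.

def non_markov_reward(history):
    """history: list of states visited, e.g. ["start", "key", "goal"]."""
    if history[-1] != "goal":
        return 0.0
    # same final state, different reward depending on the past
    return 1.0 if "key" in history[:-1] else 0.0
```

A Markovian reward model, which sees only the final state, cannot distinguish these two cases; a history-conditioned reward model can.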
no code implementations • 17 Jan 2022 • Tom Bewley, Jonathan Lawry, Arthur Richards
We introduce a data-driven, model-agnostic technique for generating a human-interpretable summary of the salient points of contrast within an evolving dynamical system, such as the learning process of a control agent.
no code implementations • 20 Dec 2021 • Tom Bewley, Freddy Lecue
The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem.
1 code implementation • 10 Sep 2020 • Tom Bewley, Jonathan Lawry
In explainable artificial intelligence, there is increasing interest in understanding the behaviour of autonomous agents to build trust and validate performance.
Deep Reinforcement Learning
Explainable Artificial Intelligence
no code implementations • 2 Jul 2020 • Tom Bewley
The rule extraction literature contains the notion of a fidelity-accuracy dilemma: when building an interpretable model of a black box function, optimising for fidelity is likely to reduce performance on the underlying task, and vice versa.
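The dilemma can be made concrete with a toy numeric example, assuming an imperfect black box: a surrogate that copies the black box exactly achieves perfect fidelity but inherits its errors, while a surrogate matching the ground truth sacrifices fidelity. All data below is invented for illustration.

```python
# Toy illustration of the fidelity-accuracy dilemma with an imperfect
# black box. Inputs, labels, and the single error at x = 3 are invented.

inputs = list(range(10))
truth = [x % 2 for x in inputs]  # ground-truth labels
# black box agrees with the truth everywhere except x = 3
black_box = [1 - y if x == 3 else y for x, y in zip(inputs, truth)]

def agreement(a, b):
    return sum(int(u == v) for u, v in zip(a, b)) / len(a)

surrogate_faithful = black_box[:]  # optimised for fidelity to the black box
surrogate_accurate = truth[:]      # optimised for accuracy on the task

fid_f = agreement(surrogate_faithful, black_box)  # perfect fidelity...
acc_f = agreement(surrogate_faithful, truth)      # ...but copies the error
fid_a = agreement(surrogate_accurate, black_box)  # imperfect fidelity...
acc_a = agreement(surrogate_accurate, truth)      # ...but perfect accuracy
```

Whenever the black box is imperfect, no surrogate can maximise both quantities at once, which is the dilemma in miniature.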
no code implementations • 19 Jun 2020 • Tom Bewley, Jonathan Lawry, Arthur Richards
As we deploy autonomous agents in safety-critical domains, it becomes important to develop an understanding of their internal mechanisms and representations.