Search Results for author: Diogo Cruz

Found 1 paper, 1 paper with code

Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features

1 code implementation · 7 Nov 2023 · Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, Bogdan-Ionut Cirstea

Many capable large language models (LLMs) are developed via self-supervised pre-training followed by a reinforcement-learning fine-tuning phase, often based on human or AI feedback.

reinforcement-learning
