15 Nov 2023 • Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins
A core ingredient in the success of RLHF at aligning and improving large language models (LLMs) is its reward model, which is trained on human feedback over model outputs.
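The abstract does not spell out how such a reward model is trained; a common choice in the RLHF literature is a pairwise Bradley-Terry objective, which pushes the reward of the human-preferred response above that of the rejected one. The sketch below is a minimal, framework-free illustration of that standard loss, not the paper's own implementation; the function name and the scalar-reward setup are assumptions for illustration.

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen / r_rejected are scalar reward-model scores for the response
    the human preferred and the one they rejected, respectively.
    Uses log1p(exp(-x)) as a numerically stable form of -log(sigmoid(x)).
    (Illustrative helper, not from the paper.)
    """
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.log1p(np.exp(-diff))))

# Scoring the preferred response higher yields a small loss;
# scoring it lower yields a large one.
low = preference_loss(2.0, 0.0)   # chosen scored above rejected
high = preference_loss(0.0, 2.0)  # chosen scored below rejected
```

Minimizing this loss over many human-labeled comparison pairs is what turns raw preference annotations into a scalar reward signal usable by RL.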