no code implementations • 31 Jan 2025 • Roberto-Rafael Maura-Rivero, Marc Lanctot, Francesco Visin, Kate Larson
Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large Language Models (LLMs) with human values, is known to fail to satisfy properties that are intuitively desirable, such as respecting the preferences of the majority (Ge et al., 2024).
no code implementations • 8 Jan 2025 • Roberto-Rafael Maura-Rivero, Chirag Nagpal, Roma Patel, Francesco Visin
Current methods that train large language models (LLMs) with reinforcement learning feedback often resort to averaging the outputs of multiple reward functions during training.
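As a rough illustration of the averaging setup mentioned above (not the paper's proposed method), the sketch below combines several reward functions into a single scalar by taking their mean; the reward functions here are hypothetical placeholders standing in for learned reward models.

```python
from typing import Callable, List

# A reward function maps a (prompt, response) pair to a scalar score.
RewardFn = Callable[[str, str], float]

def averaged_reward(reward_fns: List[RewardFn], prompt: str, response: str) -> float:
    """Average the outputs of multiple reward functions into one training signal."""
    scores = [fn(prompt, response) for fn in reward_fns]
    return sum(scores) / len(scores)

# Toy placeholder rewards (in practice these would be learned reward models).
helpfulness = lambda p, r: float(len(r) > 0)
safety = lambda p, r: 1.0 if "unsafe" not in r else 0.0

print(averaged_reward([helpfulness, safety], "Explain RLHF.", "RLHF aligns LLMs with human preferences."))
```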
1 code implementation • 31 Oct 2024 • Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup
This optimal ranking is the maximum likelihood estimate obtained when evaluation data (which we view as votes) are interpreted as noisy samples from a ground-truth ranking, a formulation that satisfies Condorcet's original voting-system criteria.
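A minimal sketch of this maximum-likelihood view (not the paper's algorithm): under a simple pairwise-noise model, the MLE ranking is the one that agrees with the most pairwise comparisons across the votes, the classic Kemeny/Young reading of Condorcet's criterion. The brute-force search below is only feasible for a handful of alternatives.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple

def pairwise_agreements(candidate: Sequence[str], votes: List[Sequence[str]]) -> int:
    """Count pairwise orderings in the votes that agree with the candidate ranking."""
    agreements = 0
    for vote in votes:
        vpos = {item: i for i, item in enumerate(vote)}
        # combinations() yields pairs (a, b) with a ranked above b in the candidate.
        for a, b in combinations(candidate, 2):
            if vpos[a] < vpos[b]:
                agreements += 1
    return agreements

def mle_ranking(items: Sequence[str], votes: List[Sequence[str]]) -> Tuple[str, ...]:
    """Return the ranking maximizing agreement with the votes (MLE under pairwise noise)."""
    return max(permutations(items), key=lambda r: pairwise_agreements(r, votes))

votes = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C")]
print(mle_ranking(("A", "B", "C"), votes))  # -> ('A', 'B', 'C')
```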