Offline Reinforcement Learning Methods

Direct Preference Optimization

Introduced by Rafailov et al. in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

Categories