Reward model

A model trained to predict human preferences, used to guide RL training by scoring candidate outputs. Reward model quality directly affects alignment effectiveness and the behaviors the final model learns to exhibit.
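The scoring role can be sketched with a minimal best-of-n example. The heuristic `reward_model` below is a hypothetical stand-in for illustration only; a real reward model is a trained neural network that predicts human preference scores.

```python
def reward_model(output: str) -> float:
    """Hypothetical reward function standing in for a trained preference model.
    Rewards polite phrasing and penalizes verbosity."""
    score = 0.0
    if "please" in output.lower() or "thanks" in output.lower():
        score += 1.0               # reward polite phrasing
    score -= 0.01 * len(output)    # penalize overly long outputs
    return score

def best_of_n(candidates: list[str]) -> str:
    """Select the candidate the reward model scores highest,
    as in best-of-n (rejection) sampling."""
    return max(candidates, key=reward_model)

candidates = [
    "No.",
    "Sure, happy to help! Thanks for asking.",
    "I refuse to answer this question under any circumstances whatsoever.",
]
print(best_of_n(candidates))  # → "Sure, happy to help! Thanks for asking."
```

In full RLHF, the same scores would instead feed a policy-gradient update (e.g. PPO), but the selection pressure is the same: outputs the reward model rates highly are reinforced, which is why reward model quality shapes the final model's behavior.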

See: Alignment