In reinforcement learning, the function that assigns a numeric score (“reward”) to behaviors, guiding the model toward preferred outcomes. Reward function design influences aligned behavior and can encode tradeoffs.
In reinforcement learning, the function that assigns a numeric score (“reward”) to behaviors, guiding the model toward preferred outcomes. Reward function design influences aligned behavior and can encode tradeoffs.