Reinforcement Learning with Verifiable Rewards

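Reinforcement learning in which the reward comes from an automatic check of the model's output, such as whether generated code executes correctly or whether a mathematical proof is valid, rather than from human judgment or a learned reward model. RLVR is used to train reasoning models on tasks whose answers can be checked objectively.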
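The sketch below is a toy illustration of the idea, not a production recipe. It assumes a code-generation task where the "verifier" runs each candidate program against unit tests and returns a binary reward. It then uses plain REINFORCE with a running-mean baseline over a fixed list of candidate programs as a stand-in for a real policy; practical RLVR pipelines instead apply policy-gradient methods such as PPO to a language model. The names `code_reward` and `train_toy_policy`, the test cases, and the hyperparameters are all hypothetical.

```python
import math
import random


def code_reward(source: str, tests: list, func_name: str) -> float:
    """Hypothetical binary verifiable reward: 1.0 if `source` defines `func_name`
    and it returns the expected value for every (args, expected) pair, else 0.0.
    NOTE: exec() runs the candidate unsandboxed; a real setup would isolate it."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace[func_name]
        return 1.0 if all(fn(*args) == expected for args, expected in tests) else 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all earn zero reward.
        return 0.0


def softmax(logits: list) -> list:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_policy(candidates: list, reward_fn, steps: int = 500,
                     lr: float = 0.5, seed: int = 0) -> list:
    """Toy REINFORCE: the 'policy' is a softmax over a fixed list of candidates.
    Each sampled candidate is scored by the verifier, and the logits move along
    advantage * grad(log pi), with a running-mean reward as the baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        r = reward_fn(candidates[i])
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # Gradient of log softmax w.r.t. logit j: 1[j == i] - probs[j].
        for j in range(len(logits)):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(logits)


# Usage: only the second candidate passes all unit tests for `add`.
tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]
candidates = [
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a * b",
]
probs = train_toy_policy(candidates, lambda src: code_reward(src, tests, "add"))
print([round(p, 3) for p in probs])
```

Because the reward is computed by running the tests rather than predicted by a model, the policy cannot earn reward from a wrong program that merely looks plausible; after training, most of the probability mass should sit on the candidate that passes every test.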