Reinforcement Learning from AI Feedback

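A variant of reinforcement learning from human feedback (RLHF) in which the feedback is generated by an AI model rather than by human annotators, which reduces labeling cost and makes feedback collection easier to scale. RLAIF may have different alignment properties than RLHF and is often combined with Constitutional AI approaches.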
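
To make the idea concrete, here is a minimal sketch of the feedback-collection step, assuming the feedback takes the form of pairwise preferences between two candidate responses. Everything named here is hypothetical and for illustration only: `PreferencePair`, `label_with_ai`, and especially `toy_ai_labeler`, which is a keyword heuristic standing in for what would in practice be a call to a language model asked which response better follows a written principle.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PreferencePair:
    """One labeled comparison, the usual training unit for a reward model."""
    prompt: str
    chosen: str
    rejected: str


# Hypothetical stand-in for an AI labeler (assumption, not a real API).
# A real system would query a language model here instead of counting words.
def toy_ai_labeler(prompt: str, response_a: str, response_b: str) -> int:
    """Return 0 if response_a is preferred, 1 if response_b is preferred."""
    banned = {"stupid", "idiot"}  # toy proxy for a "be polite" principle

    def violations(text: str) -> int:
        return sum(word.strip(".,!?").lower() in banned for word in text.split())

    # Prefer the response with fewer violations; ties go to response_a.
    return 0 if violations(response_a) <= violations(response_b) else 1


def label_with_ai(
    samples: List[Tuple[str, str, str]],
    labeler: Callable[[str, str, str], int],
) -> List[PreferencePair]:
    """Turn (prompt, response_a, response_b) triples into preference pairs."""
    pairs = []
    for prompt, a, b in samples:
        preferred = labeler(prompt, a, b)
        chosen, rejected = (a, b) if preferred == 0 else (b, a)
        pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs


samples = [
    ("How do I reset my password?",
     "Click 'Forgot password' on the login page.",
     "Figure it out yourself, idiot."),
    ("Why is my code slow?",
     "That is a stupid question.",
     "Try profiling it to find the hot spots."),
]
pairs = label_with_ai(samples, toy_ai_labeler)
for pair in pairs:
    print(pair.prompt, "->", pair.chosen)
```

The resulting pairs play the same role that human-labeled comparisons play in RLHF; only the source of the labels differs. Because the labeler is an ordinary function argument, a human-backed labeler could be swapped in without changing the rest of the pipeline.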