Ensuring AI systems pursue goals and exhibit behaviors consistent with human values and intentions. Value alignment is a central concept in AI safety: misalignment between a system's objectives and human values can cause harmful behavior even in the absence of an adversarial attack.
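
A minimal sketch of this failure mode, using hypothetical action names and made-up scores: a system that greedily optimizes a proxy objective (here, clicks) can select an action that scores poorly on the true objective the designers intended (user satisfaction), with no adversary involved.

```python
# Toy example of objective misspecification. All action names and scores
# below are invented for illustration; they are not from any real system.
actions = {
    "clickbait_headline": {"proxy_clicks": 0.9, "true_satisfaction": 0.2},
    "accurate_headline":  {"proxy_clicks": 0.6, "true_satisfaction": 0.9},
    "neutral_headline":   {"proxy_clicks": 0.5, "true_satisfaction": 0.7},
}

def pick(objective: str) -> str:
    # Greedy policy: choose the action that maximizes the given objective.
    return max(actions, key=lambda a: actions[a][objective])

# The deployed system optimizes the proxy; the designers intended the true goal.
proxy_choice = pick("proxy_clicks")        # -> "clickbait_headline"
aligned_choice = pick("true_satisfaction") # -> "accurate_headline"
print(proxy_choice, aligned_choice)
```

The two choices diverge: the proxy-optimizing policy picks the action with the lowest true satisfaction, which is the essence of the misalignment risk described above.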