Research and practices aimed at reducing harmful or unintended behavior of AI systems. In enterprise, procurement, and policy contexts, “AI safety” may refer to model training and post-training methods (e.g., reinforcement learning from human feedback, RLHF), system-level guardrails, abuse monitoring, and evaluation programs tied to a stated threat model.
See: Alignment; Guardrails; Red teaming; Safety policy