The range and level of tasks a model can perform reliably, typically evidenced by evaluations, benchmarks, and red-teaming results. Capability statements can affect intended-use scoping and governance classification.
See: Benchmark; Capabilities; Evaluation (evals); Intended use