A standardized test for measuring model performance on specific tasks, enabling comparison across models. Benchmark claims in marketing often specify which benchmark, version, and test conditions were used; benchmark scores may not reflect real-world performance.
Benchmark
C
T
See: Accuracy; Evaluation (evals)