A multiple-choice benchmark testing model knowledge across 57 academic and professional subjects, from elementary mathematics to law and medicine. MMLU scores are commonly cited in model marketing, but a high score doesn't guarantee real-world performance and may not reflect capabilities on domain-specific tasks.
See: Benchmark; Evaluation (evals)