Computational resources used during inference, particularly the extended processing in reasoning models that "think" before responding. Inference-time compute affects cost and latency; reasoning models may use substantially more compute per query than standard models.
See: Compute; Inference; Reasoning model