Loading...

Inference-time scaling

Computational resources used during inference, particularly the extended processing in reasoning models that "think" before responding. Inference-time compute affects cost and latency; reasoning models may use substantially more compute per query than standard models.

See: Compute; Inference; Reasoning model