Inference-time scaling | Model Monster

Scaling up the computational resources used during inference, particularly through the extended processing in reasoning models that "think" before responding. Inference-time compute drives both cost and latency; reasoning models may use substantially more compute per query than standard models.
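
To make the cost and latency effect concrete, here is a rough sketch of a per-query estimate. It is illustrative only: the function name, the per-token prices, and the generation speed are hypothetical placeholders, not figures from any real model or provider. It also assumes that hidden reasoning tokens are generated and billed like ordinary output tokens, and it ignores prompt-processing time.

```python
from dataclasses import dataclass


@dataclass
class QueryEstimate:
    cost_usd: float
    latency_s: float


def estimate_query(
    prompt_tokens: int,
    answer_tokens: int,
    reasoning_tokens: int = 0,
    *,
    usd_per_1k_input: float = 0.001,   # hypothetical price, not a real quote
    usd_per_1k_output: float = 0.002,  # hypothetical price, not a real quote
    tokens_per_second: float = 50.0,   # hypothetical generation speed
) -> QueryEstimate:
    """Rough cost and latency estimate for a single query.

    Assumption: reasoning ("thinking") tokens count as output tokens
    for both billing and generation time.
    """
    generated = answer_tokens + reasoning_tokens
    cost = (
        prompt_tokens / 1000 * usd_per_1k_input
        + generated / 1000 * usd_per_1k_output
    )
    latency = generated / tokens_per_second
    return QueryEstimate(cost_usd=cost, latency_s=latency)


# Same prompt and same visible answer length; only the reasoning budget differs.
standard = estimate_query(prompt_tokens=500, answer_tokens=300)
reasoning = estimate_query(prompt_tokens=500, answer_tokens=300, reasoning_tokens=4000)

print(f"standard:  ${standard.cost_usd:.4f}, {standard.latency_s:.1f} s")
print(f"reasoning: ${reasoning.cost_usd:.4f}, {reasoning.latency_s:.1f} s")
```

Under these made-up numbers, the reasoning query costs and takes several times more than the standard one even though the visible answer is the same length, because the extra generated tokens dominate both terms.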