Memory that stores the attention key and value tensors computed for previously processed tokens during inference, so each new token can be generated without recomputing them. KV cache size grows with context length, so it affects context window limits and inference cost, which is relevant when understanding capacity constraints and pricing models.
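The usual sizing arithmetic can make the cost relationship concrete. Below is a minimal back-of-the-envelope sketch in Python; the model dimensions (layers, heads, head size) are illustrative assumptions, not tied to any particular model.

```python
# Rough estimate of KV cache memory for a decoder-only transformer.
# All configuration values below are illustrative assumptions.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for all tokens in context."""
    # Factor of 2 accounts for storing both the key and the value tensor per token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Example: an assumed 7B-scale configuration at a 32k-token context,
# fp16 precision (2 bytes per element), batch size 1.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=32_768, batch_size=1, bytes_per_elem=2)
print(f"{size / 2**30:.1f} GiB")  # 16.0 GiB for this assumed configuration
```

Because the total scales linearly with sequence length and batch size, longer context windows and larger batches directly raise per-request memory use, which is one reason long-context serving is priced higher.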
See: Context window; Memory; Transformer