Optimization

Methods for improving a model’s or system’s performance or reducing its cost, latency, or resource use (e.g., quantization, caching, batching, prompt compression). Optimization choices can affect accuracy, safety behavior, and reproducibility.
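As a rough illustration of the accuracy trade-off, the sketch below applies symmetric int8 quantization to a handful of made-up weights and measures the rounding error that results. The function names (`quantize_int8`, `dequantize`) and the sample values are hypothetical, chosen only for this example; real quantization schemes are typically more involved (per-channel scales, calibration data, and so on).

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative only).

    Maps floats to integers in [-127, 127] using a single shared scale.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    ints = [max(-127, min(127, round(v / scale))) for v in values]
    return ints, scale


def dequantize(ints, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in ints]


# Hypothetical weights, not taken from any real model.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

ints, scale = quantize_int8(weights)
restored = dequantize(ints, scale)

# The largest gap between an original weight and its restored value.
# With round-to-nearest this is bounded by about half the scale.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(ints, scale, max_error)
```

Each weight now fits in one byte instead of four or more, at the cost of a small per-weight error. Whether that error matters for accuracy or downstream behavior depends on the model and task, so it generally needs to be measured rather than assumed.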

See: Key-Value cache; Latency; Quantization