Quantization

Reducing the numerical precision of model weights (for example, from 32-bit floating point to 8-bit integers) to decrease memory requirements and speed up inference. Quantization can change model behavior in subtle ways; treat quantized models as distinct versions that require separate validation in regulated deployments.
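A minimal sketch of the idea, assuming symmetric per-tensor int8 quantization with NumPy (the function names are illustrative, not from any particular library): each float weight is rounded to a signed 8-bit integer plus a single scale factor, and the small rounding error introduced is exactly why a quantized model can behave slightly differently from its float original.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    # Map the largest absolute weight to 127 and round everything else
    # onto the resulting int8 grid (symmetric per-tensor quantization).
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights; the difference from the
    # originals is the quantization error.
    return q.astype(np.float32) * scale

# Hypothetical example weights; the printed value is the worst-case
# per-weight rounding error for this tensor.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.max(np.abs(w - dequantize(q, scale))))
```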

See: Edge deployment; Model compression; Performance