Training a smaller "student" model to mimic a larger "teacher" model's behavior by learning from the teacher's outputs rather than the original training data. Distillation does not require access to the teacher model's weights, only the ability to query it and observe its outputs. This raises trade-secret and competitive concerns: even without sharing weights, unguarded API access may allow third parties to replicate proprietary model capabilities. Distillation can also transfer copyrighted expression if the teacher's outputs are used as training data. Many model licenses and API terms of service explicitly prohibit using outputs to train other models.
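A minimal sketch of the basic technique, assuming the classic soft-label setup in which the student is trained to match the teacher's output distribution rather than the original labels. The models, temperature, and training loop below are illustrative placeholders, not any particular system's implementation; note that the student only observes the teacher's outputs, never its weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

temperature = 2.0  # softens both distributions so small logit differences carry signal

# Hypothetical teacher (larger) and student (smaller) models for illustration.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
student = nn.Linear(16, 10)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):
    x = torch.randn(32, 16)           # queries sent to the teacher; no original training data needed
    with torch.no_grad():
        teacher_logits = teacher(x)   # only the teacher's outputs are observed, never its weights
    student_logits = student(x)
    # KL divergence between the softened teacher and student output distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The same idea carries over to black-box API distillation, where the "outputs" are generated text rather than logits and the student is fine-tuned on collected prompt-response pairs.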
See: AUP; Knowledge transfer; Model compression; Model extraction; Trade secret; Training; Weights