The process of converting text into tokens that a model can process. Tokenization affects how different languages, technical content, and special characters are handled, and can cause issues with non-English text or domain-specific terminology, which may be split into many unintuitive subword pieces. Unlike vectorization, tokenization is a direct, largely reversible mapping: decoding a sequence of token IDs reconstructs the original text.
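The reversibility point can be sketched with a toy character-level tokenizer (an assumption for illustration; production systems use subword schemes such as BPE or WordPiece, but the round-trip property is the same):

```python
# Toy character-level tokenizer: each distinct character gets a token ID.
# This is a minimal sketch, not a production tokenizer.

def build_vocab(corpus: str) -> dict[str, int]:
    """Assign each distinct character in the corpus a token ID."""
    return {ch: i for i, ch in enumerate(sorted(set(corpus)))}

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map text to a sequence of token IDs."""
    return [vocab[ch] for ch in text]

def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Invert the mapping: token IDs back to the original text."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)

text = "tokenize"
vocab = build_vocab(text)
ids = encode(text, vocab)
assert decode(ids, vocab) == text  # the round trip is lossless
```

Vectorization, by contrast, maps tokens to dense numeric embeddings, and there is no direct inverse that recovers the exact input text.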
See: Context window; Token; Vectorization