Token


The unit of text that models actually process. Tokens are not words: common words may be single tokens, while less common words are split into pieces ("contract" might be one token; "indemnification" might be three). A rough approximation for English is 0.75 words per token. Token counts determine API costs, which are typically priced per token, and they define context window limits. Understanding tokenization helps explain why non-English text and technical terminology often perform worse: they require more tokens to represent the same meaning.
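A minimal sketch of the rough estimate described above, using the ~0.75 words-per-token rule of thumb. The function name is illustrative; exact counts depend on the model's actual tokenizer (e.g. a BPE vocabulary), which a library such as tiktoken exposes.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token heuristic for English."""
    words = len(text.split())
    return round(words / 0.75)

# 9 English words -> roughly 12 tokens under this heuristic
print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```

Estimates like this are useful for budgeting API costs or checking whether a prompt will fit in a context window, but they can be far off for code, non-English text, or jargon-heavy prose, where tokens per word run higher.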

See: Context window; Rate limiting; Tokenization