Context window

The maximum number of tokens a model can consider at once, covering both the input and the output being generated. Context windows have expanded dramatically, from roughly 8,000 tokens in early GPT-4 to over 1 million tokens in some current models, but limits still matter. When input exceeds the context window, content is truncated, often without any notification to the user. Context window size is also distinct from how well a model uses that context; performance often degrades on information buried in the middle of long inputs, a pattern sometimes called "lost in the middle."
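The budgeting described above can be sketched in a few lines. This is a minimal illustration, not any particular provider's API: `fit_to_context` and its parameters are hypothetical names, and real systems count tokens with a proper tokenizer rather than operating on a pre-tokenized list.

```python
def fit_to_context(tokens, max_context, reserved_for_output):
    """Truncate input so input + reserved output fits in the context window.

    tokens: the input as a list of token ids (hypothetical; a real system
            would obtain these from the model's tokenizer).
    max_context: the model's context window size in tokens.
    reserved_for_output: tokens held back for the generated response.
    """
    budget = max_context - reserved_for_output
    if len(tokens) <= budget:
        return tokens
    # Naive strategy: silently drop the oldest tokens and keep the most
    # recent ones -- this mirrors the quiet truncation described above.
    return tokens[-budget:]


# With a 4,096-token window and 512 tokens reserved for output,
# a 5,000-token input is cut down to the most recent 3,584 tokens.
kept = fit_to_context(list(range(5000)), 4096, 512)
```

Keeping the most recent tokens is only one policy; chat systems often prefer to drop or summarize the oldest turns while pinning the system prompt, precisely because the truncation is invisible to the user otherwise.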

See: Token; Truncation