Glossary

What is a context window?

A context window is the maximum amount of text a model can take in for a single request — its working memory, measured in tokens, for everything you send and everything it generates.

A context window is the maximum amount of text a language model can attend to in a single call, measured in tokens. It is the model's working memory for one request: the system instructions, the conversation history, any retrieved documents, the user's question, and the model's own response all have to fit within it. Anything outside the window simply isn't visible to the model — it cannot reason about text it can't see.
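To make the "everything has to fit" point concrete, here is a minimal sketch of a budget check. It assumes a rough rule of thumb of about four characters per token, which is only an approximation; a real system would count with the model's own tokenizer. The function names and the part labels are illustrative, not taken from any particular library.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token (an assumption).

    Use the model's actual tokenizer for anything that matters.
    """
    return (len(text) + 3) // 4  # ceiling division; empty string gives 0


def fits_in_window(parts: dict[str, str], reserved_for_output: int, window: int) -> tuple[bool, int]:
    """Return (fits, tokens_used) for one request.

    `parts` holds everything sent to the model (system prompt, history,
    retrieved documents, question). `reserved_for_output` is the room set
    aside for the response, which shares the same window.
    """
    used = sum(estimate_tokens(text) for text in parts.values()) + reserved_for_output
    return used <= window, used


if __name__ == "__main__":
    request = {
        "system": "You are a helpful assistant." * 2,
        "history": "Earlier turns of the conversation. " * 50,
        "retrieved": "A passage pulled from the knowledge base. " * 100,
        "question": "What does the contract say about renewal?",
    }
    ok, used = fits_in_window(request, reserved_for_output=1_000, window=8_000)
    print(f"fits={ok}, estimated tokens used={used}")
```

The point of the sketch is that the response budget is subtracted from the same total as the inputs, so a prompt that technically fits can still leave too little room for the answer.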

Context windows have grown enormously, from a few thousand tokens to hundreds of thousands or more in current models, which changes what's feasible — you can fit whole documents, long conversations, or large code files into a single prompt. But a bigger window is not a free lunch. Every token in the window is processed on each call, so cost and latency scale with how much you put in. Models can also attend less reliably to information buried in the middle of a very long context, so simply dumping everything in often produces worse results than retrieving the right few passages.
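A rough worked example shows how quickly the cost difference adds up. The prices below are hypothetical placeholders chosen for the arithmetic, not any provider's actual rates, and the token counts are illustrative.

```python
def cost_per_call(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one request when billing is linear in tokens (an assumption)."""
    return (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    PRICE_IN, PRICE_OUT = 3.0, 15.0

    # Option A: put a whole 200,000-token corpus into every prompt.
    stuffed = cost_per_call(200_000, 500, PRICE_IN, PRICE_OUT)
    # Option B: retrieve about 4,000 tokens of relevant passages per query.
    retrieved = cost_per_call(4_000, 500, PRICE_IN, PRICE_OUT)

    print(f"stuffed:   ${stuffed:.4f} per call")
    print(f"retrieved: ${retrieved:.4f} per call")
    print(f"ratio:     {stuffed / retrieved:.1f}x")
```

Under these assumed numbers the stuffed prompt costs roughly thirty times more per call, before accounting for the added latency or any loss in answer quality.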

In production, the context window shapes core design decisions. It is why retrieval-augmented generation exists — you can't fit an entire knowledge base in the window, so you retrieve only the relevant chunks for each query. It governs how much conversation history a chatbot can carry before it must summarise or drop older turns. And it sets a hard ceiling on tasks like summarising a document longer than the window, which then have to be broken into pieces and stitched back together.
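Two of these decisions, dropping older conversation turns and breaking a long document into pieces, can be sketched in a few lines. This is one simple approach under stated assumptions, not the only way to do it: token counts use a rough four-characters-per-token estimate, the function names are hypothetical, and a production system might summarise old turns rather than discard them, or split on sentence or section boundaries rather than on whitespace.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def trim_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whitespace-separated words into chunks under max_tokens.

    A single word larger than max_tokens still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    history = [f"turn {i}: " + "some earlier message text " * 10 for i in range(20)]
    print(len(trim_history(history, budget=500)), "of", len(history), "turns kept")

    document = "lorem ipsum dolor sit amet " * 2_000
    pieces = split_into_chunks(document, max_tokens=1_000)
    print(len(pieces), "chunks to summarise separately and then combine")
```

Each chunk would then be summarised in its own call and the partial summaries combined in a final call, which is the "broken into pieces and stitched back together" pattern described above.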

The context window matters because it is a fundamental constraint that quietly determines architecture, cost, and quality. Teams that ignore it hit walls — truncated inputs, ballooning bills, degraded answers from over-stuffed prompts. Teams that design around it — retrieving precisely, managing history deliberately, and putting the most important information where the model attends to it best — get systems that are cheaper, faster, and more accurate. It is one of the clearest examples of why applied AI is an engineering discipline, not just a model-access problem.
