Claude Code Memory: A Custom Brain for Every Project

What Is a Context Window (and Why 1 Million Tokens Is the Ceiling)

Module 1 · How Claude Code Thinks: Context, Tokens, Memory
Lesson 1.1 · 10 min

Every time you press Enter in Claude Code, a large package of text is sent to the Claude model. That package has a hard limit called the context window. On Claude Opus 4.7 (current generation at launch in 2026), that limit is 1,000,000 tokens — roughly 750,000 English words, or about 3,000 pages of a book.

This sounds enormous. It is also finite. And it is the single physical constraint that shapes how Claude Code remembers, forgets, and costs money.
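
Those figures are rules of thumb rather than exact counts, but the arithmetic behind them is easy to check. A minimal sketch, assuming the common estimates of about 0.75 English words per token and 250 words per printed page:

```python
# Rule-of-thumb conversions for a 1M-token window. The ratios
# are rough averages for English prose, not tokenizer output.
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75   # typical for English text
WORDS_PER_PAGE = 250     # typical for a printed book page

words = CONTEXT_WINDOW_TOKENS * WORDS_PER_TOKEN  # ~750,000 words
pages = words / WORDS_PER_PAGE                   # ~3,000 pages

print(f"{words:,.0f} words, about {pages:,.0f} book pages")
```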

What Actually Goes Into the Window

The 1M budget is not spent on your last message alone. On every turn, Claude Code stuffs the window with everything it needs to answer correctly:

  • System prompt — the invisible instructions from Anthropic that teach Claude how to behave as an agent (several thousand tokens)
  • Your CLAUDE.md — the project-level identity file we'll spend this course mastering
  • Conversation history — every message you and Claude have exchanged in the current session
  • Tool results — every file Claude has read, every grep output, every bash command that returned text
  • The new message you just typed

The assistant's reply is generated inside the same budget. If 950k tokens are already used, there are only 50k left for Claude to think and write.
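
To make that accounting concrete, here is a minimal sketch of one turn's budget. Every component size below is a hypothetical illustration, not a measurement of Claude Code's actual prompt:

```python
CONTEXT_WINDOW = 1_000_000  # tokens

# Hypothetical sizes for one turn; every number is illustrative.
turn = {
    "system_prompt": 5_000,          # Anthropic's agent instructions
    "claude_md": 2_000,              # project memory file
    "conversation_history": 600_000,
    "tool_results": 342_500,         # files read, grep and bash output
    "new_user_message": 500,
}

used = sum(turn.values())
left_for_reply = CONTEXT_WINDOW - used

print(f"used: {used:,} tokens")              # used: 950,000 tokens
print(f"reply budget: {left_for_reply:,}")   # reply budget: 50,000
```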

A Token Is Not a Word

A token is roughly 3–4 English characters, or a fragment of a word. "Claude Code" is about 3 tokens. A 200-line TypeScript file is typically 2,000–4,000 tokens. A large JSON response from an API can easily be 20,000 tokens. This is why reading a huge file "just to check something" is expensive.
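
For a quick back-of-envelope check before handing Claude a file, the characters-per-token rule is enough. A minimal sketch, assuming roughly 4 characters per token; exact counts require the model's own tokenizer:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate: about 3-4 English characters per token.
    Exact counts require the model's tokenizer."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Claude Code"))   # ~3, matching the rule of thumb
print(estimate_tokens("x" * 8_000))     # ~2,000: a mid-size source file
```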

Why 1 Million and Not More

The 1M ceiling is not just a dial the Anthropic team refuses to turn up. It reflects three hard realities:

  1. Quadratic attention cost. Transformer models compare every token to every other token, so doubling the window roughly quadruples the compute.
  2. Quality degradation. Research consistently shows that models "lose" information placed in the middle of very long contexts, the so-called lost-in-the-middle effect. A bigger window does not translate linearly into better recall.
  3. Cost to the user. Input tokens are billed, so a 1M-token request costs roughly 100× a 10k-token one. Even if the model could hold 10M tokens, almost nobody would want to pay for it on every turn. The sketch below makes points 1 and 3 concrete.
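
A minimal sketch of the arithmetic behind points 1 and 3, assuming idealized quadratic attention and linear per-token input billing (no real prices are used):

```python
small, large = 10_000, 1_000_000

# Point 1: attention work scales with the square of the window,
# so 100x the tokens means roughly 10,000x the compute.
print(f"attention cost ratio: {large**2 / small**2:,.0f}x")  # 10,000x

# Point 3: input billing is linear in tokens.
print(f"input cost ratio: {large / small:,.0f}x")            # 100x
```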

The Illusion of Memory

Here is the mental model shift this course will keep returning to: Claude Code has no memory between sessions. When you quit and restart, the window is empty. Everything that felt like memory was just text that lived inside the 1M window for one session.

Real persistent memory — the kind where Claude knows your business, your clients, your design system on Monday and still knows it on Friday — must live outside the window, on disk, and be loaded back in at the start of each session. That loading mechanism is what the next five lessons and the rest of the course are about.
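
Mechanically, that kind of memory is just a write-then-reload loop around a file on disk. A minimal sketch: CLAUDE.md is the real file name Claude Code looks for, but the session functions here are simplified stand-ins, not Claude Code internals:

```python
from pathlib import Path

MEMORY_FILE = Path("CLAUDE.md")  # the file Claude Code loads at startup

def start_session() -> str:
    # The window starts empty every session; the only "memory"
    # is whatever gets read back in from disk.
    if MEMORY_FILE.exists():
        return MEMORY_FILE.read_text()
    return ""

def end_session(worth_keeping: str) -> None:
    # Anything not written out before exit vanishes with the window.
    with MEMORY_FILE.open("a") as f:
        f.write(f"\n{worth_keeping}\n")
```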

1 million tokens sounds like infinite memory. It is a hotel room — a large one — that gets cleared when you check out. This course is about building a house.