
Claude Just Killed the Context Window Problem

March 15, 2026

Opus 4.6 and Sonnet 4.6 now ship with 1M tokens at standard pricing: no premium, no beta header, and no engineering workarounds required

On March 13, 2026, Anthropic made the 1M context window generally available for Claude Opus 4.6 and Sonnet 4.6. No waitlist. No beta access. No special header. No price multiplier for long requests.

That last part is the one worth paying attention to.

What actually changed

The headline number is 1 million tokens. That is 5x the previous 200K limit. But the more important change is the pricing structure.

Before this announcement, sending a long-context request to Opus 4.6 triggered a long-context premium. Once your input exceeded 200K tokens, the per-token rate jumped. The 1M window existed in beta, but using it cost more per token than a short request.

That is now gone.

A 900K-token request costs the same per-token rate as a 9K one. Opus 4.6 is billed at $5 input and $25 output per million tokens across the entire window. Sonnet 4.6 is $3 input and $15 output per million tokens, same rate end to end. Your standard account rate limits apply across every context length. There is no separate 1M rate limit tier to worry about.
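The flat pricing is easy to sanity-check with a few lines of arithmetic. A minimal sketch, using the rates quoted in this post (the model keys are just labels for this example, not API model IDs):

```python
# Per-million-token rates quoted in this announcement (USD).
# Flat across the entire window: no long-context surcharge.
RATES = {
    "opus-4.6":   {"input": 5.00,  "output": 25.00},
    "sonnet-4.6": {"input": 3.00,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD; the same per-token rate applies at any context length."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A 900K-token input is billed at the same per-token rate as a 9K one:
big = request_cost("opus-4.6", 900_000, 4_000)    # 4.50 input + 0.10 output = 4.60
small = request_cost("opus-4.6", 9_000, 4_000)    # 0.045 input + 0.10 output = 0.145
```

Under the old structure, the `big` request would have crossed the 200K threshold and paid a higher per-token rate on the way; now both line items use the same table.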

The other thing that changed is the media limit. You can now send up to 600 images or PDF pages per request, up from 100. If you were building document analysis pipelines and hitting that ceiling, that limit is now six times higher.

What 1 million tokens actually is

A million tokens is approximately 750,000 words. That is roughly 10 full-length novels. In developer terms it is a large codebase, a year of Slack history, hundreds of research papers or the complete transcript of a long-running agent session including every tool call, observation and intermediate reasoning step.
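The 750,000-word figure comes from the common rough heuristic of about 0.75 English words per token; real tokenizers vary by language and content, so treat this as a back-of-envelope conversion only:

```python
WORDS_PER_TOKEN = 0.75  # rough heuristic used in this post; actual ratios vary

def tokens_to_words(tokens: int) -> int:
    """Back-of-envelope estimate of word count for a token budget."""
    return int(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words: int) -> int:
    """Inverse estimate: how many tokens a given word count consumes."""
    return int(words / WORDS_PER_TOKEN)

tokens_to_words(1_000_000)  # 750000
```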

The previous 200K limit was not small. Most use cases fit inside it comfortably. But the use cases that did not fit were the ones where you needed the model to reason across an entire system rather than a slice of it. Legal review across an entire contract negotiation history. Debugging that requires understanding the interaction between files across a codebase, not just the file where the error appears. Agent sessions that accumulate tool calls and context over time until the window fills and compaction erases what the agent found earlier.

Those are exactly the use cases this change addresses.

The accuracy question

A large window is only useful if the model can actually find and reason about information anywhere inside it. Filling a million-token window with text and hoping the model reads it carefully is not a strategy.

Anthropic benchmarked Opus 4.6 on MRCR v2, a multi-needle retrieval benchmark that hides eight pieces of key information across a million tokens of text and requires the model to find all of them. Opus 4.6 hit 78.3% accuracy at the full 1M window.

Source: Anthropic — MRCR v2 accuracy at 1M token context length

Sonnet 4.5, the previous generation, hit 18.5% on the same benchmark. That is more than a fourfold improvement in the model’s ability to actually locate and use information buried deep in context.

At 256K tokens Opus 4.6 climbs to 93% accuracy on the same test.

This matters because retrieval accuracy at scale is the real problem. Any model can technically receive a million tokens. Whether it can actually reason coherently across that entire window is a different question, and the benchmark numbers suggest Opus 4.6 is considerably more capable at it than previous models.

What this means for engineers who were working around the limit

The standard workarounds for long-context problems were chunking, RAG pipelines, lossy summarization and context clearing. Each of these involves a tradeoff.

Chunking breaks the semantic relationships between pieces of context. If the answer to your question lives at the boundary between chunk three and chunk four, a chunking pipeline may never surface it. RAG retrieval is only as good as your embedding and retrieval logic. If you embed poorly or retrieve the wrong chunks, the model never sees what it needs. Summarization loses detail and introduces irreversibility. You cannot un-summarize. Context clearing means the model starts fresh, which is fine until it needs to refer back to something that no longer exists in the window.
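The chunk-boundary failure mode is easy to reproduce. A toy sketch with a naive fixed-size chunker (the document and chunk size are invented for illustration):

```python
def chunk(text: str, size: int) -> list[str]:
    """Naive fixed-size chunker with no overlap: the failure mode described above."""
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = "The deploy fails because SERVICE_TIMEOUT is set to 5 but the job needs 30."
chunks = chunk(doc, 40)

# The key fact straddles the chunk boundary: no single chunk contains both
# the setting name and the value it needs to be, so retrieval over these
# chunks can never surface the complete answer.
any("SERVICE_TIMEOUT" in c and "30" in c for c in chunks)  # False
```

Overlapping windows and smarter splitters mitigate this, but they are exactly the kind of maintenance burden the post is describing: machinery that exists only because the model could not see the whole document.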

With a 1M window at standard pricing, the engineering cost of building and maintaining these workarounds becomes harder to justify for a significant portion of use cases. Not all of them. RAG is still the right architecture when you have data that does not fit in even 1M tokens, when you need freshness or when you need to query across a corpus rather than reason about a fixed document set. But for use cases where the problem is “I need the model to see the whole thing at once,” you can now often just give it the whole thing.

Claude Code specifically

For developers using Claude Code, this change is immediately visible.

Claude Code previously hit the context window during long sessions and triggered compaction. Compaction summarizes earlier parts of the conversation and discards the detail. That meant tool call results, search findings and intermediate reasoning steps from earlier in the session could vanish before the session ended. The model would sometimes re-discover things it had already found, or lose track of constraints it had already reasoned about.

With 1M context, Claude Code sessions can accumulate five times as much history before compaction kicks in. Every tool call output, every database query result and every file read stays in the window. The session maintains integrity for longer.
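To get an intuition for how much less often compaction fires, here is a toy model. This is not Claude Code's real compaction algorithm; the per-call token cost and the fraction of context kept after compaction are invented for illustration:

```python
def compactions_needed(step_tokens: int, steps: int, window: int,
                       keep_fraction: float = 0.2) -> int:
    """Toy model: each step adds step_tokens of tool output; when the
    accumulated context would exceed the window, compaction shrinks it
    to keep_fraction of the window and discards the rest."""
    used, compactions = 0, 0
    for _ in range(steps):
        used += step_tokens
        if used > window:
            used = int(window * keep_fraction) + step_tokens
            compactions += 1
    return compactions

# Same session either way: 2K tokens per tool call, 1,000 calls.
compactions_needed(2_000, 1_000, 200_000)    # many compactions
compactions_needed(2_000, 1_000, 1_000_000)  # far fewer
```

Under these toy numbers the 200K window compacts a dozen times over the session while the 1M window compacts twice, which is the qualitative point: each compaction is a chance to lose a finding the agent needs later.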

Claude Code Max, Team and Enterprise users on Opus 4.6 get this automatically. No configuration required.

Where it is available

The 1M context window is available today natively on the Claude Platform, and on Amazon Bedrock, Google Cloud Vertex AI and Microsoft Azure Foundry.

No beta header is required. If you were previously including the long-context beta header in your requests, you can remove it. Requests over 200K tokens work automatically. If you leave the header in, it is ignored. No code changes are required unless you want to clean up.
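In practice the cleanup is one line: stop sending the long-context beta header. A sketch of the request headers before and after; the beta header value shown here is the one from the earlier 1M beta and is illustrative, since whatever value you were actually sending is the one to drop:

```python
# Headers as they might have looked during the long-context beta.
# "context-1m-2025-08-07" is the earlier beta's value, shown for illustration.
headers = {
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "context-1m-2025-08-07",
}

# After GA the header is unnecessary. Dropping it is safe, and per the
# announcement, leaving it in place is simply ignored server-side.
headers.pop("anthropic-beta", None)
```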

The 1M window remains in beta for Claude Sonnet 4.5 and Sonnet 4. The GA change applies specifically to Opus 4.6 and Sonnet 4.6.

The competitive context

Google’s Gemini models have offered 1M context for some time. This announcement closes that gap for Anthropic’s flagship models and removes the pricing premium that previously made large windows expensive enough to avoid in production.

The difference Anthropic is pointing to is not window size alone. It is accuracy inside that window, combined with the instruction-following and reasoning quality the Claude models are known for. The MRCR v2 benchmark numbers are part of that argument.

Whether that holds up in your specific workload is something you can now test at standard pricing rather than at a premium.

The short version

One million tokens. Standard pricing across the full window. Up to 600 images or PDF pages per request. No beta header. No long-context surcharge. Available now on every major cloud platform.

If you were chunking, summarizing or otherwise engineering around the 200K limit for a use case where the model really needed to see the whole document, the workaround may no longer be necessary.

Found this helpful?

If this article saved you time or solved a problem, consider supporting — it helps keep the writing going.

Originally published on Medium.

Hafiq Iqmal