field note

Claude 4.7 Costs 30% More Per Session. Anthropic Called It a Feature.

The new tokenizer in Claude Opus 4.7 inflates token counts by 1.325x to 1.47x on real content, not the 1.0-1.35x Anthropic admitted. Same price, 20-30% less work. Here's the math.

A US dollar bill cut in half lengthwise, left side visibly shrunken and compressed while the right side remains full size, photographed on a dark surface with dramatic side lighting

Anthropic shipped Claude Opus 4.7 on April 16 with a new tokenizer. The migration guide acknowledged it uses more tokens. The headline per-token pricing stayed the same. What the migration guide did not say clearly: you are now paying 20 to 30 percent more per Claude Code session, and the capability improvement that justifies the price is a 5-point gain on a single benchmark.

This is not a bug. It is a design choice presented as a footnote.

The Numbers Nobody Read

Anthropic’s official migration guide states the new tokenizer uses “roughly 1.0 to 1.35x as many tokens when processing text compared to previous models.” The operative word is “roughly.” Independent testing on real Claude Code content produced these results:

Content TypeClaude 4.6 TokensClaude 4.7 TokensRatio
CLAUDE.md (5KB)1,3992,0211.445x
User Prompt1,1221,5411.373x
Blog Post (Markdown)1,2091,6541.368x
Git Commit Log9101,2231.344x
Python Stack Trace1,7362,1701.250x
Weighted Average8,25410,9371.325x

The average sits at the top of Anthropic’s documented range. Technical documentation hits 1.47x. TypeScript code hits 1.36x. The migration guide’s 1.0-1.35x range is not wrong, but it is the floor of what real users experience, not the center.

The characters-per-token ratio tells the story mechanistically. English prose dropped from 4.33 characters per token on 4.6 to 3.60 on 4.7. TypeScript dropped from 3.66 to 2.69. The new tokenizer uses smaller sub-word merges. Words get split more granularly. The model attends to individual morphemes rather than longer substrings. Anthropic claims this improves instruction fidelity. The cost is token volume.

The Session Math

For a typical 80-turn Claude Code session with a 6KB static prefix and conversation history growing to 160K tokens:

Claude 4.6: ~$6.65 per session
Claude 4.7: ~$7.86 - $8.76 per session
Effective increase: 20-30%

The same session, the same prompts, the same outputs. The per-token price did not change. The effective price per session did.

This is not theoretical. One heavy Claude Code user with 9,667 sessions analyzed their actual spend: prompt caching represents 93% of their total Claude API bill. When the tokenizer changes, every cache write and every cache read becomes more expensive proportionally. The static prefix you wrote once and cached at session start now occupies 1.3x more cache space. Every subsequent turn pays 1.3x more to read it. Anthropic’s prompt caching is genuinely good engineering. The new tokenizer taxes it at 30%.

What Anthropic Says You Got In Return

The stated justification is instruction-following improvement. On the IFEval benchmark (verifiable constraints like “no commas” or “exactly N words”), Claude 4.7 scored 90% strict compliance versus 4.6’s 85%. A 5-point improvement on one benchmark. Anthropic framed this as the tokenizer enabling “more literal instruction following” because the model processes smaller semantic units.

Box’s evaluations tell a more nuanced story: Opus 4.7 had a 56% reduction in model calls and 50% reduction in tool calls on certain workflows. The model completes some tasks in fewer steps, which can offset the token inflation. But this is workload-dependent. For tasks where the model already completed the job correctly in 4.6, you are now paying 30% more for the same output with no offsetting efficiency gain.

The new xhigh effort level (between high and max) is genuinely useful for tuning the latency-quality tradeoff. The high-resolution image support at 3.3x previous resolution is real. Task budgets give you advisory token limits across agentic loops. These are legitimate improvements. But they are improvements that cost the same regardless of tokenizer choice. The tokenizer tax is collected from every user unconditionally.

The Cache Invalidation Problem

When you upgrade to 4.7, every cached prefix from 4.6 is invalid. The model cannot reuse cache entries encoded under the old tokenizer. Anthropic did not make a separate announcement about this. It was in the migration guide, near the bottom, alongside the token ratio disclosure.

This means every Claude Code user who updates is forced through a cold-start tax. The first session after migration pays full price for every cache write. Every cached conversation from before the upgrade is unreadable without re-encoding under the new tokenizer. If you use conversation history as a long-term memory store, as many Claude Code power users do, you are re-paying to cache it all at the new token density.

This Is Not New Information For Anthropic

Claude’s tokenizer has been less efficient than GPT-4’s since at least Claude 2.1. Third-party analysis in 2024 documented that Claude tokenized the same English text into 20-30% more tokens than GPT-4. The response was that Claude’s tokenizer was “more expressive” and supported lower-resource languages better. The CJK token ratio (1.01x) on 4.7 supports this. For Chinese, Japanese, and Korean text, the new tokenizer barely changes token counts.

But the 30% English tax has been a persistent hidden cost for every enterprise and developer using Claude at scale. The April 2026 update did not improve this. It made it worse. And the fix Anthropic shipped as a migration guide footnote is the same fix they have been shipping for two years: use prompt caching aggressively, use Haiku for cheap tasks, tune your effort levels.

Those are good engineering practices. They are not a solution to a tokenizer regression.

The Competitive Context

Seven major open-source models dropped in the first 12 days of April 2026. Llama 4, Qwen 3.6, Gemma 4, GLM-5.1, OLMo 2, Mistral Small 4, and MiniMax M2.7. Qwen 3.5 and Gemma 4 are both running on local hardware at 85 tokens per second on consumer GPUs. GLM-5.1 is MIT licensed and beats GPT-5.4 on SWE-Bench Pro. The open-source ecosystem has never been more capable per dollar.

Anthropic’s Claude Opus 4.7 is genuinely better than 4.6 at instruction following, tool use, and multi-step agentic tasks. Box’s 56% reduction in model calls is a real efficiency win for some workloads. The new features are not cosmetic.

But the tokenizer tax is a quiet price increase collected in tokens rather than dollars, framed as a feature, disclosed in a migration guide that most Claude Code users will read after their costs have already increased. The open-source models launching alongside it do not have this problem. Their tokenizers are fixed. The weights are free. The infrastructure costs are yours to optimize.

If you are running Claude 4.6 in production today, the upgrade to 4.7 is not free. Run the token counting script before you update:

from anthropic import Anthropic
client = Anthropic()

for model in ["claude-opus-4-6", "claude-opus-4-7"]:
    r = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": sample_text}],
    )
    print(f"{model}: {r.input_tokens} tokens")

Measure your actual token ratio. Then decide whether the IFEval improvement, the reduced model calls, and the xhigh effort level are worth 20 to 30 percent more per session. The per-token price did not change. The effective price did. That distinction matters when you are running 10,000 sessions a day.