Analyzed: March 31, 2026 leak snapshot
This page documents the model-selection layer that is directly visible in source. Provider-specific setup details belong on separate Bedrock and Vertex pages.

Model families in source

The codebase is built around three active Claude families:
  • Opus
  • Sonnet
  • Haiku
The default mappings visible in src/utils/model/model.ts point to:
  • Opus 4.6 as the default Opus mapping
  • Sonnet 4.6 as the first-party Sonnet default
  • Sonnet 4.5 as the conservative third-party default
  • Haiku 4.5 as the default small-fast model
These are implementation defaults from the analyzed snapshot, not stable product promises.
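The mappings above can be sketched as a pair of alias tables. The model ID strings below are illustrative placeholders, not the literal values in src/utils/model/model.ts:

```typescript
// Hypothetical sketch of the alias-to-default mapping described above.
type ModelAlias = "opus" | "sonnet" | "haiku";

interface DefaultModels {
  firstParty: Record<ModelAlias, string>;
  thirdParty: Record<ModelAlias, string>;
}

const DEFAULTS: DefaultModels = {
  firstParty: {
    opus: "claude-opus-4-6",      // placeholder ID
    sonnet: "claude-sonnet-4-6",  // placeholder ID
    haiku: "claude-haiku-4-5",    // placeholder ID
  },
  thirdParty: {
    opus: "claude-opus-4-6",
    sonnet: "claude-sonnet-4-5",  // conservative third-party default
    haiku: "claude-haiku-4-5",
  },
};

function defaultModelFor(alias: ModelAlias, firstParty: boolean): string {
  return firstParty ? DEFAULTS.firstParty[alias] : DEFAULTS.thirdParty[alias];
}
```

The split table makes the first-party versus third-party divergence explicit: only the Sonnet row differs.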

Provider routing

Provider selection is environment-driven in src/utils/model/providers.ts:
  • CLAUDE_CODE_USE_BEDROCK => Bedrock
  • CLAUDE_CODE_USE_VERTEX => Vertex AI
  • CLAUDE_CODE_USE_FOUNDRY => Foundry
  • Otherwise => first-party Anthropic API
This provider choice affects more than credentials. It also changes default model picks and some capability assumptions.
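The routing itself is a short cascade. The function name and env-object shape here are assumptions; the real logic lives in src/utils/model/providers.ts:

```typescript
// Minimal sketch of the environment-driven provider routing described above.
type Provider = "bedrock" | "vertex" | "foundry" | "firstParty";

function resolveProvider(env: Record<string, string | undefined>): Provider {
  // First matching flag wins; any non-empty value counts as "set".
  if (env.CLAUDE_CODE_USE_BEDROCK) return "bedrock";
  if (env.CLAUDE_CODE_USE_VERTEX) return "vertex";
  if (env.CLAUDE_CODE_USE_FOUNDRY) return "foundry";
  return "firstParty";
}
```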

Main loop model resolution

The main loop model is resolved in priority order:
  1. In-session override, such as /model
  2. Startup override from CLI flags
  3. ANTHROPIC_MODEL
  4. Saved settings
  5. Built-in default
If a user-specified model is not allowed by the model allowlist, the code ignores it and falls back.
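The priority chain and allowlist fallback can be expressed as a single pass over the sources. Names are illustrative; the snapshot's actual resolver may be structured differently:

```typescript
// Sketch of the five-step priority order described above.
interface ModelSources {
  sessionOverride?: string;  // e.g. set via /model
  cliFlag?: string;          // startup override
  envModel?: string;         // ANTHROPIC_MODEL
  savedSetting?: string;     // persisted settings
  builtinDefault: string;
}

function resolveMainLoopModel(src: ModelSources, allowlist?: string[]): string {
  const candidates = [src.sessionOverride, src.cliFlag, src.envModel, src.savedSetting];
  for (const c of candidates) {
    // A candidate outside the allowlist is skipped, falling through
    // to the next source rather than erroring out.
    if (c && (!allowlist || allowlist.includes(c))) return c;
  }
  return src.builtinDefault;
}
```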

Defaults by account and provider

The default model setting is not universal. From getDefaultMainLoopModelSetting():
  • Anthropic internal builds default to a configured override or Opus with 1m context
  • Max and Team Premium default to Opus
  • Other users default to Sonnet
  • Third-party providers may pin older Sonnet defaults than first-party
That makes the default model a policy decision, not just a constant.
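That policy reads as a small decision tree. This is a hedged reconstruction: tier names and return values are simplified stand-ins for the real getDefaultMainLoopModelSetting() logic:

```typescript
// Simplified sketch of the account-tier default policy described above.
type AccountTier = "internal" | "max" | "teamPremium" | "other";

function defaultMainLoopModelSetting(
  tier: AccountTier,
  internalOverride?: string,  // configured override for internal builds
): string {
  if (tier === "internal") return internalOverride ?? "opus[1m]";
  if (tier === "max" || tier === "teamPremium") return "opus";
  return "sonnet";
}
```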

Runtime adjustments

The selected model can still change at execution time. getRuntimeMainLoopModel() applies extra rules such as:
  • opusplan using Opus in plan mode
  • haiku upgrading to Sonnet in plan mode
  • Falling back from 1m-context variants when token pressure is too high
So there are really two stages:
  • Session model selection
  • Per-turn runtime model selection
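The second stage can be sketched as a pure function over the session choice plus turn context. The `[1m]` suffix convention and the exact shape here are assumptions:

```typescript
// Per-turn adjustment sketch matching the rules above.
interface TurnContext {
  planMode: boolean;
  tokenPressureHigh: boolean;
}

function runtimeMainLoopModel(selected: string, ctx: TurnContext): string {
  if (ctx.planMode && selected === "opusplan") return "opus";
  if (ctx.planMode && selected === "haiku") return "sonnet";
  // Drop a 1m-context variant back to the standard model under pressure.
  if (ctx.tokenPressureHigh && selected.endsWith("[1m]")) {
    return selected.replace("[1m]", "");
  }
  return selected;
}
```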

Thinking modes

The thinking layer is defined in src/utils/thinking.ts:
  • adaptive
  • enabled with explicit token budget
  • disabled
Thinking is enabled by default unless configuration explicitly turns it off. This is one of the clearest places where the source shows Anthropic optimizing for quality first and latency second.
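The three modes map naturally onto a discriminated union. Field names are assumptions; src/utils/thinking.ts defines the real shape:

```typescript
// Sketch of the three thinking modes listed above.
type ThinkingConfig =
  | { mode: "adaptive" }
  | { mode: "enabled"; budgetTokens: number }  // explicit token budget
  | { mode: "disabled" };

// Thinking defaults to on unless configuration explicitly turns it off.
function resolveThinking(userSetting?: ThinkingConfig): ThinkingConfig {
  return userSetting ?? { mode: "adaptive" };
}
```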

Adaptive thinking support

Thinking support is model-aware and provider-aware.
  • First-party and Foundry treat Claude 4+ as thinking-capable
  • Bedrock and Vertex are more conservative
  • Adaptive thinking is explicitly allowlisted for certain 4.6 models in this snapshot
The code comments are unusually direct here: changing default thinking behavior is treated as a quality-sensitive decision.
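A capability check consistent with that behavior might look like the following. The version parsing and the allowlist entries are illustrative, not the snapshot's literal values:

```typescript
// Sketch of a model-aware, provider-aware thinking capability check.
type Provider = "firstParty" | "foundry" | "bedrock" | "vertex";

// Placeholder model IDs standing in for the snapshot's 4.6 allowlist.
const ADAPTIVE_ALLOWLIST = ["claude-opus-4-6", "claude-sonnet-4-6"];

function supportsThinking(model: string, provider: Provider): boolean {
  // Parse the major version out of IDs shaped like "claude-sonnet-4-5".
  const major = Number((model.match(/-(\d+)-\d+$/) ?? [])[1]);
  if (provider === "firstParty" || provider === "foundry") return major >= 4;
  // Bedrock and Vertex are more conservative: in this sketch, only
  // explicitly known models are treated as thinking-capable.
  return ADAPTIVE_ALLOWLIST.includes(model);
}

function supportsAdaptiveThinking(model: string): boolean {
  return ADAPTIVE_ALLOWLIST.includes(model);
}
```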

Ultrathink and keyword triggers

The source exposes a separate ultrathink concept in src/utils/thinking.ts.
  • It is gated by both a build-time feature flag and a runtime GrowthBook flag
  • It is detectable through the literal ultrathink keyword in user text
  • The code includes utilities to locate and highlight that trigger
This is not just prompt phrasing. It has dedicated runtime support.
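The gating plus keyword-location utilities can be sketched together. The two flags are reduced to booleans here, and the function name is hypothetical:

```typescript
// Sketch of ultrathink gating and trigger location as described above.
interface UltrathinkGate {
  buildFlagEnabled: boolean;   // build-time feature flag
  growthbookEnabled: boolean;  // runtime GrowthBook flag
}

function findUltrathinkTrigger(
  text: string,
  gate: UltrathinkGate,
): { start: number; end: number } | null {
  // Both gates must pass before the keyword is even looked for.
  if (!gate.buildFlagEnabled || !gate.growthbookEnabled) return null;
  const idx = text.toLowerCase().indexOf("ultrathink");
  if (idx === -1) return null;
  // The span can be used to highlight the trigger in the UI.
  return { start: idx, end: idx + "ultrathink".length };
}
```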

Token estimation and small-fast model use

Token counting is handled separately from the main generation model. The source in src/services/tokenEstimation.ts shows a pattern like this:
  • Prefer Haiku for token estimation
  • Use Sonnet in cases where provider limitations make Haiku unsuitable
  • Special-case Vertex global endpoints and certain thinking scenarios
That means even when the main loop runs on Opus or Sonnet, support tasks may use a cheaper or more compatible model underneath.
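The selection pattern reduces to a short fallback function. The Vertex global-endpoint case and the thinking special case are collapsed to booleans here; the real logic in src/services/tokenEstimation.ts is more involved:

```typescript
// Sketch of the small-fast model choice for token estimation.
interface EstimationContext {
  provider: "firstParty" | "bedrock" | "vertex" | "foundry";
  vertexGlobalEndpoint?: boolean;
  thinkingEnabled?: boolean;
}

function tokenEstimationModel(ctx: EstimationContext): "haiku" | "sonnet" {
  // Cases where Haiku is unsuitable fall back to Sonnet.
  if (ctx.provider === "vertex" && ctx.vertexGlobalEndpoint) return "sonnet";
  if (ctx.thinkingEnabled) return "sonnet";
  return "haiku"; // cheap default for counting, independent of the main loop model
}
```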

Streaming behavior

The main loop is stream-oriented. Model output is consumed incrementally so the system can:
  • Render text as it arrives
  • Detect thinking blocks
  • Start tool execution while the response is still streaming
  • Recover from max_tokens conditions
Model selection and streaming are therefore tightly coupled. A model choice affects not just cost and quality, but also which runtime features are available.
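The four streaming concerns above can be sketched as a single event-consumption loop. The event names are generic stand-ins, not the SDK's actual stream event types:

```typescript
// Event-handling sketch of the stream-oriented main loop described above.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "thinking"; delta: string }
  | { type: "tool_use_start"; name: string }
  | { type: "stop"; reason: "end_turn" | "max_tokens" };

interface TurnResult {
  text: string;
  sawThinking: boolean;
  toolsStarted: string[];
  truncated: boolean; // hit max_tokens and may need recovery
}

function consumeStream(events: Iterable<StreamEvent>): TurnResult {
  const result: TurnResult = { text: "", sawThinking: false, toolsStarted: [], truncated: false };
  for (const ev of events) {
    switch (ev.type) {
      case "text": result.text += ev.delta; break;               // render as it arrives
      case "thinking": result.sawThinking = true; break;         // detect thinking blocks
      case "tool_use_start": result.toolsStarted.push(ev.name); break; // start tools early
      case "stop": result.truncated = ev.reason === "max_tokens"; break;
    }
  }
  return result;
}
```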

Practical mental model

Claude Code does not have one single “model setting.” It has a stack:
  • Provider
  • User-selected model or alias
  • Account-tier default
  • Runtime override for special modes
  • Support-model decisions for token counting and other side tasks
If you want to understand model behavior in a session, you need to account for all five layers.