This page documents the model-selection layer that is directly visible in source. Provider-specific setup details belong on separate Bedrock and Vertex pages.
## Model families in source
The codebase is built around three active Claude families:

- Opus
- Sonnet
- Haiku
The defaults in `src/utils/model/model.ts` point to:

- Opus 4.6
- Sonnet 4.6 for first-party
- Sonnet 4.5 as the conservative third-party default
- Haiku 4.5 as the default small-fast model
## Provider routing
Provider selection is environment-driven in `src/utils/model/providers.ts`:

- `CLAUDE_CODE_USE_BEDROCK` => Bedrock
- `CLAUDE_CODE_USE_VERTEX` => Vertex AI
- `CLAUDE_CODE_USE_FOUNDRY` => Foundry
- Otherwise => first-party Anthropic API
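The routing above amounts to a first-match check over environment variables. A minimal sketch (the function and type names here are illustrative, not the real exports of `providers.ts`):

```typescript
// Hypothetical sketch of env-driven provider routing; only the variable
// names CLAUDE_CODE_USE_* come from the source.
type Provider = "bedrock" | "vertex" | "foundry" | "firstParty";

function resolveProvider(env: Record<string, string | undefined>): Provider {
  // Any non-empty value enables the corresponding provider.
  if (env.CLAUDE_CODE_USE_BEDROCK) return "bedrock";
  if (env.CLAUDE_CODE_USE_VERTEX) return "vertex";
  if (env.CLAUDE_CODE_USE_FOUNDRY) return "foundry";
  // No override set: fall through to the first-party Anthropic API.
  return "firstParty";
}
```

Note that the checks are ordered, so if multiple variables are set, Bedrock wins in this sketch; the real code may resolve conflicts differently.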
## Main loop model resolution
The main loop model is resolved in priority order:

- In-session override, such as `/model`
- Startup override from CLI flags
- `ANTHROPIC_MODEL`
- Saved settings
- Built-in default
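The priority order above is a first-defined-wins chain, which can be sketched as follows (the interface and field names are assumptions for illustration, not the real resolver's signature):

```typescript
// Hypothetical sources for the main loop model, in descending priority.
interface ModelSources {
  sessionOverride?: string; // set in-session, e.g. via /model
  cliFlag?: string;         // startup override from CLI flags
  envModel?: string;        // ANTHROPIC_MODEL
  savedSetting?: string;    // persisted user settings
}

// First defined value wins; the built-in default is the final fallback.
function resolveMainLoopModel(s: ModelSources, builtInDefault: string): string {
  return s.sessionOverride ?? s.cliFlag ?? s.envModel ?? s.savedSetting ?? builtInDefault;
}
```

The `??` chain makes the precedence explicit: a `/model` override beats everything, and the built-in default only applies when no other source is set.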
## Defaults by account and provider
The default model setting is not universal. From `getDefaultMainLoopModelSetting()`:

- Anthropic internal builds default to a configured override or Opus with `1m` context
- Max and Team Premium default to Opus
- Other users default to Sonnet
- Third-party providers may pin older Sonnet defaults than first-party
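A simplified sketch of the tier branching, under the assumption that account tier is available as a simple enum (the real `getDefaultMainLoopModelSetting()` takes different inputs and returns richer values):

```typescript
// Tier names and model ID strings here are assumptions for illustration.
type AccountTier = "internal" | "max" | "teamPremium" | "other";

function defaultMainLoopModel(tier: AccountTier, configuredOverride?: string): string {
  if (tier === "internal") {
    // Internal builds: configured override wins, else Opus with 1m context.
    return configuredOverride ?? "opus-1m";
  }
  if (tier === "max" || tier === "teamPremium") return "opus";
  // Everyone else defaults to Sonnet.
  return "sonnet";
}
```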
## Runtime adjustments
The selected model can still change at execution time. `getRuntimeMainLoopModel()` applies extra rules such as:

- `opusplan` using Opus in plan mode
- `haiku` upgrading to Sonnet in plan mode
- Avoiding certain `1m` decisions when token pressure is too high

In other words, there are two distinct layers:

- Session model selection
- Per-turn runtime model selection
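The per-turn layer can be sketched as a pure function over the session selection plus turn state. This is a hedged approximation; the real `getRuntimeMainLoopModel()` consults more state, and the `tokenPressureHigh` flag and `-1m` suffix convention are assumptions:

```typescript
interface TurnState {
  selected: string;          // session-level model or alias, e.g. "opusplan"
  planMode: boolean;
  tokenPressureHigh: boolean; // assumed summary of context-window usage
}

function runtimeMainLoopModel(t: TurnState): string {
  // Plan-mode aliases resolve to a stronger model for planning turns.
  if (t.planMode && t.selected === "opusplan") return "opus";
  if (t.planMode && t.selected === "haiku") return "sonnet";
  // Under high token pressure, avoid opting into a 1m-context variant.
  if (t.tokenPressureHigh && t.selected.endsWith("-1m")) {
    return t.selected.replace(/-1m$/, "");
  }
  return t.selected;
}
```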
## Thinking modes
The thinking layer is defined in `src/utils/thinking.ts` with three modes:

- `adaptive`
- `enabled` with an explicit token budget
- `disabled`
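The key structural point is that only the `enabled` mode carries a budget. A discriminated union captures this; the field names are assumptions, not the actual types in `thinking.ts`:

```typescript
// Hypothetical shape for the three thinking modes.
type ThinkingMode =
  | { type: "adaptive" }                      // model manages its own budget
  | { type: "enabled"; budgetTokens: number } // explicit token budget
  | { type: "disabled" };

function describeThinking(m: ThinkingMode): string {
  switch (m.type) {
    case "adaptive": return "adaptive";
    case "enabled": return `enabled (${m.budgetTokens} tokens)`;
    case "disabled": return "disabled";
  }
}
```

The union makes invalid states unrepresentable: a budget cannot be attached to `adaptive` or `disabled`.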
## Adaptive thinking support
Thinking support is model-aware and provider-aware.

- First-party and Foundry treat Claude 4+ as thinking-capable
- Bedrock and Vertex are more conservative
- Adaptive thinking is explicitly allowlisted for certain 4.6 models in this snapshot
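A rough sketch of the two checks. The allowlist contents and the conservative Bedrock/Vertex branch are placeholders; the real `thinking.ts` logic is more detailed per provider:

```typescript
type ThinkingProvider = "firstParty" | "foundry" | "bedrock" | "vertex";

// Assumed allowlist of adaptive-capable 4.6 model IDs, for illustration only.
const ADAPTIVE_ALLOWLIST = new Set(["opus-4.6", "sonnet-4.6"]);

function supportsThinking(provider: ThinkingProvider, majorVersion: number): boolean {
  if (provider === "firstParty" || provider === "foundry") {
    // Claude 4+ treated as thinking-capable.
    return majorVersion >= 4;
  }
  // Bedrock/Vertex: conservative placeholder; the real code checks
  // specific model IDs per provider rather than returning false outright.
  return false;
}

function supportsAdaptiveThinking(modelId: string): boolean {
  return ADAPTIVE_ALLOWLIST.has(modelId);
}
```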
## Ultrathink and keyword triggers
The source exposes a separate `ultrathink` concept in `src/utils/thinking.ts`.

- It is gated by both a build-time feature flag and a runtime GrowthBook flag
- It is detectable through the literal `ultrathink` keyword in user text
- The code includes utilities to locate and highlight that trigger
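The locate-and-highlight utilities can be approximated with a word-boundary regex scan that returns match ranges. This is a sketch of the general technique, not the actual helper from `thinking.ts`:

```typescript
// Find every occurrence of the literal keyword in user text, returning
// [start, end) character ranges suitable for highlighting.
function findUltrathinkTriggers(text: string): Array<{ start: number; end: number }> {
  const matches: Array<{ start: number; end: number }> = [];
  const re = /\bultrathink\b/g; // word boundaries avoid partial-word hits
  for (let m = re.exec(text); m !== null; m = re.exec(text)) {
    matches.push({ start: m.index, end: m.index + m[0].length });
  }
  return matches;
}
```

Returning ranges rather than booleans lets a renderer both detect the trigger and highlight the exact span in the input.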
## Token estimation and small-fast model use
Token counting is handled separately from the main generation model. The source in `src/services/tokenEstimation.ts` shows a pattern like this:
- Prefer Haiku for token estimation
- Use Sonnet in cases where provider limitations make Haiku unsuitable
- Special-case Vertex global endpoints and certain thinking scenarios
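That preference order might look like the following. The context fields (`vertexGlobalEndpoint`, `haikuAvailable`, `thinkingScenario`) are assumptions standing in for whatever the real `tokenEstimation.ts` actually inspects:

```typescript
interface EstimationContext {
  provider: "firstParty" | "bedrock" | "vertex" | "foundry";
  vertexGlobalEndpoint?: boolean; // assumed flag for Vertex global endpoints
  haikuAvailable: boolean;        // assumed: provider exposes a usable Haiku
  thinkingScenario?: boolean;     // assumed: certain thinking cases need Sonnet
}

function tokenEstimationModel(ctx: EstimationContext): "haiku" | "sonnet" {
  // Special cases fall back to Sonnet.
  if (ctx.provider === "vertex" && ctx.vertexGlobalEndpoint) return "sonnet";
  if (ctx.thinkingScenario) return "sonnet";
  // Otherwise prefer the small-fast model.
  return ctx.haikuAvailable ? "haiku" : "sonnet";
}
```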
## Streaming behavior
The main loop is stream-oriented. Model output is consumed incrementally so the system can:

- Render text as it arrives
- Detect thinking blocks
- Start tool execution while the response is still streaming
- Recover from `max_tokens` conditions
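A minimal consumer over an async event stream illustrates the pattern. The event shapes below are assumptions for the sketch, not the actual wire format:

```typescript
// Assumed stream event shapes, for illustration only.
type StreamEvent =
  | { kind: "text"; delta: string }
  | { kind: "thinking"; delta: string }
  | { kind: "toolUse"; name: string }
  | { kind: "stop"; reason: "end_turn" | "max_tokens" };

async function consumeStream(events: AsyncIterable<StreamEvent>): Promise<{
  text: string;
  toolsStarted: string[];
  truncated: boolean;
}> {
  let text = "";
  const toolsStarted: string[] = [];
  let truncated = false;
  for await (const ev of events) {
    if (ev.kind === "text") text += ev.delta;                    // render as it arrives
    else if (ev.kind === "toolUse") toolsStarted.push(ev.name);  // kick off tools early
    else if (ev.kind === "stop" && ev.reason === "max_tokens") truncated = true;
  }
  return { text, toolsStarted, truncated };
}
```

Because tool-use events are acted on inside the loop, tool execution can begin before the stream finishes, and the `truncated` flag gives the caller a hook for `max_tokens` recovery.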
## Practical mental model
Claude Code does not have one single “model setting.” It has a stack:

- Provider
- User-selected model or alias
- Account-tier default
- Runtime override for special modes
- Support-model decisions for token counting and other side tasks