Analyzed: March 31, 2026 leak snapshot
This page documents the model-selection layer that is directly visible in source. Provider-specific setup details belong on separate Bedrock and Vertex pages.

Model families in source

The codebase is built around three active Claude families:
  • Opus
  • Sonnet
  • Haiku
The default mappings visible in src/utils/model/model.ts point to:
  • Opus 4.6 as the default Opus mapping
  • Sonnet 4.6 as the first-party Sonnet default
  • Sonnet 4.5 as the conservative third-party default
  • Haiku 4.5 as the default small-fast model
These are implementation defaults from the analyzed snapshot, not stable product promises.
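The mappings above can be sketched as a pair of alias tables. The model ID strings below are illustrative placeholders, not the literal values in src/utils/model/model.ts:

```typescript
// Hypothetical sketch of the alias-to-default mapping described above.
type ModelAlias = "opus" | "sonnet" | "haiku";

interface DefaultModels {
  firstParty: Record<ModelAlias, string>;
  thirdParty: Record<ModelAlias, string>;
}

const DEFAULTS: DefaultModels = {
  firstParty: {
    opus: "claude-opus-4-6",      // placeholder ID
    sonnet: "claude-sonnet-4-6",  // placeholder ID
    haiku: "claude-haiku-4-5",    // placeholder ID
  },
  thirdParty: {
    opus: "claude-opus-4-6",
    sonnet: "claude-sonnet-4-5",  // conservative third-party default
    haiku: "claude-haiku-4-5",
  },
};

function defaultModelFor(alias: ModelAlias, firstParty: boolean): string {
  return firstParty ? DEFAULTS.firstParty[alias] : DEFAULTS.thirdParty[alias];
}
```

The split table makes the first-party versus third-party divergence explicit: only the Sonnet row differs.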

Provider routing

Provider selection is environment-driven in src/utils/model/providers.ts:
  • CLAUDE_CODE_USE_BEDROCK => Bedrock
  • CLAUDE_CODE_USE_VERTEX => Vertex AI
  • CLAUDE_CODE_USE_FOUNDRY => Foundry
  • Otherwise => first-party Anthropic API
This provider choice affects more than credentials. It also changes default model picks and some capability assumptions.
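The routing itself is a short cascade. The function name and env-object shape here are assumptions; the real logic lives in src/utils/model/providers.ts:

```typescript
// Minimal sketch of the environment-driven provider routing described above.
type Provider = "bedrock" | "vertex" | "foundry" | "firstParty";

function resolveProvider(env: Record<string, string | undefined>): Provider {
  // First matching flag wins; any non-empty value counts as "set".
  if (env.CLAUDE_CODE_USE_BEDROCK) return "bedrock";
  if (env.CLAUDE_CODE_USE_VERTEX) return "vertex";
  if (env.CLAUDE_CODE_USE_FOUNDRY) return "foundry";
  return "firstParty";
}
```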

Main loop model resolution

The main loop model is resolved in priority order:
  1. In-session override, such as /model
  2. Startup override from CLI flags
  3. ANTHROPIC_MODEL
  4. Saved settings
  5. Built-in default
If a user-specified model is not allowed by the model allowlist, the code ignores it and falls back.
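The priority chain and allowlist fallback can be expressed as a single pass over the sources. Names are illustrative; the snapshot's actual resolver may be structured differently:

```typescript
// Sketch of the five-step priority order described above.
interface ModelSources {
  sessionOverride?: string;  // e.g. set via /model
  cliFlag?: string;          // startup override
  envModel?: string;         // ANTHROPIC_MODEL
  savedSetting?: string;     // persisted settings
  builtinDefault: string;
}

function resolveMainLoopModel(src: ModelSources, allowlist?: string[]): string {
  const candidates = [src.sessionOverride, src.cliFlag, src.envModel, src.savedSetting];
  for (const c of candidates) {
    // A candidate outside the allowlist is skipped, falling through
    // to the next source rather than erroring out.
    if (c && (!allowlist || allowlist.includes(c))) return c;
  }
  return src.builtinDefault;
}
```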

Defaults by account and provider

The default model setting is not universal. From getDefaultMainLoopModelSetting():
  • Anthropic internal builds default to a configured override or Opus with 1m context
  • Max and Team Premium default to Opus
  • Other users default to Sonnet
  • Third-party providers may pin older Sonnet defaults than first-party
That makes the default model a policy decision, not just a constant.
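That policy reads as a small decision tree. This is a hedged reconstruction: tier names and return values are simplified stand-ins for the real getDefaultMainLoopModelSetting() logic:

```typescript
// Simplified sketch of the account-tier default policy described above.
type AccountTier = "internal" | "max" | "teamPremium" | "other";

function defaultMainLoopModelSetting(
  tier: AccountTier,
  internalOverride?: string,  // configured override for internal builds
): string {
  if (tier === "internal") return internalOverride ?? "opus[1m]";
  if (tier === "max" || tier === "teamPremium") return "opus";
  return "sonnet";
}
```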

Runtime adjustments

The selected model can still change at execution time. getRuntimeMainLoopModel() applies extra rules such as:
  • opusplan using Opus in plan mode
  • haiku upgrading to Sonnet in plan mode
  • Falling back from 1m-context variants when token pressure is too high
So there are really two stages:
  • Session model selection
  • Per-turn runtime model selection
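The second stage can be sketched as a pure function over the session choice plus turn context. The `[1m]` suffix convention and the exact shape here are assumptions:

```typescript
// Per-turn adjustment sketch matching the rules above.
interface TurnContext {
  planMode: boolean;
  tokenPressureHigh: boolean;
}

function runtimeMainLoopModel(selected: string, ctx: TurnContext): string {
  if (ctx.planMode && selected === "opusplan") return "opus";
  if (ctx.planMode && selected === "haiku") return "sonnet";
  // Drop a 1m-context variant back to the standard model under pressure.
  if (ctx.tokenPressureHigh && selected.endsWith("[1m]")) {
    return selected.replace("[1m]", "");
  }
  return selected;
}
```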

Thinking modes

The thinking layer is defined in src/utils/thinking.ts:
  • adaptive
  • enabled with explicit token budget
  • disabled
Thinking is enabled by default unless configuration explicitly turns it off. This is one of the clearest places where the source shows Anthropic optimizing for quality first and latency second.
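The three modes map naturally onto a discriminated union. Field names are assumptions; src/utils/thinking.ts defines the real shape:

```typescript
// Sketch of the three thinking modes listed above.
type ThinkingConfig =
  | { mode: "adaptive" }
  | { mode: "enabled"; budgetTokens: number }  // explicit token budget
  | { mode: "disabled" };

// Thinking defaults to on unless configuration explicitly turns it off.
function resolveThinking(userSetting?: ThinkingConfig): ThinkingConfig {
  return userSetting ?? { mode: "adaptive" };
}
```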

Adaptive thinking support

Thinking support is model-aware and provider-aware.
  • First-party and Foundry treat Claude 4+ as thinking-capable
  • Bedrock and Vertex are more conservative
  • Adaptive thinking is explicitly allowlisted for certain 4.6 models in this snapshot
The code comments are unusually direct here: changing default thinking behavior is treated as a quality-sensitive decision.
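A capability check consistent with that behavior might look like the following. The version parsing and the allowlist entries are illustrative, not the snapshot's literal values:

```typescript
// Sketch of a model-aware, provider-aware thinking capability check.
type Provider = "firstParty" | "foundry" | "bedrock" | "vertex";

// Placeholder model IDs standing in for the snapshot's 4.6 allowlist.
const ADAPTIVE_ALLOWLIST = ["claude-opus-4-6", "claude-sonnet-4-6"];

function supportsThinking(model: string, provider: Provider): boolean {
  // Parse the major version out of IDs shaped like "claude-sonnet-4-5".
  const major = Number((model.match(/-(\d+)-\d+$/) ?? [])[1]);
  if (provider === "firstParty" || provider === "foundry") return major >= 4;
  // Bedrock and Vertex are more conservative: in this sketch, only
  // explicitly known models are treated as thinking-capable.
  return ADAPTIVE_ALLOWLIST.includes(model);
}

function supportsAdaptiveThinking(model: string): boolean {
  return ADAPTIVE_ALLOWLIST.includes(model);
}
```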

Ultrathink and keyword triggers

The source exposes a separate ultrathink concept in src/utils/thinking.ts.
  • It is gated by both a build-time feature flag and a runtime GrowthBook flag
  • It is detectable through the literal ultrathink keyword in user text
  • The code includes utilities to locate and highlight that trigger
This is not just prompt phrasing. It has dedicated runtime support.
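The gating plus keyword-location utilities can be sketched together. The two flags are reduced to booleans here, and the function name is hypothetical:

```typescript
// Sketch of ultrathink gating and trigger location as described above.
interface UltrathinkGate {
  buildFlagEnabled: boolean;   // build-time feature flag
  growthbookEnabled: boolean;  // runtime GrowthBook flag
}

function findUltrathinkTrigger(
  text: string,
  gate: UltrathinkGate,
): { start: number; end: number } | null {
  // Both gates must pass before the keyword is even looked for.
  if (!gate.buildFlagEnabled || !gate.growthbookEnabled) return null;
  const idx = text.toLowerCase().indexOf("ultrathink");
  if (idx === -1) return null;
  // The span can be used to highlight the trigger in the UI.
  return { start: idx, end: idx + "ultrathink".length };
}
```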

Token estimation and small-fast model use

Token counting is handled separately from the main generation model. The source in src/services/tokenEstimation.ts shows a pattern like this:
  • Prefer Haiku for token estimation
  • Use Sonnet in cases where provider limitations make Haiku unsuitable
  • Special-case Vertex global endpoints and certain thinking scenarios
That means even when the main loop runs on Opus or Sonnet, support tasks may use a cheaper or more compatible model underneath.
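The selection pattern reduces to a short fallback function. The Vertex global-endpoint case and the thinking special case are collapsed to booleans here; the real logic in src/services/tokenEstimation.ts is more involved:

```typescript
// Sketch of the small-fast model choice for token estimation.
interface EstimationContext {
  provider: "firstParty" | "bedrock" | "vertex" | "foundry";
  vertexGlobalEndpoint?: boolean;
  thinkingEnabled?: boolean;
}

function tokenEstimationModel(ctx: EstimationContext): "haiku" | "sonnet" {
  // Cases where Haiku is unsuitable fall back to Sonnet.
  if (ctx.provider === "vertex" && ctx.vertexGlobalEndpoint) return "sonnet";
  if (ctx.thinkingEnabled) return "sonnet";
  return "haiku"; // cheap default for counting, independent of the main loop model
}
```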

Streaming behavior

The main loop is stream-oriented. Model output is consumed incrementally so the system can:
  • Render text as it arrives
  • Detect thinking blocks
  • Start tool execution while the response is still streaming
  • Recover from max_tokens conditions
Model selection and streaming are therefore tightly coupled. A model choice affects not just cost and quality, but also which runtime features are available.
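The four streaming concerns above can be sketched as a single event-consumption loop. The event names are generic stand-ins, not the SDK's actual stream event types:

```typescript
// Event-handling sketch of the stream-oriented main loop described above.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "thinking"; delta: string }
  | { type: "tool_use_start"; name: string }
  | { type: "stop"; reason: "end_turn" | "max_tokens" };

interface TurnResult {
  text: string;
  sawThinking: boolean;
  toolsStarted: string[];
  truncated: boolean; // hit max_tokens and may need recovery
}

function consumeStream(events: Iterable<StreamEvent>): TurnResult {
  const result: TurnResult = { text: "", sawThinking: false, toolsStarted: [], truncated: false };
  for (const ev of events) {
    switch (ev.type) {
      case "text": result.text += ev.delta; break;               // render as it arrives
      case "thinking": result.sawThinking = true; break;         // detect thinking blocks
      case "tool_use_start": result.toolsStarted.push(ev.name); break; // start tools early
      case "stop": result.truncated = ev.reason === "max_tokens"; break;
    }
  }
  return result;
}
```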

Practical mental model

Claude Code does not have one single “model setting.” It has a stack:
  • Provider
  • User-selected model or alias
  • Account-tier default
  • Runtime override for special modes
  • Support-model decisions for token counting and other side tasks
If you want to understand model behavior in a session, you need to account for all five layers.