Version analyzed: March 31, 2026 leak
This page traces the exact path from when a user types a message to when they see the final output.

Overview

Claude Code’s control flow is a sophisticated pipeline that transforms user input into agentic actions. The system operates in a continuous loop, executing tools, gathering results, and deciding whether to continue or stop.

Complete Flow Diagram

Phase-by-Phase Breakdown

Phase 1: User Input Processing

Location: src/utils/processUserInput/processUserInput.ts

When the user presses Enter:
  1. Raw Input Capture
    • Text content
    • Pasted images/files
    • Cursor position
  2. Slash Command Detection
    Illustrative pseudocode (not verbatim source).
    if (input.startsWith('/')) {
      // Parse command name and arguments
      // Queue command for execution
    }
    
  3. Context Key Expansion
    • #File → Read file content
    • #Folder → List directory
    • #Problems → Get diagnostics
    • #Terminal → Capture terminal output
    • #Git → Show git diff
  4. Attachment Processing
    • Images → Base64 encode, resize if needed
    • PDFs → Extract text, handle page ranges
    • Files → Read content with line numbers
  5. Memory Attachment
    • Auto-attach relevant memory files
    • Filter duplicates
    • Add freshness notes
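The context-key expansion in step 3 can be sketched as a lookup table of expanders applied over the raw input. Everything here (the `expanders` map, `expandContextKeys`, the placeholder output strings) is illustrative, not taken from the actual source:

```typescript
// Hypothetical sketch of context-key expansion; names and output
// formats are assumptions, not the real Claude Code implementation.
type Expander = (arg: string) => string;

const expanders: Record<string, Expander> = {
  // Each key maps to a function that fetches the referenced context.
  File: (path) => `<file path="${path}">…content…</file>`,
  Folder: (path) => `<folder path="${path}">…listing…</folder>`,
  Problems: () => "<diagnostics>…</diagnostics>",
};

// Replace "#Key" or "#Key:arg" tokens in the input with expanded context.
function expandContextKeys(input: string): string {
  return input.replace(/#(\w+)(?::(\S+))?/g, (match, key, arg) => {
    const expand = expanders[key];
    return expand ? expand(arg ?? "") : match; // unknown keys pass through
  });
}
```

Unknown keys passing through unchanged matters: a literal `#hashtag` in user text must not be swallowed by the expansion step.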

Phase 2: Query Engine Initialization

Location: src/QueryEngine.ts

The QueryEngine orchestrates the entire agentic loop:
  1. Load Session State
    • Previous messages
    • Tool permission context
    • File state cache
    • Cost tracking
  2. Prepare Tools
    • Filter by permission mode
    • Apply deny rules
    • Inject MCP tools
    • Build tool schemas
  3. Initialize Tracking
    • Token budget
    • Turn counter
    • Cost accumulator
    • Abort controller
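The three initialization steps above can be condensed into one sketch. The shapes (`SessionState`, `QueryContext`, `initQueryContext`) are assumptions for illustration, not verbatim from src/QueryEngine.ts:

```typescript
// Minimal sketch of query-engine initialization under assumed types.
interface SessionState {
  messages: unknown[];
  costUSD: number;
}

interface QueryContext {
  state: SessionState;
  tools: string[];
  turn: number;
  abort: AbortController;
}

function initQueryContext(
  state: SessionState,
  allTools: string[],
  denyRules: Set<string>,
): QueryContext {
  // Prepare tools: drop anything matched by a deny rule.
  const tools = allTools.filter((t) => !denyRules.has(t));
  // Initialize tracking: the turn counter starts at zero and the
  // AbortController lets the user cancel the whole agentic loop.
  return { state, tools, turn: 0, abort: new AbortController() };
}
```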

Phase 3: Prompt Assembly

Location: src/constants/prompts.ts, src/utils/queryContext.ts

The system prompt is built in layers:
Illustrative pseudocode (not verbatim source).
// 1. Static prefix (cacheable)
const prefix = "You are Claude Code, Anthropic's official CLI for Claude."

// 2. Tool instructions
const toolGuidance = buildToolInstructions(tools)

// 3. User context (dynamic)
const userContext = {
  cwd: "/path/to/project",
  git_branch: "main",
  git_status: "modified: 2 files",
  os: "macOS 14.0",
  shell: "zsh"
}

// 4. System context
const systemContext = {
  date: "2026-03-31",
  capabilities: ["file_read", "bash", "web_search"]
}

// 5. Memory
const memory = loadMemoryPrompt() // MEMORY.md content

// 6. MCP instructions
const mcpInstructions = buildMcpInstructions(mcpClients)

// Final assembly
const systemPrompt = [
  prefix,
  ...toolGuidance,
  DYNAMIC_BOUNDARY, // Cache split point
  userContext,
  systemContext,
  memory,
  mcpInstructions
]
Cache Optimization:
  • Everything before DYNAMIC_BOUNDARY uses scope: 'global'
  • Dynamic content after boundary uses scope: 'session'
  • Reduces API costs significantly
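One plausible way the boundary maps onto the wire format: the `cache_control` field below follows Anthropic's public prompt-caching API (a breakpoint marker on the last stable block), while the `global`/`session` scope values are this document's terminology. The builder itself is a hedged sketch, not the actual source:

```typescript
// Sketch: everything up to and including the last static block is marked
// as a cache breakpoint; dynamic per-session content follows it.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

function buildSystemBlocks(staticParts: string[], dynamicParts: string[]): SystemBlock[] {
  const blocks: SystemBlock[] = staticParts.map((text) => ({ type: "text", text }));
  if (blocks.length > 0) {
    // The marker on the final static block tells the API to cache the
    // whole prefix up to this point.
    blocks[blocks.length - 1].cache_control = { type: "ephemeral" };
  }
  // Dynamic content after the boundary is never part of the cached prefix.
  for (const text of dynamicParts) blocks.push({ type: "text", text });
  return blocks;
}
```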

Phase 4: API Communication

Location: src/query.ts, src/services/api/claude.ts

The query function handles API communication:
  1. Message Normalization
    Illustrative pseudocode (not verbatim source).
    // Convert internal format to API format
    const apiMessages = normalizeMessagesForAPI(messages)
    
  2. Token Budget Check
    Illustrative pseudocode (not verbatim source).
    if (estimatedTokens > contextWindow) {
      // Trigger auto-compact
      messages = await compactMessages(messages)
    }
    
  3. Stream Request
    Illustrative pseudocode (not verbatim source).
    const stream = await anthropic.messages.stream({
      model: mainLoopModel,
      max_tokens: 8192,
      system: systemPrompt,
      messages: apiMessages,
      tools: toolSchemas,
      stream: true
    })
    
  4. Handle Stream Events
    • message_start → Initialize response
    • content_block_start → New content block
    • content_block_delta → Incremental text/thinking
    • content_block_stop → Block complete
    • message_stop → Response complete
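The event handling in step 4 can be sketched as a reducer over the stream. The event shapes below are simplified stand-ins for the SDK's full types, kept just large enough to show the accumulation logic:

```typescript
// Simplified stream events; real SDK events carry more fields.
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start"; index: number }
  | { type: "content_block_delta"; index: number; text: string }
  | { type: "content_block_stop"; index: number }
  | { type: "message_stop" };

// Fold a stream of events into the final per-block text.
function reduceStream(events: StreamEvent[]): string[] {
  const blocks: string[] = [];
  for (const ev of events) {
    switch (ev.type) {
      case "content_block_start":
        blocks[ev.index] = ""; // new, empty content block
        break;
      case "content_block_delta":
        blocks[ev.index] += ev.text; // append incremental text
        break;
      // message_start / content_block_stop / message_stop carry no text here.
      default:
        break;
    }
  }
  return blocks;
}
```

In the real client the deltas are also rendered to the terminal as they arrive; this sketch only shows the accumulation side.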

Phase 5: The Agentic Loop

Location: src/query.ts, src/services/tools/toolOrchestration.ts

The core loop that makes Claude Code “agentic”:
Illustrative pseudocode (not verbatim source).
while (true) {
  // 1. Get response from API
  const response = await streamResponse()
  
  // 2. Check for tool uses
  const toolUses = extractToolUses(response)
  
  if (toolUses.length === 0) {
    // No tools → check stop reason
    if (stopReason === 'max_tokens' && tokenBudgetRemaining > 0) {
      // Output was truncated → retry with a higher limit
      continue
    }
    // end_turn / stop_sequence → natural completion
    break
  }
  
  // 3. Execute tools (parallel when possible)
  const toolResults = await executeTools(toolUses, {
    parallel: true,
    permissionCheck: true
  })
  
  // 4. Add results to conversation
  messages.push(...toolResults)
  
  // 5. Check budget
  if (exceedsTokenBudget() || exceedsTurnLimit()) {
    break
  }
  
  // 6. Continue loop with tool results
}

Phase 6: Tool Execution

Location: src/services/tools/toolOrchestration.ts, individual tool files

Tool execution follows a strict protocol:
  1. Permission Check
    Illustrative pseudocode (not verbatim source).
    const decision = await checkPermission(toolName, input, context)
    
    if (decision.type === 'deny') {
      return { error: decision.reason }
    }
    
    if (decision.type === 'prompt') {
      const approved = await promptUser(decision.message)
      if (!approved) {
        return { error: 'User denied' }
      }
    }
    
  2. Input Validation
    Illustrative pseudocode (not verbatim source).
    const validated = toolSchema.parse(input)
    
  3. Execution
    Illustrative pseudocode (not verbatim source).
    const result = await tool.execute(validated, context)
    
  4. Result Formatting
    Illustrative pseudocode (not verbatim source).
    return {
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: formatResult(result)
    }
    
  5. Progress Reporting
    Illustrative pseudocode (not verbatim source).
    // For long-running tools
    onProgress?.({ 
      status: 'running',
      message: 'Processing...'
    })
    
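Chained together, steps 1 through 4 might look like the following. The `Decision` and `Tool` shapes are assumptions for illustration, not the actual toolOrchestration types; note that a permission denial becomes an error result fed back to the model, never an exception:

```typescript
// Hypothetical end-to-end tool pipeline under assumed types.
type Decision = { type: "allow" } | { type: "deny"; reason: string };
type Tool = {
  validate: (input: unknown) => unknown; // schema check (step 2)
  execute: (input: unknown) => Promise<string>; // the tool itself (step 3)
};

async function runTool(
  tool: Tool,
  toolUseId: string,
  input: unknown,
  checkPermission: (input: unknown) => Decision,
): Promise<{ type: "tool_result"; tool_use_id: string; content: string }> {
  // Step 1: permission check — denials are returned as results.
  const decision = checkPermission(input);
  if (decision.type === "deny") {
    return { type: "tool_result", tool_use_id: toolUseId, content: `Error: ${decision.reason}` };
  }
  // Steps 2–4: validate, execute, and format the result.
  const validated = tool.validate(input);
  const result = await tool.execute(validated);
  return { type: "tool_result", tool_use_id: toolUseId, content: result };
}
```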

Phase 7: Stop Decision

Location: src/query.ts

The system decides when to stop based on:
  1. Stop Reasons from API
    • end_turn → Natural completion
    • max_tokens → Output limit reached (retry with higher limit)
    • stop_sequence → Explicit stop marker
    • tool_use → Waiting for tool results
  2. Budget Checks
    • Token budget exhausted
    • Turn limit reached
    • Cost limit exceeded
    • Time limit exceeded
  3. User Interruption
    • Ctrl+C pressed
    • Stop command issued
  4. Error Conditions
    • API error (non-retryable)
    • Tool execution failure (critical)
    • Permission denial (blocking)
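These signals can be folded into a single predicate. A sketch with hypothetical names, following the rules above (interruption and fatal errors stop immediately, exhausted budgets stop, `tool_use` and `max_tokens` keep the loop alive):

```typescript
// Illustrative stop predicate; field names are assumptions.
interface LoopStatus {
  stopReason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";
  tokensRemaining: number;
  turnsRemaining: number;
  userInterrupted: boolean;
  fatalError: boolean;
}

function shouldStop(s: LoopStatus): boolean {
  if (s.userInterrupted || s.fatalError) return true; // Ctrl+C or hard error
  if (s.tokensRemaining <= 0 || s.turnsRemaining <= 0) return true; // budget
  if (s.stopReason === "tool_use") return false; // waiting on tool results
  // max_tokens is retried with a higher limit, so it does not stop the loop;
  // end_turn and stop_sequence are natural completion.
  return s.stopReason !== "max_tokens";
}
```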

Phase 8: State Persistence

Location: src/utils/sessionStorage.ts

After each turn:
  1. Record Transcript
    {"type":"user","content":"...","timestamp":1234567890}
    {"type":"assistant","content":"...","timestamp":1234567891}
    {"type":"tool_use","name":"bash","input":{...},"timestamp":1234567892}
    {"type":"tool_result","content":"...","timestamp":1234567893}
    
  2. Update State
    • Cost tracking
    • Token usage
    • File state cache
    • Permission context
  3. Flush to Disk
    await flushSessionStorage()
    
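The transcript format above is JSON Lines: one self-contained JSON object per line, which makes the file cheap to append to and easy to replay. A minimal append sketch, with an in-memory buffer standing in for the real file writer:

```typescript
// Transcript entry shape inferred from the records shown above;
// extra fields (content, name, input, …) ride along untyped.
type TranscriptEntry = {
  type: "user" | "assistant" | "tool_use" | "tool_result";
  timestamp: number;
  [key: string]: unknown;
};

function appendJsonl(buffer: string, entry: TranscriptEntry): string {
  // One JSON object per line; the trailing newline keeps the file appendable.
  return buffer + JSON.stringify(entry) + "\n";
}
```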

Special Cases

Thinking Mode

When extended thinking is enabled, thinking blocks are:
  • Displayed in real-time
  • Preserved in conversation
  • Token accounting behavior depends on provider/model/runtime handling
  • Can be redacted in some modes

REPL Mode

When REPL mode is enabled, the primitive tools (Read, Write, Edit, Bash) are wrapped in a single tool call.

Auto-Compact

When the conversation approaches the context window limit, older messages are automatically compacted (the compactMessages step shown in Phase 4) so the session can continue without losing recent context.
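One common shape for such compaction: summarize everything except the most recent messages into a single synthetic message. The sketch below assumes a `summarize` helper (a stand-in for a model call) and a `keepRecent` count; both are hypothetical, not the actual implementation:

```typescript
// Hypothetical compaction: collapse older history into one summary message.
type Msg = { role: "user" | "assistant"; content: string };

function compactMessages(
  messages: Msg[],
  keepRecent: number,
  summarize: (older: Msg[]) => string,
): Msg[] {
  if (messages.length <= keepRecent) return messages; // nothing to compact
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  const summary: Msg = {
    role: "user",
    content: `[Summary of earlier conversation] ${summarize(older)}`,
  };
  // Recent messages survive verbatim; only the older span is lossy.
  return [summary, ...recent];
}
```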

Performance Characteristics

Latency Breakdown

Illustrative latency ranges for reasoning about bottlenecks (not a benchmark from instrumented source measurements):
| Phase | Time | Notes |
| --- | --- | --- |
| Input Processing | 10-50ms | Depends on attachments |
| Prompt Assembly | 50-100ms | Cached after first turn |
| API First Token | 200-500ms | Network + model startup |
| Streaming | Variable | ~50 tokens/sec |
| Tool Execution | Variable | Depends on tool |
| State Persistence | 10-30ms | Async, non-blocking |

Optimization Strategies

  1. Parallel Tool Execution
    • Independent tools run concurrently
    • Reduces total turn time
  2. Prompt Caching
    • Static prefix cached globally
    • Session context cached per-session
    • Saves ~80% of prompt tokens
  3. Streaming Display
    • Text rendered as it arrives
    • User sees progress immediately
    • Perceived latency reduced
  4. Lazy Loading
    • Heavy modules loaded on demand
    • Faster startup time
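Strategy 1 amounts to starting every independent tool call before awaiting any of them, so a turn costs roughly the slowest tool rather than the sum of all of them. A minimal sketch (the helper name is illustrative):

```typescript
// Run independent async tasks concurrently; results keep input order.
async function executeParallel<T>(
  tasks: Array<() => Promise<T>>,
): Promise<T[]> {
  // Every task is started by map() before Promise.all awaits them.
  return Promise.all(tasks.map((run) => run()));
}
```

Dependent tools (e.g. an Edit that relies on a prior Read) still have to run sequentially; only independent calls qualify.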

Error Handling

Retry Logic

Illustrative pseudocode (not verbatim source).
const retryConfig = {
  maxRetries: 3,
  backoff: 'exponential',
  retryableErrors: [
    'overloaded_error',
    'timeout',
    'rate_limit_error'
  ]
}
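A retry loop consistent with this config: exponential backoff, retrying only the listed error codes. The `sleep` helper and the `err.code` error shape are assumptions, not the actual client's internals:

```typescript
// Codes taken from the retry config above.
const RETRYABLE = new Set(["overloaded_error", "timeout", "rate_limit_error"]);

const sleep = (ms: number) => new Promise<void>((res) => setTimeout(res, ms));

async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const retryable = RETRYABLE.has(err?.code);
      // Non-retryable errors and exhausted budgets propagate to the caller.
      if (!retryable || attempt >= maxRetries) throw err;
      // Exponential backoff: base, 2×base, 4×base, …
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```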

Fallback Models

Illustrative pseudocode (not verbatim source).
if (primaryModelFails) {
  // Try fallback model
  const fallbackResponse = await query({
    ...params,
    model: fallbackModel
  })
}

Graceful Degradation

  • Tool failures don’t crash the session
  • Permission denials return error messages
  • API errors show user-friendly messages
  • State always persists, even on crash

Related Pages

  • Architecture Overview: High-level system architecture
  • Agent Loop: Deep dive into the agentic loop
  • Prompt Assembly: How system prompts are constructed
  • Tools Overview: Understanding tool execution