Version analyzed: March 31, 2026 leak
This page explains the core loop that enables Claude Code to autonomously execute tools and make decisions.

What is the Agent Loop?

The agent loop is the heart of Claude Code’s autonomy. It’s a continuous cycle where:
  1. Claude generates a response (text + tool calls)
  2. Tools are executed
  3. Results are fed back to Claude
  4. Claude decides whether to continue or stop
  5. Repeat until task completion
This loop is what makes Claude Code “agentic” — it can work autonomously toward a goal without requiring user input after every step.
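The cycle above can be sketched as a minimal skeleton. All names here (`agentLoop`, `generate`, `runTool`) are illustrative placeholders, not the actual Claude Code implementation:

```typescript
// Minimal agent-loop skeleton: generate -> run tools -> feed results back,
// repeating until the model stops requesting tools or the turn cap is hit.
type Turn = { text: string; toolCalls: string[] };

async function agentLoop(
  generate: (history: string[]) => Promise<Turn>,  // 1. model responds
  runTool: (call: string) => Promise<string>,      // 2. a tool executes
  maxTurns = 10
): Promise<string[]> {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await generate(history);
    history.push(response.text);
    if (response.toolCalls.length === 0) break;    // 4. no tools -> done
    for (const call of response.toolCalls) {
      history.push(await runTool(call));           // 3. results fed back
    }
  }
  return history;
}
```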

The Loop Architecture

Core Components

1. QueryEngine Class

Location: src/QueryEngine.ts

The QueryEngine orchestrates the entire loop. It’s instantiated once per conversation and maintains state across multiple turns.
class QueryEngine {
  private mutableMessages: Message[]      // Conversation history
  private abortController: AbortController // Cancellation
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage    // Token tracking
  private readFileState: FileStateCache   // File cache
  
  async *submitMessage(prompt): AsyncGenerator<SDKMessage> {
    // Main loop entry point
  }
}
Key Responsibilities:
  • Maintain conversation state
  • Track token usage and costs
  • Handle permission denials
  • Manage file state cache
  • Coordinate with query() function

2. query() Function

Location: src/query.ts

The inner loop that handles a single turn (potentially multiple API calls).
async function* query(params: QueryParams): AsyncGenerator {
  // Loop state
  let state = {
    messages,
    turnCount: 0,
    maxOutputTokensRecoveryCount: 0,
    // ... more state
  }
  
  while (true) {
    // 1. Call API
    const response = await callClaude(state.messages)
    
    // 2. Stream response
    for await (const event of response) {
      yield event  // Text, thinking, tool uses
    }
    
    // 3. Check stop reason
    const decision = decideNextAction(response.stopReason, state)
    
    if (decision.type === 'terminal') {
      return decision  // Done
    }
    
    // 4. Execute tools if present
    if (hasToolUses(response)) {
      const results = await executeTools(response.toolUses)
      state.messages.push(...results)
    }
    
    // 5. Check budgets
    if (exceedsBudget(state)) {
      return { type: 'budget_exhausted' }
    }
    
    // 6. Continue loop
    state.turnCount++
  }
}

Loop Phases

Phase 1: API Call

The loop starts by calling the Anthropic API with the current conversation state.

Request Structure:
{
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  system: systemPrompt,      // Assembled prompt
  messages: apiMessages,      // Normalized messages
  tools: toolSchemas,         // Available tools
  stream: true                // Streaming enabled
}
Streaming Events:
  • message_start - Response begins
  • content_block_start - New content block (text/thinking/tool_use)
  • content_block_delta - Incremental content
  • content_block_stop - Block complete
  • message_delta - Usage updates
  • message_stop - Response complete
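A consumer of this stream typically folds the events back into complete content blocks. The sketch below uses simplified event shapes for illustration; the real SDK events carry more fields:

```typescript
// Fold streaming events into complete content blocks:
// start opens a block, deltas append to it, stop finalizes it.
type StreamEvent =
  | { type: 'content_block_start'; blockType: string }
  | { type: 'content_block_delta'; text: string }
  | { type: 'content_block_stop' }
  | { type: 'message_stop' };

function collectBlocks(events: StreamEvent[]): { blockType: string; text: string }[] {
  const blocks: { blockType: string; text: string }[] = [];
  let current: { blockType: string; text: string } | null = null;
  for (const ev of events) {
    if (ev.type === 'content_block_start') {
      current = { blockType: ev.blockType, text: '' };
    } else if (ev.type === 'content_block_delta' && current) {
      current.text += ev.text;  // incremental content
    } else if (ev.type === 'content_block_stop' && current) {
      blocks.push(current);     // block complete
      current = null;
    }
  }
  return blocks;
}
```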

Phase 2: Content Processing

As content streams in, it’s processed by type.

Text Blocks:
if (block.type === 'text') {
  yield { type: 'text_delta', text: block.text }
  // Rendered immediately to user
}
Thinking Blocks:
if (block.type === 'thinking') {
  yield { type: 'thinking_delta', thinking: block.thinking }
  // Displayed in thinking UI
}
Tool Use Blocks:
if (block.type === 'tool_use') {
  // Queued for execution after response completes
  toolUses.push({
    id: block.id,
    name: block.name,
    input: block.input
  })
}

Phase 3: Tool Execution

After the response completes, tools are executed.

Execution Strategy:
// Partition tools into batches
const batches = partitionToolCalls(toolUses)

for (const batch of batches) {
  if (batch.isConcurrencySafe) {
    // Execute read-only tools in parallel
    await runToolsConcurrently(batch.tools)
  } else {
    // Execute write tools serially
    await runToolsSerially(batch.tools)
  }
}
Concurrency Safety: Tools declare whether they’re safe to run in parallel:
class FileReadTool {
  isConcurrencySafe(input) {
    return true  // Reading is safe
  }
}

class FileWriteTool {
  isConcurrencySafe(input) {
    return false  // Writing must be serial
  }
}
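One plausible way the batching above could work: group consecutive concurrency-safe calls into a shared batch and give each unsafe call its own batch. This is a sketch under that assumption, not the actual partitionToolCalls implementation:

```typescript
// Partition tool calls into batches: consecutive safe calls share a
// parallel batch; each unsafe call gets a serial batch of its own.
type ToolCall = { name: string; safe: boolean };
type Batch = { isConcurrencySafe: boolean; tools: ToolCall[] };

function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.safe && last?.isConcurrencySafe) {
      last.tools.push(call);  // extend the current parallel batch
    } else {
      batches.push({ isConcurrencySafe: call.safe, tools: [call] });
    }
  }
  return batches;
}
```

For example, [Read, Read, Write, Read] yields three batches: the two reads run in parallel, the write runs alone, then the final read runs.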
Permission Checking: Before execution, each tool goes through permission checks:
async function runToolUse(toolUse, canUseTool) {
  // 1. Check permission
  const decision = await canUseTool(
    tool,
    toolUse.input,
    context
  )
  
  if (decision.type === 'deny') {
    return { error: decision.reason }
  }
  
  if (decision.type === 'prompt') {
    const approved = await promptUser(decision.message)
    if (!approved) {
      return { error: 'User denied' }
    }
  }
  
  // 2. Execute tool
  const result = await tool.execute(toolUse.input, context)
  
  // 3. Return result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: result
  }
}

Phase 4: Stop Decision

After processing the response, the loop decides whether to continue.

Stop Reasons from API:

  Stop Reason     Meaning                    Action
  end_turn        Natural completion         Check budget, maybe continue
  max_tokens      Output limit reached       Retry with higher limit
  stop_sequence   Explicit stop marker       Terminal - stop
  tool_use        Waiting for tool results   Continue with results
Decision Logic:
function decideNextAction(stopReason, state) {
  // 1. Tool uses always continue
  if (stopReason === 'tool_use') {
    return { type: 'continue', reason: 'tool_execution' }
  }
  
  // 2. max_tokens triggers retry
  if (stopReason === 'max_tokens') {
    if (state.maxOutputTokensRecoveryCount < 3) {
      return { 
        type: 'continue', 
        reason: 'max_tokens_recovery',
        newMaxTokens: state.maxOutputTokens * 2
      }
    }
  }
  
  // 3. end_turn checks budget
  if (stopReason === 'end_turn') {
    if (hasTokenBudgetRemaining() && state.turnCount < maxTurns) {
      return { type: 'continue', reason: 'budget_continuation' }
    }
  }
  
  // 4. Otherwise stop
  return { type: 'terminal', reason: stopReason }
}

Phase 5: Budget Checks

Multiple budget types can limit the loop.

Token Budget:
const tokenBudget = 500_000  // 500k tokens
const tokensUsed = sumTokens(state.messages)

if (tokensUsed > tokenBudget) {
  return { type: 'budget_exhausted', budget: 'tokens' }
}
Turn Limit:
if (state.turnCount >= maxTurns) {
  return { type: 'budget_exhausted', budget: 'turns' }
}
Cost Limit:
const costUSD = calculateCost(state.totalUsage)

if (costUSD > maxBudgetUsd) {
  return { type: 'budget_exhausted', budget: 'cost' }
}
Task Budget (API-level):
// Sent to API, enforced server-side
{
  output_config: {
    task_budget: {
      total: 100_000  // Total output tokens for turn
    }
  }
}
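The three client-side checks above can be combined into a single guard. The field names here are assumptions for illustration, not the exact implementation:

```typescript
// Combined budget guard: returns which budget was exhausted, or null
// if the loop may continue. Field names are illustrative.
type LoopState = { tokensUsed: number; turnCount: number; costUsd: number };
type Limits = { tokenBudget: number; maxTurns: number; maxBudgetUsd: number };

function exceedsBudget(state: LoopState, limits: Limits): string | null {
  if (state.tokensUsed > limits.tokenBudget) return 'tokens';
  if (state.turnCount >= limits.maxTurns) return 'turns';
  if (state.costUsd > limits.maxBudgetUsd) return 'cost';
  return null;  // all budgets have headroom
}
```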

Advanced Features

Auto-Continue on end_turn

When Claude stops with end_turn but has token budget remaining, the loop automatically continues:
if (stopReason === 'end_turn' && hasTokenBudgetRemaining()) {
  // Add a continuation message
  messages.push({
    type: 'user',
    content: '<continue>'  // Implicit continuation
  })
  
  // Loop continues
  continue
}
This enables Claude to work on complex tasks without artificial turn limits.

max_tokens Recovery

When Claude hits the output token limit mid-response:
if (stopReason === 'max_tokens') {
  // Double the limit and retry
  maxOutputTokens *= 2
  
  // Retry the same request
  continue
}
The loop retries up to 3 times before giving up.

Thinking Mode Integration

When thinking is enabled, thinking blocks are preserved across the loop:
// Thinking blocks must stay in context
if (hasThinkingBlocks(messages)) {
  // Keep thinking blocks for entire trajectory
  // (current turn + tool results + next turn)
}

Context Compression

When approaching context limits, the loop triggers auto-compact:
if (estimatedTokens > contextWindow * 0.9) {
  // Summarize old messages
  const compacted = await compactMessages(messages)
  
  // Insert boundary marker
  messages = [
    ...compacted,
    { type: 'system', content: '<compact_boundary>' },
    ...recentMessages
  ]
}

Parallel Tool Execution

The loop optimizes tool execution by running safe tools in parallel.

Concurrency Limit:
const MAX_CONCURRENT_TOOLS = 10  // Configurable via env var
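A cap like this is commonly enforced with a worker-pool pattern: spawn up to N workers that each pull the next pending task. A sketch of that pattern (not the actual scheduler):

```typescript
// Run async tasks with at most `limit` in flight at once.
// Workers pull the next index; results keep their original order.
async function runWithLimit<T>(
  tasks: (() => Promise<T>)[],
  limit = 10  // e.g. MAX_CONCURRENT_TOOLS
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;            // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```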

Error Handling

The loop handles various error conditions:

API Errors

try {
  const response = await callAPI(messages)
} catch (error) {
  if (isRetryable(error)) {
    // Exponential backoff
    await sleep(backoffMs)
    continue
  }
  
  // Non-retryable - stop
  return { type: 'error', error }
}
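The backoff delay itself is typically exponential with jitter so concurrent clients don't retry in lockstep. A sketch with illustrative base and cap values (not Claude Code's actual parameters):

```typescript
// Exponential backoff with "equal jitter": half the exponential delay
// is fixed, the other half is randomized. Values are illustrative.
function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);  // 500, 1000, 2000, ...
  return exp / 2 + Math.random() * (exp / 2);          // jittered delay
}
```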

Tool Errors

try {
  const result = await tool.execute(input)
} catch (error) {
  // Return error as tool result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: error.message,
    is_error: true
  }
}

Permission Denials

if (userDeniedPermission) {
  // Track denial
  permissionDenials.push({
    tool: toolName,
    reason: 'user_denied'
  })
  
  // Return error result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: 'Permission denied by user',
    is_error: true
  }
}

State Management

The loop maintains several types of state:

Conversation State

{
  messages: Message[],           // Full conversation
  totalUsage: Usage,             // Cumulative token usage
  turnCount: number,             // Number of turns
  permissionDenials: Denial[]    // Denied tools
}

Tool Context

{
  tools: Tools,                  // Available tools
  mcpClients: MCPClient[],       // MCP connections
  readFileCache: FileCache,      // File content cache
  inProgressToolUseIDs: Set<string>  // Currently executing
}

Budget Tracking

{
  tokenBudget: number,           // Remaining tokens
  maxTurns: number,              // Turn limit
  maxBudgetUsd: number,          // Cost limit
  taskBudget: { total: number }  // API task budget
}

Performance Characteristics

Latency Per Turn

  Phase               Typical Time   Notes
  API First Token     200-500ms      Network + model startup
  Streaming           Variable       ~50 tokens/sec
  Tool Execution      Variable       Depends on tool
  Permission Check    0-∞            Waits for user if needed
  State Persistence   10-30ms        Async, non-blocking

Throughput

  • Parallel tools: Up to 10 concurrent
  • Serial tools: One at a time
  • API calls: One at a time (streaming)

Memory Usage

  • Messages: Grows with conversation
  • File cache: Bounded by cache size
  • Tool results: Truncated if too large

Related Pages

  • Control Flow - High-level flow from input to output
  • Prompt Assembly - How system prompts are built
  • Tools Overview - Understanding tool execution
  • State Management - How state is managed across turns