Version analyzed: March 31, 2026 leak
This page explains the core loop that enables Claude Code to autonomously execute tools and make decisions.

What is the Agent Loop?

The agent loop is the heart of Claude Code’s autonomy. It’s a continuous cycle where:
  1. Claude generates a response (text + tool calls)
  2. Tools are executed
  3. Results are fed back to Claude
  4. Claude decides whether to continue or stop
  5. Repeat until task completion
This loop is what makes Claude Code “agentic” — it can work autonomously toward a goal without requiring user input after every step.
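The cycle above can be sketched as a minimal skeleton. All names here (`agentLoop`, `generate`, `runTool`) are illustrative placeholders, not the actual Claude Code implementation:

```typescript
// Minimal agent-loop skeleton: generate -> run tools -> feed results back,
// repeating until the model stops requesting tools or the turn cap is hit.
type Turn = { text: string; toolCalls: string[] };

async function agentLoop(
  generate: (history: string[]) => Promise<Turn>,  // 1. model responds
  runTool: (call: string) => Promise<string>,      // 2. a tool executes
  maxTurns = 10
): Promise<string[]> {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await generate(history);
    history.push(response.text);
    if (response.toolCalls.length === 0) break;    // 4. no tools -> done
    for (const call of response.toolCalls) {
      history.push(await runTool(call));           // 3. results fed back
    }
  }
  return history;
}
```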

The Loop Architecture

Core Components

1. QueryEngine Class

Location: src/QueryEngine.ts

The QueryEngine orchestrates the entire loop. It’s instantiated once per conversation and maintains state across multiple turns.
class QueryEngine {
  private mutableMessages: Message[]      // Conversation history
  private abortController: AbortController // Cancellation
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage    // Token tracking
  private readFileState: FileStateCache   // File cache
  
  async *submitMessage(prompt): AsyncGenerator<SDKMessage> {
    // Main loop entry point
  }
}
Key Responsibilities:
  • Maintain conversation state
  • Track token usage and costs
  • Handle permission denials
  • Manage file state cache
  • Coordinate with query() function

2. query() Function

Location: src/query.ts

The inner loop that handles a single turn (potentially multiple API calls).
async function* query(params: QueryParams): AsyncGenerator {
  // Loop state
  let state = {
    messages,
    turnCount: 0,
    maxOutputTokensRecoveryCount: 0,
    // ... more state
  }
  
  while (true) {
    // 1. Call API
    const response = await callClaude(state.messages)
    
    // 2. Stream response
    for await (const event of response) {
      yield event  // Text, thinking, tool uses
    }
    
    // 3. Check stop reason
    const decision = decideNextAction(response.stopReason, state)
    
    if (decision.type === 'terminal') {
      return decision  // Done
    }
    
    // 4. Execute tools if present
    if (hasToolUses(response)) {
      const results = await executeTools(response.toolUses)
      state.messages.push(...results)
    }
    
    // 5. Check budgets
    if (exceedsBudget(state)) {
      return { type: 'budget_exhausted' }
    }
    
    // 6. Continue loop
    state.turnCount++
  }
}

Loop Phases

Phase 1: API Call

The loop starts by calling the Anthropic API with the current conversation state.

Request Structure:
{
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  system: systemPrompt,      // Assembled prompt
  messages: apiMessages,      // Normalized messages
  tools: toolSchemas,         // Available tools
  stream: true                // Streaming enabled
}
Streaming Events:
  • message_start - Response begins
  • content_block_start - New content block (text/thinking/tool_use)
  • content_block_delta - Incremental content
  • content_block_stop - Block complete
  • message_delta - Usage updates
  • message_stop - Response complete
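A consumer of this stream typically folds the events back into complete content blocks. The sketch below uses simplified event shapes for illustration; the real SDK events carry more fields:

```typescript
// Fold streaming events into complete content blocks:
// start opens a block, deltas append to it, stop finalizes it.
type StreamEvent =
  | { type: 'content_block_start'; blockType: string }
  | { type: 'content_block_delta'; text: string }
  | { type: 'content_block_stop' }
  | { type: 'message_stop' };

function collectBlocks(events: StreamEvent[]): { blockType: string; text: string }[] {
  const blocks: { blockType: string; text: string }[] = [];
  let current: { blockType: string; text: string } | null = null;
  for (const ev of events) {
    if (ev.type === 'content_block_start') {
      current = { blockType: ev.blockType, text: '' };
    } else if (ev.type === 'content_block_delta' && current) {
      current.text += ev.text;  // incremental content
    } else if (ev.type === 'content_block_stop' && current) {
      blocks.push(current);     // block complete
      current = null;
    }
  }
  return blocks;
}
```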

Phase 2: Content Processing

As content streams in, it’s processed by type.

Text Blocks:
if (block.type === 'text') {
  yield { type: 'text_delta', text: block.text }
  // Rendered immediately to user
}
Thinking Blocks:
if (block.type === 'thinking') {
  yield { type: 'thinking_delta', thinking: block.thinking }
  // Displayed in thinking UI
}
Tool Use Blocks:
if (block.type === 'tool_use') {
  // Queued for execution after response completes
  toolUses.push({
    id: block.id,
    name: block.name,
    input: block.input
  })
}

Phase 3: Tool Execution

After the response completes, tools are executed.

Execution Strategy:
// Partition tools into batches
const batches = partitionToolCalls(toolUses)

for (const batch of batches) {
  if (batch.isConcurrencySafe) {
    // Execute read-only tools in parallel
    await runToolsConcurrently(batch.tools)
  } else {
    // Execute write tools serially
    await runToolsSerially(batch.tools)
  }
}
Concurrency Safety: Tools declare whether they’re safe to run in parallel:
class FileReadTool {
  isConcurrencySafe(input) {
    return true  // Reading is safe
  }
}

class FileWriteTool {
  isConcurrencySafe(input) {
    return false  // Writing must be serial
  }
}
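One plausible way the batching above could work: group consecutive concurrency-safe calls into a shared batch and give each unsafe call its own batch. This is a sketch under that assumption, not the actual partitionToolCalls implementation:

```typescript
// Partition tool calls into batches: consecutive safe calls share a
// parallel batch; each unsafe call gets a serial batch of its own.
type ToolCall = { name: string; safe: boolean };
type Batch = { isConcurrencySafe: boolean; tools: ToolCall[] };

function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.safe && last?.isConcurrencySafe) {
      last.tools.push(call);  // extend the current parallel batch
    } else {
      batches.push({ isConcurrencySafe: call.safe, tools: [call] });
    }
  }
  return batches;
}
```

For example, [Read, Read, Write, Read] yields three batches: the two reads run in parallel, the write runs alone, then the final read runs.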
Permission Checking: Before execution, each tool goes through permission checks:
async function runToolUse(toolUse, canUseTool) {
  // 1. Check permission
  const decision = await canUseTool(
    tool,
    toolUse.input,
    context
  )
  
  if (decision.type === 'deny') {
    return { error: decision.reason }
  }
  
  if (decision.type === 'prompt') {
    const approved = await promptUser(decision.message)
    if (!approved) {
      return { error: 'User denied' }
    }
  }
  
  // 2. Execute tool
  const result = await tool.execute(toolUse.input, context)
  
  // 3. Return result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: result
  }
}

Phase 4: Stop Decision

After processing the response, the loop decides whether to continue.

Stop Reasons from API:

  Stop Reason     Meaning                    Action
  end_turn        Natural completion         Check budget, maybe continue
  max_tokens      Output limit reached       Retry with higher limit
  stop_sequence   Explicit stop marker       Terminal - stop
  tool_use        Waiting for tool results   Continue with results
Decision Logic:
function decideNextAction(stopReason, state) {
  // 1. Tool uses always continue
  if (stopReason === 'tool_use') {
    return { type: 'continue', reason: 'tool_execution' }
  }
  
  // 2. max_tokens triggers retry
  if (stopReason === 'max_tokens') {
    if (state.maxOutputTokensRecoveryCount < 3) {
      return { 
        type: 'continue', 
        reason: 'max_tokens_recovery',
        newMaxTokens: state.maxOutputTokens * 2
      }
    }
  }
  
  // 3. end_turn checks budget
  if (stopReason === 'end_turn') {
    if (hasTokenBudgetRemaining() && state.turnCount < maxTurns) {
      return { type: 'continue', reason: 'budget_continuation' }
    }
  }
  
  // 4. Otherwise stop
  return { type: 'terminal', reason: stopReason }
}

Phase 5: Budget Checks

Multiple budget types can limit the loop.

Token Budget:
const tokenBudget = 500_000  // 500k tokens
const tokensUsed = sumTokens(state.messages)

if (tokensUsed > tokenBudget) {
  return { type: 'budget_exhausted', budget: 'tokens' }
}
Turn Limit:
if (state.turnCount >= maxTurns) {
  return { type: 'budget_exhausted', budget: 'turns' }
}
Cost Limit:
const costUSD = calculateCost(state.totalUsage)

if (costUSD > maxBudgetUsd) {
  return { type: 'budget_exhausted', budget: 'cost' }
}
Task Budget (API-level):
// Sent to API, enforced server-side
{
  output_config: {
    task_budget: {
      total: 100_000  // Total output tokens for turn
    }
  }
}
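The three client-side checks above can be combined into a single guard. The field names here are assumptions for illustration, not the exact implementation:

```typescript
// Combined budget guard: returns which budget was exhausted, or null
// if the loop may continue. Field names are illustrative.
type LoopState = { tokensUsed: number; turnCount: number; costUsd: number };
type Limits = { tokenBudget: number; maxTurns: number; maxBudgetUsd: number };

function exceedsBudget(state: LoopState, limits: Limits): string | null {
  if (state.tokensUsed > limits.tokenBudget) return 'tokens';
  if (state.turnCount >= limits.maxTurns) return 'turns';
  if (state.costUsd > limits.maxBudgetUsd) return 'cost';
  return null;  // all budgets have headroom
}
```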

Advanced Features

Auto-Continue on end_turn

When Claude stops with end_turn but has token budget remaining, the loop automatically continues:
if (stopReason === 'end_turn' && hasTokenBudgetRemaining()) {
  // Add a continuation message
  messages.push({
    type: 'user',
    content: '<continue>'  // Implicit continuation
  })
  
  // Loop continues
  continue
}
This enables Claude to work on complex tasks without artificial turn limits.

max_tokens Recovery

When Claude hits the output token limit mid-response:
if (stopReason === 'max_tokens') {
  // Double the limit and retry
  maxOutputTokens *= 2
  
  // Retry the same request
  continue
}
The loop retries up to 3 times before giving up.

Thinking Mode Integration

When thinking is enabled, thinking blocks are preserved across the loop:
// Thinking blocks must stay in context
if (hasThinkingBlocks(messages)) {
  // Keep thinking blocks for entire trajectory
  // (current turn + tool results + next turn)
}

Context Compression

When approaching context limits, the loop triggers auto-compact:
if (estimatedTokens > contextWindow * 0.9) {
  // Summarize old messages
  const compacted = await compactMessages(messages)
  
  // Insert boundary marker
  messages = [
    ...compacted,
    { type: 'system', content: '<compact_boundary>' },
    ...recentMessages
  ]
}

Parallel Tool Execution

The loop optimizes tool execution by running safe tools in parallel.

Concurrency Limit:
const MAX_CONCURRENT_TOOLS = 10  // Configurable via env var
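A cap like this is commonly enforced with a worker-pool pattern: spawn up to N workers that each pull the next pending task. A sketch of that pattern (not the actual scheduler):

```typescript
// Run async tasks with at most `limit` in flight at once.
// Workers pull the next index; results keep their original order.
async function runWithLimit<T>(
  tasks: (() => Promise<T>)[],
  limit = 10  // e.g. MAX_CONCURRENT_TOOLS
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;            // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```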

Error Handling

The loop handles various error conditions:

API Errors

try {
  const response = await callAPI(messages)
} catch (error) {
  if (isRetryable(error)) {
    // Exponential backoff
    await sleep(backoffMs)
    continue
  }
  
  // Non-retryable - stop
  return { type: 'error', error }
}
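The backoff delay itself is typically exponential with jitter so concurrent clients don't retry in lockstep. A sketch with illustrative base and cap values (not Claude Code's actual parameters):

```typescript
// Exponential backoff with "equal jitter": half the exponential delay
// is fixed, the other half is randomized. Values are illustrative.
function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);  // 500, 1000, 2000, ...
  return exp / 2 + Math.random() * (exp / 2);          // jittered delay
}
```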

Tool Errors

try {
  const result = await tool.execute(input)
} catch (error) {
  // Return error as tool result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: error.message,
    is_error: true
  }
}

Permission Denials

if (userDeniedPermission) {
  // Track denial
  permissionDenials.push({
    tool: toolName,
    reason: 'user_denied'
  })
  
  // Return error result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: 'Permission denied by user',
    is_error: true
  }
}

State Management

The loop maintains several types of state:

Conversation State

{
  messages: Message[],           // Full conversation
  totalUsage: Usage,             // Cumulative token usage
  turnCount: number,             // Number of turns
  permissionDenials: Denial[]    // Denied tools
}

Tool Context

{
  tools: Tools,                  // Available tools
  mcpClients: MCPClient[],       // MCP connections
  readFileCache: FileCache,      // File content cache
  inProgressToolUseIDs: Set<string>  // Currently executing
}

Budget Tracking

{
  tokenBudget: number,           // Remaining tokens
  maxTurns: number,              // Turn limit
  maxBudgetUsd: number,          // Cost limit
  taskBudget: { total: number }  // API task budget
}

Performance Characteristics

Latency Per Turn

  Phase               Typical Time   Notes
  API First Token     200-500ms      Network + model startup
  Streaming           Variable       ~50 tokens/sec
  Tool Execution      Variable       Depends on tool
  Permission Check    0-∞            Waits for user if needed
  State Persistence   10-30ms        Async, non-blocking

Throughput

  • Parallel tools: Up to 10 concurrent
  • Serial tools: One at a time
  • API calls: One at a time (streaming)

Memory Usage

  • Messages: Grows with conversation
  • File cache: Bounded by cache size
  • Tool results: Truncated if too large

Related Pages

  • Control Flow - High-level flow from input to output
  • Prompt Assembly - How system prompts are built
  • Tools Overview - Understanding tool execution
  • State Management - How state is managed across turns