Version analyzed: March 31, 2026 leak
This page explains the core loop that enables Claude Code to autonomously execute tools and make decisions.
What is the Agent Loop?
The agent loop is the heart of Claude Code’s autonomy. It’s a continuous cycle where:
Claude generates a response (text + tool calls)
Tools are executed
Results are fed back to Claude
Claude decides whether to continue or stop
Repeat until task completion
This loop is what makes Claude Code “agentic” — it can work autonomously toward a goal without requiring user input after every step.
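The five steps above can be sketched as a single while loop. This is a simplified illustration, not the real implementation; `callModel`, `executeTools`, and the `ModelResponse` shape are hypothetical stand-ins:

```typescript
// Hypothetical, simplified sketch of the agent loop.
type ModelResponse = {
  text: string
  toolCalls: { name: string; input: unknown }[]
  done: boolean
}

async function agentLoop(
  callModel: (history: string[]) => Promise<ModelResponse>,
  executeTools: (calls: ModelResponse["toolCalls"]) => Promise<string[]>,
): Promise<string[]> {
  const history: string[] = []
  while (true) {
    // 1. Claude generates a response (text + tool calls)
    const response = await callModel(history)
    history.push(response.text)
    // 4-5. Claude decides whether to continue or stop
    if (response.done || response.toolCalls.length === 0) return history
    // 2. Tools are executed
    const results = await executeTools(response.toolCalls)
    // 3. Results are fed back to Claude
    history.push(...results)
  }
}
```

The loop only terminates when the model produces no tool calls (or signals completion); everything else in this page elaborates on what happens inside each iteration.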
The Loop Architecture
Core Components
1. QueryEngine Class
Location: src/QueryEngine.ts
The QueryEngine orchestrates the entire loop. It’s instantiated once per conversation and maintains state across multiple turns.
```typescript
class QueryEngine {
  private mutableMessages: Message[]               // Conversation history
  private abortController: AbortController         // Cancellation
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage             // Token tracking
  private readFileState: FileStateCache            // File cache

  async *submitMessage(prompt): AsyncGenerator<SDKMessage> {
    // Main loop entry point
  }
}
```
Key Responsibilities:
Maintain conversation state
Track token usage and costs
Handle permission denials
Manage file state cache
Coordinate with query() function
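Because submitMessage is an async generator, a caller drives the engine by iterating it with for await. The sketch below is a hypothetical, heavily simplified consumer (the MiniEngine class and SDKMessage shape here are illustrative, not the real API):

```typescript
// Hypothetical, simplified stand-in for QueryEngine and its consumer.
type SDKMessage = { type: "text" | "result"; content: string }

class MiniEngine {
  // Simplified stand-in for QueryEngine.submitMessage
  async *submitMessage(prompt: string): AsyncGenerator<SDKMessage> {
    yield { type: "text", content: `Working on: ${prompt}` }
    yield { type: "result", content: "done" }
  }
}

async function run(): Promise<SDKMessage[]> {
  const engine = new MiniEngine()
  const seen: SDKMessage[] = []
  for await (const message of engine.submitMessage("fix the bug")) {
    seen.push(message) // each yielded message can stream to the UI as it arrives
  }
  return seen
}
```

The generator design is what lets the UI render text and tool activity incrementally instead of waiting for the whole turn to finish.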
2. query() Function
Location: src/query.ts
The inner loop that handles a single turn (potentially multiple API calls).
```typescript
async function* query(params: QueryParams): AsyncGenerator {
  // Loop state
  let state = {
    messages,
    turnCount: 0,
    maxOutputTokensRecoveryCount: 0,
    // ... more state
  }

  while (true) {
    // 1. Call API
    const response = await callClaude(state.messages)

    // 2. Stream response
    for await (const event of response) {
      yield event // Text, thinking, tool uses
    }

    // 3. Check stop reason
    const decision = decideNextAction(response.stopReason, state)
    if (decision.type === 'terminal') {
      return decision // Done
    }

    // 4. Execute tools if present
    if (hasToolUses(response)) {
      const results = await executeTools(response.toolUses)
      state.messages.push(...results)
    }

    // 5. Check budgets
    if (exceedsBudget(state)) {
      return { type: 'budget_exhausted' }
    }

    // 6. Continue loop
    state.turnCount++
  }
}
```
Loop Phases
Phase 1: API Call
The loop starts by calling the Anthropic API with the current conversation state.
Request Structure:
```typescript
{
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  system: systemPrompt,   // Assembled prompt
  messages: apiMessages,  // Normalized messages
  tools: toolSchemas,     // Available tools
  stream: true            // Streaming enabled
}
```
Streaming Events:
message_start - Response begins
content_block_start - New content block (text/thinking/tool_use)
content_block_delta - Incremental content
content_block_stop - Block complete
message_delta - Usage updates
message_stop - Response complete
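The event sequence can be folded into completed content blocks with a small reducer. This is a sketch under the assumption that each content_block_delta carries a text fragment (real deltas also carry thinking and tool-use payloads):

```typescript
// Hypothetical reducer: folds a stream of events into completed text blocks.
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start" }
  | { type: "content_block_delta"; text: string }
  | { type: "content_block_stop" }
  | { type: "message_stop" }

function collectBlocks(events: StreamEvent[]): string[] {
  const blocks: string[] = []
  let current = ""
  for (const event of events) {
    if (event.type === "content_block_start") current = ""       // new block begins
    else if (event.type === "content_block_delta") current += event.text // accumulate
    else if (event.type === "content_block_stop") blocks.push(current)   // block complete
  }
  return blocks
}
```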
Phase 2: Content Processing
As content streams in, it’s processed by type:
Text Blocks:
```typescript
if (block.type === 'text') {
  yield { type: 'text_delta', text: block.text }
  // Rendered immediately to user
}
```
Thinking Blocks:
```typescript
if (block.type === 'thinking') {
  yield { type: 'thinking_delta', thinking: block.thinking }
  // Displayed in thinking UI
}
```
Tool Use Blocks:
```typescript
if (block.type === 'tool_use') {
  // Queued for execution after response completes
  toolUses.push({
    id: block.id,
    name: block.name,
    input: block.input
  })
}
```
Phase 3: Tool Execution
After the response completes, any queued tool calls are executed.
Execution Strategy:
```typescript
// Partition tools into batches
const batches = partitionToolCalls(toolUses)

for (const batch of batches) {
  if (batch.isConcurrencySafe) {
    // Execute read-only tools in parallel
    await runToolsConcurrently(batch.tools)
  } else {
    // Execute write tools serially
    await runToolsSerially(batch.tools)
  }
}
```
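One plausible way to implement the partitioning (a sketch; the real partitionToolCalls may differ) is to group consecutive calls by their concurrency-safety flag, preserving order across batches:

```typescript
// Hypothetical batching: consecutive concurrency-safe calls share a batch,
// each unsafe call gets its own serial batch. Order across batches is preserved.
type ToolCall = { name: string; isConcurrencySafe: boolean }
type Batch = { isConcurrencySafe: boolean; tools: ToolCall[] }

function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = []
  for (const call of calls) {
    const last = batches[batches.length - 1]
    if (last && last.isConcurrencySafe && call.isConcurrencySafe) {
      last.tools.push(call) // extend the running parallel batch
    } else {
      batches.push({ isConcurrencySafe: call.isConcurrencySafe, tools: [call] })
    }
  }
  return batches
}
```

Grouping only adjacent safe calls keeps the ordering guarantees simple: a write never reorders past a read that was requested before it.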
Concurrency Safety:
Tools declare whether they’re safe to run in parallel:
```typescript
class FileReadTool {
  isConcurrencySafe(input) {
    return true // Reading is safe
  }
}

class FileWriteTool {
  isConcurrencySafe(input) {
    return false // Writing must be serial
  }
}
```
Permission Checking:
Before execution, each tool goes through permission checks:
```typescript
async function runToolUse(toolUse, canUseTool) {
  // 1. Check permission
  const decision = await canUseTool(tool, toolUse.input, context)

  if (decision.type === 'deny') {
    return { error: decision.reason }
  }

  if (decision.type === 'prompt') {
    const approved = await promptUser(decision.message)
    if (!approved) {
      return { error: 'User denied' }
    }
  }

  // 2. Execute tool
  const result = await tool.execute(toolUse.input, context)

  // 3. Return result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: result
  }
}
```
Phase 4: Stop Decision
After processing the response, the loop decides whether to continue.
Stop Reasons from API:
| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Natural completion | Check budget, maybe continue |
| `max_tokens` | Output limit reached | Retry with higher limit |
| `stop_sequence` | Explicit stop marker | Terminal: stop |
| `tool_use` | Waiting for tool results | Continue with results |
Decision Logic:
```typescript
function decideNextAction(stopReason, state) {
  // 1. Tool uses always continue
  if (stopReason === 'tool_use') {
    return { type: 'continue', reason: 'tool_execution' }
  }

  // 2. max_tokens triggers retry
  if (stopReason === 'max_tokens') {
    if (state.maxOutputTokensRecoveryCount < 3) {
      return {
        type: 'continue',
        reason: 'max_tokens_recovery',
        newMaxTokens: state.maxOutputTokens * 2
      }
    }
  }

  // 3. end_turn checks budget
  if (stopReason === 'end_turn') {
    if (hasTokenBudgetRemaining() && state.turnCount < maxTurns) {
      return { type: 'continue', reason: 'budget_continuation' }
    }
  }

  // 4. Otherwise stop
  return { type: 'terminal', reason: stopReason }
}
```
Phase 5: Budget Checks
Multiple budget types can limit the loop:
Token Budget:
```typescript
const tokenBudget = 500_000 // 500k tokens
const tokensUsed = sumTokens(state.messages)

if (tokensUsed > tokenBudget) {
  return { type: 'budget_exhausted', budget: 'tokens' }
}
```
Turn Limit:
```typescript
if (state.turnCount >= maxTurns) {
  return { type: 'budget_exhausted', budget: 'turns' }
}
```
Cost Limit:
```typescript
const costUSD = calculateCost(state.totalUsage)

if (costUSD > maxBudgetUsd) {
  return { type: 'budget_exhausted', budget: 'cost' }
}
```
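A cost calculation like the one above can be sketched as a weighted sum over input and output tokens. The per-million-token prices below are illustrative placeholders, not actual pricing, and the Usage shape is simplified:

```typescript
// Hypothetical cost calculation. The per-million-token prices are
// illustrative defaults, not real pricing.
type Usage = { inputTokens: number; outputTokens: number }

function calculateCost(usage: Usage, inputPerM = 3, outputPerM = 15): number {
  return (
    (usage.inputTokens / 1_000_000) * inputPerM +
    (usage.outputTokens / 1_000_000) * outputPerM
  )
}
```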
Task Budget (API-level):
```typescript
// Sent to API, enforced server-side
{
  output_config: {
    task_budget: {
      total: 100_000 // Total output tokens for the turn
    }
  }
}
```
Advanced Features
Auto-Continue on end_turn
When Claude stops with end_turn but has token budget remaining, the loop automatically continues:
```typescript
if (stopReason === 'end_turn' && hasTokenBudgetRemaining()) {
  // Add a continuation message
  messages.push({
    type: 'user',
    content: '<continue>' // Implicit continuation
  })
  // Loop continues
  continue
}
```
This enables Claude to work on complex tasks without artificial turn limits.
max_tokens Recovery
When Claude hits the output token limit mid-response:
```typescript
if (stopReason === 'max_tokens') {
  // Double the limit and retry
  maxOutputTokens *= 2
  // Retry the same request
  continue
}
```
Retries up to 3 times before giving up.
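Under a doubling scheme with the default 8,192-token limit, the retries would attempt 16,384, then 32,768, then 65,536 output tokens before giving up. A small sketch of that schedule (the function name is illustrative):

```typescript
// Hypothetical recovery schedule: double the output limit on each retry,
// up to maxRetries attempts.
function recoveryLimits(initial: number, maxRetries = 3): number[] {
  const limits: number[] = []
  let current = initial
  for (let i = 0; i < maxRetries; i++) {
    current *= 2
    limits.push(current)
  }
  return limits
}
```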
Thinking Mode Integration
When thinking is enabled, thinking blocks are preserved across the loop:
// Thinking blocks must stay in context
```typescript
// Thinking blocks must stay in context
if (hasThinkingBlocks(messages)) {
  // Keep thinking blocks for the entire trajectory
  // (current turn + tool results + next turn)
}
```
Context Compression
When approaching context limits, the loop triggers auto-compact:
```typescript
if (estimatedTokens > contextWindow * 0.9) {
  // Summarize old messages
  const compacted = await compactMessages(messages)
  // Insert boundary marker
  messages = [
    ...compacted,
    { type: 'system', content: '<compact_boundary>' },
    ...recentMessages
  ]
}
```
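The trigger condition can be approximated with a crude characters-divided-by-four token estimate. This is an assumption for illustration; the real estimator presumably uses actual usage figures from the API:

```typescript
// Hypothetical compaction trigger: estimate tokens as ~4 characters each
// and fire when usage crosses 90% of the context window.
function shouldCompact(messages: string[], contextWindow: number): boolean {
  const chars = messages.reduce((sum, m) => sum + m.length, 0)
  const estimatedTokens = Math.ceil(chars / 4)
  return estimatedTokens > contextWindow * 0.9
}
```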
Parallel Tool Execution
The loop speeds up tool execution by running concurrency-safe tools in parallel:
Concurrency Limit:
```typescript
const MAX_CONCURRENT_TOOLS = 10 // Configurable via env var
```
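A bounded-parallelism runner can be sketched as a small worker pool that drains a shared task queue. This is a generic sketch, not the actual implementation:

```typescript
// Hypothetical worker pool: runs tasks with at most `limit` in flight.
async function runWithLimit<T>(
  tasks: (() => Promise<T>)[],
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length)
  let next = 0
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++ // claim the next task (synchronous, so no race)
      results[index] = await tasks[index]()
    }
  }
  // Spawn up to `limit` workers that drain the shared queue.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker)
  await Promise.all(workers)
  return results
}
```

Results are written by index, so output order matches input order even when tasks finish out of order.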
Error Handling
The loop handles various error conditions:
API Errors
```typescript
try {
  const response = await callAPI(messages)
} catch (error) {
  if (isRetryable(error)) {
    // Exponential backoff
    await sleep(backoffMs)
    continue
  }
  // Non-retryable - stop
  return { type: 'error', error }
}
```
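The backoff delay is typically computed as a base delay doubled per attempt and capped at a maximum. The constants below are assumptions for illustration:

```typescript
// Hypothetical backoff schedule: doubles per attempt, capped at maxMs.
function backoffMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs)
}
```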
Tool Errors
```typescript
try {
  const result = await tool.execute(input)
} catch (error) {
  // Return error as tool result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: error.message,
    is_error: true
  }
}
```
Permission Denials
```typescript
if (userDeniedPermission) {
  // Track denial
  permissionDenials.push({
    tool: toolName,
    reason: 'user_denied'
  })
  // Return error result
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: 'Permission denied by user',
    is_error: true
  }
}
```
State Management
The loop maintains several types of state:
Conversation State
```typescript
{
  messages: Message[],          // Full conversation
  totalUsage: Usage,            // Cumulative token usage
  turnCount: number,            // Number of turns
  permissionDenials: Denial[]   // Denied tools
}
```
Tool Context
```typescript
{
  tools: Tools,                         // Available tools
  mcpClients: MCPClient[],              // MCP connections
  readFileCache: FileCache,             // File content cache
  inProgressToolUseIDs: Set<string>     // Currently executing
}
```
Budget Tracking
```typescript
{
  tokenBudget: number,            // Remaining tokens
  maxTurns: number,               // Turn limit
  maxBudgetUsd: number,           // Cost limit
  taskBudget: { total: number }   // API task budget
}
```
Performance Characteristics
Latency Per Turn
| Phase | Typical Time | Notes |
|---|---|---|
| API first token | 200-500ms | Network + model startup |
| Streaming | Variable | ~50 tokens/sec |
| Tool execution | Variable | Depends on tool |
| Permission check | 0-∞ | Waits for user if needed |
| State persistence | 10-30ms | Async, non-blocking |
Throughput
Parallel tools: Up to 10 concurrent
Serial tools: One at a time
API calls: One at a time (streaming)
Memory Usage
Messages: Grows with conversation
File cache: Bounded by cache size
Tool results: Truncated if too large
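Truncation of oversized tool results can be sketched as a head-keeping cut with an explicit marker. The character limit and marker text here are assumptions:

```typescript
// Hypothetical truncation: keep the head of an oversized result and note the cut.
function truncateResult(content: string, maxChars = 30_000): string {
  if (content.length <= maxChars) return content
  const omitted = content.length - maxChars
  return content.slice(0, maxChars) + `\n[... ${omitted} characters truncated]`
}
```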
Related Pages
Control Flow: High-level flow from input to output
Prompt Assembly: How system prompts are built
Tools Overview: Understanding tool execution
State Management: How state is managed across turns