If you're running multiple AI agents and spending $10,000/month on LLM costs, where do the biggest savings come from? This playbook walks through the four most impactful optimizations, based on common agent cost patterns and published model pricing.
Step 1: Get Visibility
Before optimizing anything, instrument your agents with per-agent cost tracking. Without this, you're guessing. A typical five-agent setup might reveal a highly uneven cost distribution — often one or two agents account for the majority of spend.
Optimization 1: Fix Retry Waste
A common pattern: agents that produce invalid output (bad JSON, hallucinated tool calls) trigger retries that silently multiply costs. If your agent's error rate is 20-30%, you're paying 20-30% more than you should.
Fix: Add structured output validation, fix malformed schemas, and implement proper error handling.
Typical impact: 20-35% cost reduction on the affected agent.
Optimization 2: Model Routing
Many agents use a single expensive model for everything. Based on published pricing, Claude Haiku is ~60x cheaper than Opus for input tokens, and GPT-4o-mini is ~17x cheaper than GPT-4o. Most support or FAQ workloads don't need the expensive model.
Fix: Classify task complexity and route to the cheapest model that can handle it.
Typical impact: 50-75% cost reduction on agents handling mixed-complexity workloads.
Optimization 3: Prompt Diet
System prompts tend to grow over time as features and guardrails are added. A 6,000-token system prompt is charged on every single call. Most of that content — examples, edge cases, formatting instructions — can be loaded on demand.
Fix: Split into a lean base prompt + conditional context modules.
Typical impact: 30-60% token reduction per call.
Optimization 4: Batch Processing
Some agents make real-time calls for work that doesn't need real-time responses. OpenAI's Batch API offers a 50% discount for async processing.
Fix: Move non-urgent work (monitoring, reporting, batch analysis) to batch APIs.
Typical impact: 50% cost reduction on eligible workloads.
The Compound Effect
Applied together, these four optimizations can reduce total agent spend by 50-70%. None of them require changing what your agents do — only how efficiently they do it.
The lesson: you can't optimize what you can't measure. Instrument first, optimize second.