If your agents write and execute code, you're paying for two things: the LLM calls to generate code, and the compute to run it. Most cost tracking tools only see the first part. AgentBurn tracks both.
The E2B Cost Model
E2B bills per second of sandbox compute time. A sandbox that runs for 30 seconds can easily cost more than the LLM call that generated the code, and for iterative code generation (write → run → fix → run) total compute cost can match or exceed total LLM cost.
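As a rough sketch of that arithmetic (both rates below are made-up placeholders, not real E2B or model pricing):

```python
# Hypothetical rates -- substitute your actual E2B tier and model pricing.
E2B_COST_PER_SECOND = 0.0001   # assumed sandbox rate, USD per second
LLM_COST_PER_CALL = 0.01       # assumed cost of one code-generation call, USD

def iteration_cost(llm_calls: int, sandbox_seconds: float) -> dict:
    """Split a task's cost into its LLM and compute portions."""
    llm = llm_calls * LLM_COST_PER_CALL
    compute = sandbox_seconds * E2B_COST_PER_SECOND
    return {"llm_usd": llm, "compute_usd": compute, "total_usd": llm + compute}

# A write -> run -> fix -> run loop: 3 LLM calls, 3 runs of ~20 s each
print(iteration_cost(llm_calls=3, sandbox_seconds=60))
```

Plug in your real rates and a typical iteration count to see which side dominates for your workload.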
Instrumenting E2B Costs
```python
import json
import time

# run_code lives on the Sandbox in the e2b_code_interpreter package
from e2b_code_interpreter import Sandbox

E2B_COST_PER_SECOND = 0.0001  # placeholder -- use the rate for your E2B tier

sandbox = Sandbox()
start = time.monotonic()  # monotonic clock: safe for measuring durations
result = sandbox.run_code(generated_code)
duration = time.monotonic() - start

# Track the compute cost alongside your LLM costs
ingest_event(
    agent_id="code-gen-agent",
    provider="e2b",
    operation="sandbox_execution",
    cost_usd=duration * E2B_COST_PER_SECOND,
    metadata=json.dumps({
        "duration_s": duration,
        "failed": result.error is not None,  # run_code returns an Execution with .error
    }),
)
```
Illustrative Cost Split
For a code generation agent that iterates until tests pass, the cost breakdown might look like:
- LLM calls (code generation): Multiple iterations at a few cents per call
- E2B sandbox (execution): Multiple runs at several seconds each — compute cost can match or exceed LLM cost
Without tracking E2B costs alongside LLM costs, you might think each task costs a fraction of what it actually does. The compute portion is invisible unless you record it explicitly.
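One way to make that portion visible is to group everything ingested via ingest_event by provider. The events list below is fabricated for illustration; in practice you would query whatever store AgentBurn writes to:

```python
from collections import defaultdict

# Fabricated example events -- one row per ingest_event call
events = [
    {"provider": "openai", "cost_usd": 0.012},  # code-generation call
    {"provider": "openai", "cost_usd": 0.011},  # fix-up call
    {"provider": "e2b", "cost_usd": 0.018},     # sandbox execution
]

def cost_by_provider(events: list[dict]) -> dict:
    """Sum tracked costs per provider to expose the LLM/compute split."""
    totals = defaultdict(float)
    for event in events:
        totals[event["provider"]] += event["cost_usd"]
    return dict(totals)

print(cost_by_provider(events))
```

With only the LLM rows tracked, this task looks roughly half as expensive as it really is.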
Optimization Strategies
- Keep sandboxes warm — Reuse sandboxes across iterations to avoid cold start costs
- Set execution timeouts — Cap sandbox runtime at 30s to prevent infinite loops
- Generate tests first — Let the agent write tests before code to reduce iteration cycles
- Track iteration count — Use AgentBurn's metadata field to log how many attempts each task takes
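The reuse, timeout, and iteration-tracking strategies above can be sketched as a generic loop. Everything here is hypothetical scaffolding: `run` stands in for executing code in a single warm sandbox (e.g. `sandbox.run_code(code, timeout=30)` against one reused sandbox), and `generate_fix` stands in for the LLM call that revises the code.

```python
MAX_ITERATIONS = 5  # cap attempts so a stuck agent can't burn compute forever

def run_until_passing(initial_code, run, generate_fix,
                      max_iterations=MAX_ITERATIONS):
    """Iterate write -> run -> fix against one warm sandbox.

    `run(code)` executes the code and returns an error (or None on
    success); `generate_fix(code, error)` asks the LLM for a revision.
    Both are stand-ins for your own stack. Returns (code, attempts) so
    the attempt count can go into AgentBurn's metadata field.
    """
    code = initial_code
    for attempt in range(1, max_iterations + 1):
        error = run(code)
        if error is None:
            return code, attempt
        code = generate_fix(code, error)
    raise RuntimeError(f"no passing code after {max_iterations} attempts")
```

Keeping the sandbox outside the loop avoids paying a cold start on every attempt, and the iteration cap bounds worst-case compute spend per task.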