OpenClaw Cost & Token Optimization Guide
Most people try OpenClaw and quit after seeing the bill. Default settings can burn $35-40/day without you realizing it. Here's how to cut your costs by 60-70% without sacrificing quality. Based on the guide by Prajwal Tomar.
Why Default Settings Destroy Your Wallet
OpenClaw sends everything to your primary model by default. Heartbeat checks? Opus. Calendar lookups? Opus. Sub-agents doing parallel work? All Opus.
It's like hiring a neurosurgeon to check if you have a pulse. It works, but it's absurdly expensive for what you're getting.
The problem isn't OpenClaw itself; it's that the defaults prioritize quality over cost, and most users never change them. Every single request, no matter how trivial, hits your most expensive model.
The Real Cost of Defaults
At Opus pricing ($15/$75 per million input/output tokens), a power user sending 200 messages/day can easily spend $943/month. One commenter reported burning through 500 million tokens in their first month before getting cut off.
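That $943/month figure is easy to sanity-check with back-of-the-envelope arithmetic. The per-message token counts below are illustrative assumptions, not numbers from the original thread:

```python
# Rough Opus cost estimate for a power user at $15/$75 per million
# input/output tokens. Per-message token counts are assumptions.
OPUS_INPUT_PER_M = 15.00
OPUS_OUTPUT_PER_M = 75.00

def monthly_cost(msgs_per_day: int, in_tokens: int, out_tokens: int, days: int = 30) -> float:
    per_msg = (in_tokens * OPUS_INPUT_PER_M + out_tokens * OPUS_OUTPUT_PER_M) / 1e6
    return per_msg * msgs_per_day * days

# 200 messages/day, assuming ~5k input + ~1k output tokens per message
print(f"${monthly_cost(200, 5_000, 1_000):,.0f}/month")  # $900/month, right in the ballpark of $943
```

Under those assumptions each message costs about 15 cents, and 6,000 messages a month lands within a few percent of the reported bill.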
The fix isn't to use OpenClaw less. It's to use it smarter.
1. Model Tiering: The Biggest Win
Different tasks need different models. Using Opus for everything is the #1 reason bills explode. 80% of requests don't need Opus.
- Complex reasoning → Opus / GPT-5.2 (worth the cost for hard problems)
- Daily work → Sonnet 4.6 / DeepSeek R1 (95% cheaper than Opus)
- Simple tasks → Gemini Flash (98% cheaper than Opus)
```json
{
  "models": {
    "primary": "sonnet-4.6",
    "reasoning": "opus-4.6",
    "simple": "gemini-flash",
    "routing": {
      "enabled": true,
      "provider": "openrouter",
      "strategy": "auto"
    }
  }
}
```

OpenRouter can auto-route based on prompt complexity: simple tasks go to sub-$1 models, hard problems go to Opus. The routing decision happens in under 1ms, with no quality loss on tasks that never needed a frontier model in the first place.
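To make the tiering concrete, here is an illustrative complexity-based router. This is a sketch of the idea, not OpenClaw's or OpenRouter's actual routing code; the keyword heuristics and thresholds are assumptions:

```python
# Illustrative complexity-based model router (not OpenClaw's real routing).
# Keywords and the length threshold are assumptions for the sketch.
REASONING_HINTS = ("debug", "architecture", "prove", "refactor", "why")

def pick_model(prompt: str) -> str:
    """Route a prompt to a model tier using rough complexity heuristics."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return "opus-4.6"            # hard problems: worth the cost
    if len(text.split()) > 100:      # long but routine: mid tier
        return "sonnet-4.6"
    return "gemini-flash"            # short and simple: cheapest tier

print(pick_model("Why does this deadlock happen?"))  # opus-4.6
print(pick_model("Summarize my inbox"))              # gemini-flash
```

A production router would classify with a tiny model rather than keywords, but the economics are the same: most prompts fall through to the cheap tier.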
2. Fix Session Memory Bloat
This one shocks people. Every message loads ~50kb of history into context. That's millions of wasted tokens per month just from session initialization.
The fix is a single session initialization prompt that strips the bloat.
Before vs. After
Before: 50kb context per session = ~40 cents per session start
After: 8kb context per session = ~5 cents per session start
That's an 84% reduction on every single session, and it adds up fast across hundreds of daily sessions.
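The arithmetic behind those numbers is straightforward; the sessions-per-day count below is an illustrative assumption:

```python
# Session-start savings from trimming loaded context (figures from the text).
before_kb, after_kb = 50, 8
before_cost, after_cost = 0.40, 0.05       # $ per session start

context_reduction = 1 - after_kb / before_kb
print(f"context cut by {context_reduction:.0%}")   # 84%

daily_sessions = 20                        # illustrative assumption
monthly_savings = (before_cost - after_cost) * daily_sessions * 30
print(f"~${monthly_savings:.0f}/month saved")      # ~$210/month at 20 sessions/day
```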
```json
{
  "sessionInit": {
    "contextMode": "minimal",
    "loadMemory": "on-demand",
    "maxHistoryTokens": 4000,
    "prompt": "Load only active task context. Skip completed items. Summarize older sessions to key decisions only."
  }
}
```

3. Install QMD for Local Search
Without QMD, every research query sends entire documents to the API. With QMD, search happens locally and consumes zero tokens.
QMD (built by someone at Shopify) indexes your knowledge base locally using BM25 + vector search, then sends only the relevant snippets to the API instead of whole documents.
How QMD Cuts Research Tokens by 90%
Instead of sending a 20-page document to the API and asking "find the relevant section," QMD searches locally and sends only the 2-3 paragraphs that matter.
- Without QMD: 20,000 tokens per research query
- With QMD: 2,000 tokens per research query
Multiply that by dozens of queries per day and the savings are massive. See our Memory Management Guide for the full QMD setup.
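QMD itself combines BM25 and vector search; as a rough illustration of the snippet-selection idea (not QMD's code or API), even a naive keyword-overlap scorer shows why sending only the top paragraphs slashes tokens:

```python
# Toy local search: score paragraphs by query-term overlap and send only the
# best ones to the API. QMD does this properly with BM25 + vector search;
# this is only a sketch of the concept.
def top_snippets(document: str, query: str, k: int = 3) -> list[str]:
    terms = set(query.lower().split())
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = sorted(
        paragraphs,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

doc = "Billing uses Stripe.\n\nAuth uses OAuth2 tokens.\n\nDeploys run on Fridays."
print(top_snippets(doc, "how does auth work", k=1))  # ['Auth uses OAuth2 tokens.']
```

Instead of shipping `doc` wholesale to the model, only the matching paragraph goes over the wire, which is exactly the 20,000-to-2,000 token drop described above.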
4. Avoid the "Thinking" Token Trap
Reasoning models like o1 and DeepSeek R1 generate "thinking" tokens: internal chain-of-thought that you pay for but never see. These can be 3-5x the visible output.
A response that looks like 500 tokens might actually cost you 2,500 tokens behind the scenes.
```json
{
  "reasoning": {
    "models": ["o1", "deepseek-r1"],
    "useOnly": ["complex-analysis", "debugging", "architecture"],
    "avoid": ["simple-edits", "formatting", "lookups", "classifications"]
  }
}
```

Use reasoning models only for complex analysis, debugging, and architecture decisions. For everything else, use non-reasoning models. This alone can cut costs by 30-40%.
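The hidden cost is easy to quantify. Assuming thinking tokens bill at the same output rate as visible tokens (the $75/M rate here is the Opus-style figure from earlier; actual reasoning-model rates vary):

```python
# Hidden cost of reasoning tokens: you pay for chain-of-thought you never see.
OUTPUT_RATE = 75.00 / 1e6      # $/token, using the $75-per-million output rate

visible = 500                   # tokens in the answer you actually read
thinking = 2_000                # hidden reasoning tokens (3-5x visible is typical)

apparent = visible * OUTPUT_RATE
actual = (visible + thinking) * OUTPUT_RATE
print(f"apparent ${apparent:.4f}, actual ${actual:.4f}, {actual / apparent:.0f}x")
# apparent $0.0375, actual $0.1875, 5x
```

That 5x multiplier is exactly why a "500-token" response can bill like 2,500 tokens.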
5. Use Local Models for Repetitive Work
Tasks you do 100+ times a day should never hit a paid API. Run Ollama with Llama 3.2 locally for:
- Email sorting and classification
- Calendar parsing
- Simple text transformations
- Boilerplate generation
One-time setup. Zero ongoing cost. Reserve your API budget for creative and complex work where model quality actually matters.
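A minimal dispatcher for this pattern might look like the following. The task names and fallback model mirror the config below, and the endpoint shown is Ollama's standard local default, but treat the function itself as a hypothetical sketch, not OpenClaw's actual fallback logic:

```python
# Sketch: route repetitive task types to a local Ollama model, everything
# else to a paid API model. Task names and fallback mirror the JSON config.
LOCAL_TASKS = {"classification", "parsing", "formatting"}
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(task_type: str, prompt: str) -> dict:
    """Return a routing decision plus the request payload to send."""
    if task_type in LOCAL_TASKS:
        return {"target": OLLAMA_URL,
                "payload": {"model": "llama3.2", "prompt": prompt, "stream": False}}
    return {"target": "paid-api",
            "payload": {"model": "sonnet-4.6", "prompt": prompt}}

req = build_request("classification", "Label this email: spam or not?")
print(req["payload"]["model"])  # llama3.2 -- free local inference
```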
```json
{
  "localModels": {
    "provider": "ollama",
    "model": "llama3.2",
    "tasks": ["classification", "parsing", "formatting"],
    "fallback": "sonnet-4.6"
  }
}
```

6. Stop Paying for Web Search
Perplexity API costs add up fast when you're running agents 24/7. Exa AI provides free web search for your agents via MCP. Setup takes 30 seconds.
One user reported this single change saved them a projected $200/month.
7. Route Heartbeats to Cheap Models
OpenClaw sends periodic "are you alive?" health checks. By default, these hit your paid API. That's thousands of unnecessary calls per month billed at full price.
The Heartbeat Fix
Route heartbeat checks to Gemini Flash at $0.10 per million tokens, or use local Ollama for $0 (though this has some known bugs).
These checks don't need intelligence. They need a pulse response. Don't pay Opus prices for a ping.
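How much do pings actually cost? The heartbeat interval and tokens-per-ping below are illustrative assumptions, but they show the scale of the difference:

```python
# Cost of health-check pings at different tiers. Interval and token count
# per ping are illustrative assumptions, not OpenClaw defaults.
PINGS_PER_MONTH = 30 * 24 * 2    # one heartbeat every 30 minutes
TOKENS_PER_PING = 200             # tiny prompt + tiny reply, assumed

def monthly_ping_cost(rate_per_million: float) -> float:
    return PINGS_PER_MONTH * TOKENS_PER_PING * rate_per_million / 1e6

print(f"Opus:  ${monthly_ping_cost(15.00):.2f}/mo")   # $4.32 at $15/M input
print(f"Flash: ${monthly_ping_cost(0.10):.2f}/mo")    # $0.03 at $0.10/M
```

Small in absolute terms, but it's pure waste, and heavier heartbeat schedules or larger ping payloads scale it up linearly.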
What You'll Actually Save
These are real numbers from optimized configurations, combining model tiering, session memory fixes, local models, and search optimization.
| Usage Level | Before | After | Savings |
|---|---|---|---|
| Light (50 msgs/day) | $200/mo | $70/mo | 65% |
| Power (200 msgs/day) | $943/mo | $347/mo | 63% |
| Heavy (500+ msgs/day) | $3,000/mo | $1,000/mo | 67% |
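The savings percentages follow directly from the before/after columns:

```python
# Savings percentages implied by the table above.
tiers = {"Light": (200, 70), "Power": (943, 347), "Heavy": (3000, 1000)}

for name, (before, after) in tiers.items():
    pct = (before - after) / before
    print(f"{name}: {pct:.0%} saved")   # Light: 65%, Power: 63%, Heavy: 67%
```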
The Bottom Line
OpenClaw's default settings optimize for quality, not cost. That's fine if money is no object, but most users need both. The key changes, in order of impact:
- Tier your models: don't use Opus for everything
- Fix session memory bloat: one prompt change, instant savings
- Install QMD: local search means zero research tokens
- Avoid thinking-token waste: use reasoning models sparingly
- Use Ollama for repetitive tasks: free local inference
- Route heartbeats to Flash: stop paying Opus prices for pings
- Switch to free web search: Exa AI via MCP
As one commenter put it: "Model routing and token discipline are basically margin strategy." Cost control isn't optional; it's what separates sustainable OpenClaw usage from a bill that makes you quit.
Want us to optimize your OpenClaw costs?
ClawEasy configures model tiering, session optimization, and local model routing as part of every deployment. We'll cut your token bill without touching your output quality.
This guide is based on Prajwal Tomar's comprehensive cost optimization thread. Follow him for more OpenClaw tips and real-world benchmarks.