OpenClaw Cost & Token Optimization Guide
Most people try OpenClaw and quit after seeing the bill. Default settings can burn $35-40/day without you realizing it. Here's how to cut your costs by 60-70% without sacrificing quality. Based on the guide by Prajwal Tomar.
Why Default Settings Destroy Your Wallet
OpenClaw sends everything to your primary model by default. Heartbeat checks? Opus. Calendar lookups? Opus. Sub-agents doing parallel work? All Opus.
It's like hiring a neurosurgeon to check if you have a pulse. It works, but it's absurdly expensive for what you're getting.
The problem isn't OpenClaw itself; it's that the defaults prioritize quality over cost, and most users never change them. Every single request, no matter how trivial, hits your most expensive model.
The Real Cost of Defaults
At Opus pricing ($15/$75 per million input/output tokens), a power user sending 200 messages/day can easily spend $943/month. One commenter reported burning through 500 million tokens in their first month before getting cut off.
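That $943/month figure is easy to sanity-check with back-of-the-envelope arithmetic. The per-message token counts below are illustrative assumptions, not numbers from the original thread:

```python
# Rough Opus cost estimate for a power user at $15/$75 per million
# input/output tokens. Per-message token counts are assumptions.
OPUS_INPUT_PER_M = 15.00
OPUS_OUTPUT_PER_M = 75.00

def monthly_cost(msgs_per_day: int, in_tokens: int, out_tokens: int, days: int = 30) -> float:
    per_msg = (in_tokens * OPUS_INPUT_PER_M + out_tokens * OPUS_OUTPUT_PER_M) / 1e6
    return per_msg * msgs_per_day * days

# 200 messages/day, assuming ~5k input + ~1k output tokens per message
print(f"${monthly_cost(200, 5_000, 1_000):,.0f}/month")  # $900/month, right in the ballpark of $943
```

Under those assumptions each message costs about 15 cents, and 6,000 messages a month lands within a few percent of the reported bill.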
The fix isn't to use OpenClaw less. It's to use it smarter.
1. Model Tiering: The Biggest Win
Different tasks need different models. Using Opus for everything is the #1 reason bills explode. 80% of requests don't need Opus.
- Complex reasoning → Opus / GPT-5.2 (worth the cost for hard problems)
- Daily work → Sonnet 4.6 / DeepSeek R1 (95% cheaper than Opus)
- Simple tasks → Gemini Flash (98% cheaper than Opus)
```json
{
  "models": {
    "primary": "sonnet-4.6",
    "reasoning": "opus-4.6",
    "simple": "gemini-flash",
    "routing": {
      "enabled": true,
      "provider": "openrouter",
      "strategy": "auto"
    }
  }
}
```

OpenRouter can auto-route based on prompt complexity: simple tasks go to sub-$1 models, hard problems go to Opus. The routing decision happens in under 1ms, with no quality loss on tasks that never needed a frontier model in the first place.
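To make the tiering concrete, here is an illustrative complexity-based router. This is a sketch of the idea, not OpenClaw's or OpenRouter's actual routing code; the keyword heuristics and thresholds are assumptions:

```python
# Illustrative complexity-based model router (not OpenClaw's real routing).
# Keywords and the length threshold are assumptions for the sketch.
REASONING_HINTS = ("debug", "architecture", "prove", "refactor", "why")

def pick_model(prompt: str) -> str:
    """Route a prompt to a model tier using rough complexity heuristics."""
    text = prompt.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return "opus-4.6"            # hard problems: worth the cost
    if len(text.split()) > 100:      # long but routine: mid tier
        return "sonnet-4.6"
    return "gemini-flash"            # short and simple: cheapest tier

print(pick_model("Why does this deadlock happen?"))  # opus-4.6
print(pick_model("Summarize my inbox"))              # gemini-flash
```

A production router would classify with a tiny model rather than keywords, but the economics are the same: most prompts fall through to the cheap tier.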
2. Fix Session Memory Bloat
This one shocks people. Every message loads ~50kb of history into context. That's millions of wasted tokens per month just from session initialization.
The fix is a single session initialization prompt that strips the bloat.
Before vs. After
Before: 50kb context per session = ~40 cents per session start
After: 8kb context per session = ~5 cents per session start
That's an 84% reduction on every single session, and it adds up fast across hundreds of daily sessions.
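The arithmetic behind those numbers is straightforward; the sessions-per-day count below is an illustrative assumption:

```python
# Session-start savings from trimming loaded context (figures from the text).
before_kb, after_kb = 50, 8
before_cost, after_cost = 0.40, 0.05       # $ per session start

context_reduction = 1 - after_kb / before_kb
print(f"context cut by {context_reduction:.0%}")   # 84%

daily_sessions = 20                        # illustrative assumption
monthly_savings = (before_cost - after_cost) * daily_sessions * 30
print(f"~${monthly_savings:.0f}/month saved")      # ~$210/month at 20 sessions/day
```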
```json
{
  "sessionInit": {
    "contextMode": "minimal",
    "loadMemory": "on-demand",
    "maxHistoryTokens": 4000,
    "prompt": "Load only active task context. Skip completed items. Summarize older sessions to key decisions only."
  }
}
```

3. Install QMD for Local Search
Without QMD, every research query sends entire documents to the API. With QMD, search happens locally and consumes zero tokens.
QMD (built by someone at Shopify) indexes your knowledge base locally using BM25 + vector search, then sends only the relevant snippets to the API instead of whole documents.
How QMD Cuts Research Tokens by 90%
Instead of sending a 20-page document to the API and asking "find the relevant section," QMD searches locally and sends only the 2-3 paragraphs that matter.
- Without QMD: 20,000 tokens per research query
- With QMD: 2,000 tokens per research query
Multiply that by dozens of queries per day and the savings are massive. See our Memory Management Guide for the full QMD setup.
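QMD itself combines BM25 and vector search; as a rough illustration of the snippet-selection idea (not QMD's code or API), even a naive keyword-overlap scorer shows why sending only the top paragraphs slashes tokens:

```python
# Toy local search: score paragraphs by query-term overlap and send only the
# best ones to the API. QMD does this properly with BM25 + vector search;
# this is only a sketch of the concept.
def top_snippets(document: str, query: str, k: int = 3) -> list[str]:
    terms = set(query.lower().split())
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = sorted(
        paragraphs,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

doc = "Billing uses Stripe.\n\nAuth uses OAuth2 tokens.\n\nDeploys run on Fridays."
print(top_snippets(doc, "how does auth work", k=1))  # ['Auth uses OAuth2 tokens.']
```

Instead of shipping `doc` wholesale to the model, only the matching paragraph goes over the wire, which is exactly the 20,000-to-2,000 token drop described above.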
4. Avoid the "Thinking" Token Trap
Reasoning models like o1 and DeepSeek R1 generate "thinking" tokens: internal chain-of-thought that you pay for but never see. These can be 3-5x the visible output.
A response that looks like 500 tokens might actually cost you 2,500 tokens behind the scenes.
```json
{
  "reasoning": {
    "models": ["o1", "deepseek-r1"],
    "useOnly": ["complex-analysis", "debugging", "architecture"],
    "avoid": ["simple-edits", "formatting", "lookups", "classifications"]
  }
}
```

Use reasoning models only for complex analysis, debugging, and architecture decisions. For everything else, use non-reasoning models. This alone can cut costs by 30-40%.
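The hidden cost is easy to quantify. Assuming thinking tokens bill at the same output rate as visible tokens (the $75/M rate here is the Opus-style figure from earlier; actual reasoning-model rates vary):

```python
# Hidden cost of reasoning tokens: you pay for chain-of-thought you never see.
OUTPUT_RATE = 75.00 / 1e6      # $/token, using the $75-per-million output rate

visible = 500                   # tokens in the answer you actually read
thinking = 2_000                # hidden reasoning tokens (3-5x visible is typical)

apparent = visible * OUTPUT_RATE
actual = (visible + thinking) * OUTPUT_RATE
print(f"apparent ${apparent:.4f}, actual ${actual:.4f}, {actual / apparent:.0f}x")
# apparent $0.0375, actual $0.1875, 5x
```

That 5x multiplier is exactly why a "500-token" response can bill like 2,500 tokens.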
5. Use Local Models for Repetitive Work
Tasks you do 100+ times a day should never hit a paid API. Run Ollama with Llama 3.2 locally for:
- Email sorting and classification
- Calendar parsing
- Simple text transformations
- Boilerplate generation
One-time setup. Zero ongoing cost. Reserve your API budget for creative and complex work where model quality actually matters.
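A minimal dispatcher for this pattern might look like the following. The task names and fallback model mirror the config below, and the endpoint shown is Ollama's standard local default, but treat the function itself as a hypothetical sketch, not OpenClaw's actual fallback logic:

```python
# Sketch: route repetitive task types to a local Ollama model, everything
# else to a paid API model. Task names and fallback mirror the JSON config.
LOCAL_TASKS = {"classification", "parsing", "formatting"}
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(task_type: str, prompt: str) -> dict:
    """Return a routing decision plus the request payload to send."""
    if task_type in LOCAL_TASKS:
        return {"target": OLLAMA_URL,
                "payload": {"model": "llama3.2", "prompt": prompt, "stream": False}}
    return {"target": "paid-api",
            "payload": {"model": "sonnet-4.6", "prompt": prompt}}

req = build_request("classification", "Label this email: spam or not?")
print(req["payload"]["model"])  # llama3.2 -- free local inference
```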
```json
{
  "localModels": {
    "provider": "ollama",
    "model": "llama3.2",
    "tasks": ["classification", "parsing", "formatting"],
    "fallback": "sonnet-4.6"
  }
}
```

6. Stop Paying for Web Search
Perplexity API costs add up fast when you're running agents 24/7. Exa AI provides free web search for your agents via MCP. Setup takes 30 seconds.
One user reported this single change saved them a projected $200/month.
7. Route Heartbeats to Cheap Models
OpenClaw sends periodic "are you alive?" health checks. By default, these hit your paid API. That's thousands of unnecessary calls per month billed at full price.
The Heartbeat Fix
Route heartbeat checks to Gemini Flash at $0.10 per million tokens, or use local Ollama for $0 (though this has some known bugs).
These checks don't need intelligence. They need a pulse response. Don't pay Opus prices for a ping.
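How much do pings actually cost? The heartbeat interval and tokens-per-ping below are illustrative assumptions, but they show the scale of the difference:

```python
# Cost of health-check pings at different tiers. Interval and token count
# per ping are illustrative assumptions, not OpenClaw defaults.
PINGS_PER_MONTH = 30 * 24 * 2    # one heartbeat every 30 minutes
TOKENS_PER_PING = 200             # tiny prompt + tiny reply, assumed

def monthly_ping_cost(rate_per_million: float) -> float:
    return PINGS_PER_MONTH * TOKENS_PER_PING * rate_per_million / 1e6

print(f"Opus:  ${monthly_ping_cost(15.00):.2f}/mo")   # $4.32 at $15/M input
print(f"Flash: ${monthly_ping_cost(0.10):.2f}/mo")    # $0.03 at $0.10/M
```

Small in absolute terms, but it's pure waste, and heavier heartbeat schedules or larger ping payloads scale it up linearly.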
What You'll Actually Save
These are real numbers from optimized configurations, combining model tiering, session memory fixes, local models, and search optimization.
| Usage Level | Before | After | Savings |
|---|---|---|---|
| Light (50 msgs/day) | $200/mo | $70/mo | 65% |
| Power (200 msgs/day) | $943/mo | $347/mo | 63% |
| Heavy (500+ msgs/day) | $3,000/mo | $1,000/mo | 67% |
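The savings percentages follow directly from the before/after columns:

```python
# Savings percentages implied by the table above.
tiers = {"Light": (200, 70), "Power": (943, 347), "Heavy": (3000, 1000)}

for name, (before, after) in tiers.items():
    pct = (before - after) / before
    print(f"{name}: {pct:.0%} saved")   # Light: 65%, Power: 63%, Heavy: 67%
```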
The Bottom Line
OpenClaw's default settings optimize for quality, not cost. That's fine if money is no object, but most users need both. The key changes, in order of impact:
- Tier your models: don't use Opus for everything
- Fix session memory bloat: one prompt change, instant savings
- Install QMD: local search means zero research tokens
- Avoid thinking-token waste: use reasoning models sparingly
- Use Ollama for repetitive tasks: free local inference
- Route heartbeats to Flash: stop paying Opus prices for pings
- Switch to free web search: Exa AI via MCP
As one commenter put it: "Model routing and token discipline are basically margin strategy." Cost control isn't optional; it's what separates sustainable OpenClaw usage from a bill that makes you quit.
Want us to optimize your OpenClaw costs?
ClawEasy configures model tiering, session optimization, and local model routing as part of every deployment. We'll cut your token bill without touching your output quality.
This guide is based on Prajwal Tomar's comprehensive cost optimization thread. Follow him for more OpenClaw tips and real-world benchmarks.