Model Cost Analysis
The 150× price gap between DeepSeek V4 Flash and Claude Opus looks enormous, until you ask: how many times can Flash retry before it costs the same as one Opus call? That number, the feedback scaler, tells you when token economics favour cheap models with iteration over expensive models with precision.
Cost per task (10K prompt + 2K output)
One-shot cost for a typical exchange. The gap between Flash ($0.002) and Opus ($0.30) is 150×.
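The one-shot figures follow directly from per-million-token rates. A minimal sketch, assuming illustrative rates ($0.10/M in and $0.50/M out for Flash; $15/M in and $75/M out for Opus) chosen to reproduce the task costs above; they are assumptions, not quoted prices:

```python
def task_cost(prompt_tokens, output_tokens, price_in, price_out):
    """Dollar cost of one call, given per-million-token rates."""
    return (prompt_tokens * price_in + output_tokens * price_out) / 1_000_000

# Illustrative per-million rates, assumed to match the task costs above.
flash = task_cost(10_000, 2_000, price_in=0.10, price_out=0.50)    # $0.002
opus = task_cost(10_000, 2_000, price_in=15.00, price_out=75.00)   # $0.30
print(f"Flash: ${flash:.3f}  Opus: ${opus:.2f}  gap: {opus / flash:.0f}x")
```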
The Feedback Scaler
How many retries / feedback iterations can you run with a cheap model before it costs the same as one call to the expensive model?
Reading the scaler: DeepSeek V4 Flash can retry 30 times before matching the cost of one Claude Sonnet 4 call. If those 30 iterations of summarisation, verification, and self-correction produce a better result than one Sonnet call (and they often do), the cheap model is the economically rational choice. This is the break-even point at which more tokens of computation (feedback, CoT, ensembling) substitute for more parameters of model capacity.
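The scaler itself is a one-line ratio of per-task costs. A sketch using the Flash task cost from the table above and a Sonnet task cost of $0.06, which is inferred from the stated 30× scaler rather than quoted directly:

```python
def feedback_scaler(cheap_task_cost, expensive_task_cost):
    """How many cheap-model calls fit in the budget of one expensive call."""
    return expensive_task_cost / cheap_task_cost

flash_cost = 0.002    # per-task, from the table above
sonnet_cost = 0.06    # per-task, inferred from the 30x claim
opus_cost = 0.30      # per-task, from the table above

print(round(feedback_scaler(flash_cost, sonnet_cost)))  # 30 retries vs Sonnet
print(round(feedback_scaler(flash_cost, opus_cost)))    # 150 retries vs Opus
```

Any iteration strategy whose cumulative call count stays under the scaler is strictly cheaper than a single expensive-model call.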
Cost per million output tokens
Output is where the real cost lives. A single long-generation task (e.g., code synthesis, report writing) costs dramatically more on premium models.
Our actual token usage (daily estimate)
Estimated daily token consumption from local Hermes agent sessions. Each message averages ~800 tokens (500 prompt + 300 completion).
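The daily estimate reduces to simple multiplication. A sketch assuming a hypothetical 500 messages per day (the message count is an assumption for illustration; the 500/300 per-message split is from the estimate above):

```python
MSGS_PER_DAY = 500                       # hypothetical volume, not from the text
PROMPT_TOK, COMPLETION_TOK = 500, 300    # per-message split from the estimate above

daily_prompt = MSGS_PER_DAY * PROMPT_TOK            # prompt tokens per day
daily_completion = MSGS_PER_DAY * COMPLETION_TOK    # completion tokens per day
daily_total = daily_prompt + daily_completion       # ~800 tokens per message
print(f"{daily_total:,} tokens/day")  # 400,000 tokens/day
```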