Model Cost Analysis
The 150× price gap between DeepSeek V4 Flash and Claude Opus looks enormous, until you ask: how many times can Flash retry before it costs the same as one Opus call? That number, the feedback scaler, tells you when token economics favour cheap models with iteration over expensive models with precision.
Cost per task (10K prompt + 2K output)
One-shot cost for a typical exchange. The gap between Flash ($0.002) and Opus ($0.30) is 150×.
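The one-shot figures follow directly from per-million-token rates. A minimal sketch, assuming illustrative rates ($0.10/M in and $0.50/M out for Flash; $15/M in and $75/M out for Opus) chosen to reproduce the task costs above; they are assumptions, not quoted prices:

```python
def task_cost(prompt_tokens, output_tokens, price_in, price_out):
    """Dollar cost of one call, given per-million-token rates."""
    return (prompt_tokens * price_in + output_tokens * price_out) / 1_000_000

# Illustrative per-million rates, assumed to match the task costs above.
flash = task_cost(10_000, 2_000, price_in=0.10, price_out=0.50)    # $0.002
opus = task_cost(10_000, 2_000, price_in=15.00, price_out=75.00)   # $0.30
print(f"Flash: ${flash:.3f}  Opus: ${opus:.2f}  gap: {opus / flash:.0f}x")
```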
The Feedback Scaler
How many retries / feedback iterations can you run with a cheap model before it costs the same as one call to the expensive model?
Reading the scaler: DeepSeek V4 Flash can retry 30 times before matching the cost of one Claude Sonnet 4 call. If those 30 iterations of summarisation, verification, and self-correction produce a better result than one Sonnet call (and they often do), the cheap model is the economically rational choice. This is the break-even point at which more tokens of computation (feedback, CoT, ensembling) substitute for more parameters of model capacity.
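The scaler itself is a one-line ratio of per-task costs. A sketch using the Flash task cost from the table above and a Sonnet task cost of $0.06, which is inferred from the stated 30× scaler rather than quoted directly:

```python
def feedback_scaler(cheap_task_cost, expensive_task_cost):
    """How many cheap-model calls fit in the budget of one expensive call."""
    return expensive_task_cost / cheap_task_cost

flash_cost = 0.002    # per-task, from the table above
sonnet_cost = 0.06    # per-task, inferred from the 30x claim
opus_cost = 0.30      # per-task, from the table above

print(round(feedback_scaler(flash_cost, sonnet_cost)))  # 30 retries vs Sonnet
print(round(feedback_scaler(flash_cost, opus_cost)))    # 150 retries vs Opus
```

Any iteration strategy whose cumulative call count stays under the scaler is strictly cheaper than a single expensive-model call.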
Cost per million output tokens
Output is where the real cost lives. A single long-generation task (e.g., code synthesis, report writing) costs dramatically more on premium models.
Our actual token usage (daily estimate)
Estimated daily token consumption from local Hermes agent sessions. Each message averages ~800 tokens (500 prompt + 300 completion).
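The daily estimate reduces to simple multiplication. A sketch assuming a hypothetical 500 messages per day (the message count is an assumption for illustration; the 500/300 per-message split is from the estimate above):

```python
MSGS_PER_DAY = 500                       # hypothetical volume, not from the text
PROMPT_TOK, COMPLETION_TOK = 500, 300    # per-message split from the estimate above

daily_prompt = MSGS_PER_DAY * PROMPT_TOK            # prompt tokens per day
daily_completion = MSGS_PER_DAY * COMPLETION_TOK    # completion tokens per day
daily_total = daily_prompt + daily_completion       # ~800 tokens per message
print(f"{daily_total:,} tokens/day")  # 400,000 tokens/day
```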