Less bloat. Less noise. More signal.
Every inbound prompt is matched to a known pattern, the subject matter extracted as a variable, and the request rewritten to a canonical template before it reaches the model. Typical token reduction: 35–55%. Cache hit rates jump from <5% to 20–40%. Faster compute. Lower cost. More predictable outputs.
Included as standard
Layers 1 + 2 of the Prompt Optimiser ship with every TokenOne® plan at no extra charge.
Layer 3 (LLM-rewrite) is opt-in for tenants with bespoke patterns and is metered transparently through the wallet, the same model as Guardrails. Most workloads never need Layer 3 to land the headline savings.
Original prompt → canonical prompt.
The user’s freeform request gets matched to a pattern, its subject extracted, and bound into a clean template. Same intent; far fewer tokens; cache-friendly shape.
Original · 487 tokens
Hey there, hope you're well! So I've been working on this Q3 earnings document and I really need your help summarising it. I want the summary to be short · like, really focused on the key points. Please make sure it's accurate and don't add any speculation. Here's the document: [40-page earnings transcript pasted in full] Could you also include the headline numbers? Like, revenue, EBITDA, that kind of thing. Thanks so much, appreciate the help! Let me know if you need any clarification.
Canonical · 232 tokens · −52%
[pattern: summarise_financial_doc · v3] Summarise the following financial document. Output: 5 bullet points + headline numbers (revenue, EBITDA). No speculation. Document: [40-page earnings transcript]
The user’s actual subject matter, the document, is preserved verbatim and bound into the canonical template as a variable. Politeness, restatement, hedging and formatting noise vanish. Token count drops 52% on this example, and the next user who sends a similar request lands on the same canonical shape, so the cache hit rate on that shape is 100%.
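The match, extract and bind steps above can be sketched in a few lines of Python. This is an illustration only, not the Prompt Optimiser's implementation: the `Pattern` class, the `canonicalise` function, the trigger regex and the bracketed-document convention are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern record: a trigger regex plus a canonical template."""
    name: str
    version: int
    trigger: "re.Pattern[str]"  # must expose a named group called `document`
    template: str               # canonical wording with a {document} slot


# Assumption: the subject is a bracketed block that follows the word "document:".
SUMMARISE_FINANCIAL_DOC = Pattern(
    name="summarise_financial_doc",
    version=3,
    trigger=re.compile(
        r"summaris\w*.*?document:\s*(?P<document>\[[^\]]*\])",
        re.IGNORECASE | re.DOTALL,
    ),
    template=(
        "[pattern: summarise_financial_doc · v3] "
        "Summarise the following financial document. "
        "Output: 5 bullet points + headline numbers (revenue, EBITDA). "
        "No speculation. Document: {document}"
    ),
)


def canonicalise(prompt: str, patterns: Sequence[Pattern]) -> Optional[str]:
    """Return the canonical prompt for the first matching pattern, else None."""
    for pattern in patterns:
        match = pattern.trigger.search(prompt)
        if match:
            # The subject is bound verbatim; only the wording around it changes.
            return pattern.template.format(document=match.group("document"))
    return None  # no known pattern: the caller forwards the prompt unchanged


freeform = (
    "Hey there, hope you're well! I really need your help summarising this. "
    "Here's the document: [40-page earnings transcript pasted in full] "
    "Thanks so much, appreciate the help!"
)
canonical = canonicalise(freeform, [SUMMARISE_FINANCIAL_DOC])
unmatched = canonicalise("What's the weather like today?", [SUMMARISE_FINANCIAL_DOC])
```

A prompt that matches no pattern returns `None` here, so nothing is rewritten unless a known shape is recognised.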
What stretching every token looks like, compounded.
Direct token waste
35–55%
Of inbound tokens are bloat: politeness, restatement, hedging, formatting noise. Removed before dispatch.
Cache hit rate
<5% → 20–40%
Different users phrase the same request differently, so today they miss the cache. Canonical shapes collide, and hits return free.
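Why canonical shapes collide where raw phrasings do not can be shown with a toy cache key. The sketch assumes the cache is keyed on a hash of the full prompt text; how the product actually keys its cache is not stated here, and `cache_key`, `to_canonical` and the template string are hypothetical names for the example.

```python
import hashlib
import re

# Hypothetical canonical wording; the real templates are not shown in this sketch.
CANONICAL_TEMPLATE = "Summarise the following financial document. Document: {document}"


def cache_key(prompt: str) -> str:
    """Assumed cache key: SHA-256 of the exact prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def to_canonical(prompt: str) -> str:
    """Toy canonicaliser: keep the bracketed subject, drop everything else."""
    subject = re.search(r"\[[^\]]*\]", prompt)
    if subject is None:
        return prompt  # nothing recognised, leave the prompt untouched
    return CANONICAL_TEMPLATE.format(document=subject.group(0))


raw_a = "Hey, could you summarise this for me? [Q3 earnings transcript] Thanks!"
raw_b = "Please give me a short summary of the following: [Q3 earnings transcript]"

raw_keys_match = cache_key(raw_a) == cache_key(raw_b)
canonical_keys_match = cache_key(to_canonical(raw_a)) == cache_key(to_canonical(raw_b))
print(raw_keys_match, canonical_keys_match)
```

In this sketch the two raw prompts hash to different keys, while their canonical forms hash to the same one because both carry the same subject.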
Compounded effect
~50–60%
Effective spend reduction on common patterns when token waste and cache hit rate compound. Latency drops in proportion.
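The compounding can be checked with simple arithmetic. The sketch below assumes spend scales linearly with tokens sent, that a cache hit costs nothing, and that token reduction and cache hit rate are independent. The page does not state its formula, and `effective_reduction` is a name made up for the example.

```python
def effective_reduction(token_cut: float, hit_rate: float) -> float:
    """Fraction of spend removed when both effects apply.

    Assumed model: remaining spend = (1 - token_cut) * (1 - hit_rate),
    i.e. only cache misses are billed, and each miss is billed on the
    reduced token count.
    """
    return 1.0 - (1.0 - token_cut) * (1.0 - hit_rate)


low = effective_reduction(0.35, 0.20)   # both figures at the bottom of their ranges
mid = effective_reduction(0.45, 0.30)   # both at the midpoint
high = effective_reduction(0.55, 0.40)  # both at the top

for label, value in (("low", low), ("mid", mid), ("high", high)):
    print(f"{label}: {value:.1%}")
```

Under these assumptions the three cases come out at 48.0%, 61.5% and 73.0%, so the ~50–60% headline sits inside the band that the two input ranges imply.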
Less bloat in. More signal out.
Included as standard with every TokenOne® plan. See the original-vs-canonical diff on your own workloads.