Module · Prompt Optimiser · Included as standard · Patent pending

Less bloat. Less noise. More signal.

Every inbound prompt is matched to a known pattern, the subject matter extracted as a variable, and the request rewritten to a canonical template before it reaches the model. Typical token reduction: 35–55%. Cache hit rates jump from <5% to 20–40%. Faster compute. Lower cost. More predictable outputs.

Included as standard

Layers 1 + 2 of the Prompt Optimiser ship with every TokenOne® plan at no extra charge.

Layer 3 (LLM rewrite) is opt-in for tenants with bespoke patterns. Its cost is metered transparently through the wallet, on the same model as Guardrails. Most workloads reach the headline savings without Layer 3.

What it does

Original prompt → canonical prompt.

The user’s freeform request gets matched to a pattern, its subject extracted, and bound into a clean template. Same intent; far fewer tokens; cache-friendly shape.

Original · 487 tokens

Hey there, hope you're well! So I've been working on this Q3 earnings document and I really need your help summarising it. I want the summary to be short · like, really focused on the key points. Please make sure it's accurate and don't add any speculation. Here's the document: [40-page earnings transcript pasted in full] Could you also include the headline numbers? Like, revenue, EBITDA, that kind of thing. Thanks so much, appreciate the help! Let me know if you need any clarification.

Canonical · 232 tokens · −52%

[pattern: summarise_financial_doc · v3] Summarise the following financial document. Output: 5 bullet points + headline numbers (revenue, EBITDA). No speculation. Document: [40-page earnings transcript]

The user’s actual subject matter, the document, is preserved verbatim and bound into the canonical template as a variable. Politeness, restatement, hedging and formatting noise vanish. Token count drops 52% on this example. Cache hit rate on this canonical shape for the next user with a similar request: 100%.
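To make the match, extract and bind steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Prompt Optimiser's actual implementation: the `PromptPattern` class, the `canonicalise` function, the regexes and the bracketed-document convention are all hypothetical. Only the pattern name, version and template wording come from the example above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptPattern:
    """Hypothetical registry entry: how to spot a request type and rewrite it."""
    name: str
    version: int
    intent: re.Pattern    # detects the kind of request
    subject: re.Pattern   # captures the user's subject matter (group 1)
    template: str         # canonical wording with a {document} slot


# Assumption: a single illustrative pattern, with the document marked by
# "Here's the document:" and wrapped in square brackets as in the example.
PATTERNS = [
    PromptPattern(
        name="summarise_financial_doc",
        version=3,
        intent=re.compile(r"\bsummar(?:y|ise|ize|ising|izing)\b", re.I),
        subject=re.compile(r"Here's the document:\s*(\[[^\]]*\])", re.I),
        template=(
            "[pattern: {name} · v{version}] "
            "Summarise the following financial document. "
            "Output: 5 bullet points + headline numbers (revenue, EBITDA). "
            "No speculation. Document: {document}"
        ),
    ),
]


def canonicalise(prompt: str) -> Optional[str]:
    """Return the canonical prompt, or None when no known pattern matches."""
    for pattern in PATTERNS:
        if not pattern.intent.search(prompt):
            continue
        match = pattern.subject.search(prompt)
        if match is None:
            continue
        # The subject is bound verbatim; everything else is discarded.
        return pattern.template.format(
            name=pattern.name,
            version=pattern.version,
            document=match.group(1),
        )
    return None


if __name__ == "__main__":
    raw = (
        "Hey there, hope you're well! I really need your help summarising "
        "this Q3 earnings document. Here's the document: "
        "[40-page earnings transcript pasted in full] Thanks so much!"
    )
    print(canonicalise(raw))
```

Returning `None` for unmatched prompts leaves room for a fallback, such as passing the prompt through unchanged. How the real product handles unmatched prompts is not stated here.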

The numbers

What stretching every token looks like, compounded.

Direct token waste

35–55%

Of inbound tokens are bloat: politeness, restatement, hedging, formatting noise. Removed before dispatch.

Cache hit rate

<5% → 20–40%

Different users phrase the same request differently, so today they miss the cache. Canonical shapes collide on the same entry, and cache hits return at no cost.
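The collision effect can be sketched in a few lines of Python. This assumes an exact-match cache keyed on a hash of the model name and prompt text. The real cache design is not described here, and `cache_key`, `to_canonical` and `TEMPLATE` are hypothetical names used only for illustration.

```python
import hashlib

# Assumption: a shortened stand-in for a canonical template.
TEMPLATE = (
    "Summarise the following financial document. "
    "No speculation. Document: {document}"
)


def cache_key(model: str, prompt: str) -> str:
    """Exact-match cache key: SHA-256 over the model name and prompt text."""
    payload = f"{model}\x00{prompt}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def to_canonical(raw_prompt: str, document: str) -> str:
    """Stand-in for the pattern matcher: the user's phrasing is dropped and
    only the extracted document is bound into the template."""
    return TEMPLATE.format(document=document)


if __name__ == "__main__":
    doc = "[Q3 earnings transcript]"
    raw_a = f"Hey! Could you summarise this for me? {doc} Thanks!"
    raw_b = f"I need a short summary of the following please: {doc}"

    # Raw phrasings differ, so their keys differ and the cache misses.
    print(cache_key("model-x", raw_a) == cache_key("model-x", raw_b))
    # Canonical forms are identical, so the second request hits the cache.
    print(
        cache_key("model-x", to_canonical(raw_a, doc))
        == cache_key("model-x", to_canonical(raw_b, doc))
    )
```

In this sketch two requests collide only when the bound document is also identical, because the document is part of the hashed text.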

Compounded effect

~50–60%

Effective spend reduction on common patterns when the token savings and the higher cache hit rate compound. Latency drops in proportion.
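As a rough check on how the two effects compound, the Python sketch below multiplies the share of tokens still sent by the share of requests still dispatched. It is a simplified model, not the product's billing formula: it assumes cache hits cost nothing, that the two effects are independent, and that spend scales with inbound tokens only. The function name `effective_spend_reduction` is hypothetical.

```python
def effective_spend_reduction(token_reduction: float, cache_hit_rate: float) -> float:
    """Fraction of spend saved when both effects apply.

    Simplifying assumptions: cache hits are free, and every cache miss
    pays for the already-reduced token count.
    """
    for value in (token_reduction, cache_hit_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    remaining_spend = (1.0 - token_reduction) * (1.0 - cache_hit_rate)
    return 1.0 - remaining_spend


if __name__ == "__main__":
    # Low end, a mid-range case and the high end of the ranges quoted above.
    for reduction, hit_rate in [(0.35, 0.20), (0.45, 0.20), (0.55, 0.40)]:
        saved = effective_spend_reduction(reduction, hit_rate)
        print(f"{reduction:.0%} fewer tokens, {hit_rate:.0%} hits -> {saved:.0%} saved")
```

Under these assumptions a 35% token reduction with a 20% hit rate saves 48%, and 45% with 20% saves 56%. Output tokens, which the optimiser does not shorten, would pull real-world figures lower.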

Less bloat in. More signal out.

Included as standard with every TokenOne® plan. See the original-vs-canonical diff on your own workloads.