Module · Guardrails · Included as standard · Patent pending

Every prompt and response, governed.

Guardrails enforce the rules of the TokenOne® scheme at network time, not after the fact. Inbound prompts are screened for injection, PII, secrets and policy violations before they touch a model. Outbound responses are screened again for leakage, toxicity and grounding before they leave the network. Every decision is logged, replayable and auditable · across every provider and TokenOne AI. Included as standard with every plan.

Included as standard

Layers 1 + 2 of Guardrails ship with every TokenOne® plan at no extra charge.

Layer 3 (LLM-judge) is opt-in and metered transparently through the wallet. It is triggered only when Layer 2 can’t decide confidently, so most workloads never need it: the deterministic (Layer 1) and classifier (Layer 2) tiers cover ~99% of real-world matches at steady state.
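The escalation path above can be sketched in a few lines. This is an illustration only, not TokenOne's implementation: the function names, the 0.2 / 0.8 confidence band and the "flag" fallback are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical band: classifier scores strictly between LOW and HIGH count as "not confident".
LOW, HIGH = 0.2, 0.8

def decide(
    text: str,
    deterministic: Callable[[str], Optional[bool]],     # Layer 1: True=block, False=allow, None=no match
    classifier: Callable[[str], float],                 # Layer 2: risk score in [0, 1]
    llm_judge: Optional[Callable[[str], bool]] = None,  # Layer 3: opt-in, metered
) -> Tuple[str, int]:
    """Return (action, layer_that_decided)."""
    hit = deterministic(text)
    if hit is not None:
        return ("block" if hit else "allow", 1)
    score = classifier(text)
    if score >= HIGH:
        return ("block", 2)
    if score <= LOW:
        return ("allow", 2)
    # Layer 2 is uncertain: escalate only if the tenant opted in to Layer 3.
    if llm_judge is not None:
        return ("block" if llm_judge(text) else "allow", 3)
    return ("flag", 2)
```

In this sketch the judge is only ever called for the uncertain middle band, which is why a metered Layer 3 would stay idle for most traffic.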

Inbound

What we screen before the model sees it.

Six categories of inbound risk, evaluated in parallel within a hard latency budget. Every detector returns a verdict, score and matched span · not a black-box yes/no.
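A minimal sketch of parallel evaluation under a hard budget, using Python's asyncio. The `Verdict` fields mirror the verdict, score and matched span described above; the detector functions, the "timeout" verdict and the budget value are assumptions for illustration, not the product's API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

@dataclass
class Verdict:
    detector: str
    verdict: str                      # "pass", "fail" or "timeout"
    score: float
    span: Optional[Tuple[int, int]]   # character offsets of the match, if any

Detector = Callable[[str], Awaitable[Verdict]]

async def screen(prompt: str, detectors: Dict[str, Detector], budget_s: float) -> List[Verdict]:
    """Run every detector concurrently; any that miss the budget report 'timeout'."""
    async def bounded(name: str, det: Detector) -> Verdict:
        try:
            return await asyncio.wait_for(det(prompt), timeout=budget_s)
        except asyncio.TimeoutError:
            return Verdict(name, "timeout", 0.0, None)
    return await asyncio.gather(*(bounded(n, d) for n, d in detectors.items()))

async def secrets_detector(prompt: str) -> Verdict:
    i = prompt.find("sk-")  # toy rule, for illustration only
    if i >= 0:
        return Verdict("secrets", "fail", 1.0, (i, i + 3))
    return Verdict("secrets", "pass", 0.0, None)

async def slow_detector(prompt: str) -> Verdict:
    await asyncio.sleep(1.0)  # deliberately exceeds the budget used in the usage note
    return Verdict("slow", "pass", 0.0, None)
```

Calling `asyncio.run(screen("key sk-123", {"secrets": secrets_detector, "slow": slow_detector}, 0.05))` returns one verdict per detector in about 50 ms. How a timed-out detector should be treated (allow, block or flag) is a policy choice this sketch leaves open.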

Prompt injection / jailbreak

Detect role-play attacks, encoded payloads and "ignore previous instructions" patterns before they reach the model.

PII

Email addresses, phone numbers, SSNs, National Insurance (NI) numbers, card numbers, IP addresses and names. Detect, redact or block based on policy.
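The three actions can be illustrated with a single detector. This sketch covers email addresses only, with a deliberately simplified pattern; the function name, the `[EMAIL]` placeholder and the exception type are assumptions for the example.

```python
import re

# Simplified email pattern for illustration; a production detector needs broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def apply_pii_policy(text: str, action: str) -> tuple:
    """Return (text, matched_spans) for action in {'detect', 'redact', 'block'}."""
    spans = [m.span() for m in EMAIL.finditer(text)]
    if not spans or action == "detect":
        return text, spans          # report only, text untouched
    if action == "redact":
        return EMAIL.sub("[EMAIL]", text), spans
    if action == "block":
        raise PermissionError("prompt blocked: PII detected")
    raise ValueError(f"unknown action: {action}")
```

With `action="redact"`, the prompt still reaches the model but with each match replaced; with `action="block"`, it never does.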

Secrets

API keys, tokens, JWTs and high-entropy strings. Stop accidental credential exfiltration.
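High-entropy detection is commonly done with Shannon entropy over candidate tokens. The sketch below shows that general technique; the token pattern, the 20-character minimum and the 4.0 bits-per-character threshold are assumptions chosen for the example, not TokenOne's settings.

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, computed from the string's own character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Candidate tokens: long runs of characters typical of keys and base64 blobs.
TOKEN = re.compile(r"[A-Za-z0-9+/_=-]{20,}")

def find_high_entropy(text: str, threshold: float = 4.0) -> list:
    """Return spans of candidate tokens whose entropy meets the threshold."""
    return [m.span() for m in TOKEN.finditer(text)
            if shannon_entropy(m.group()) >= threshold]
```

A long run of one repeated character scores 0 bits and is ignored, while a random-looking key of the same length scores well above the threshold.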

Topic / scope

Allow- and deny-lists with optional embedding similarity. Keep the conversation on-mission.

Toxicity

Hate, harassment, violence and self-harm patterns. Regex-based today, with classifier-grade detection to follow.

Custom YAML matchers

Tenant-defined rules in YAML for industry-specific policy · versioned, replayable.

Outbound

What we screen before the response leaves.

The model can produce sensitive data even when the prompt was clean. Outbound guardrails screen again for six categories: PII and secret leakage, system-prompt leakage, toxicity, hallucination, refusal calibration and format compliance.

PII / secret leakage

Catch sensitive data the model produced · even when it never saw it in the prompt.

System-prompt leakage

Detect responses that reveal or echo your system prompt back to the caller.

Toxicity (outbound)

Catch harmful generations the model produced despite a clean prompt.

Grounding / hallucination

For RAG workloads, verify cited sources actually support the claim.

Refusal calibration

Flag over-refusal (unhelpful) and jailbroken-refusal (the safety failed open).

Format / schema

JSON, tool-call shape and contract compliance · every response validated before it leaves.
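A minimal sketch of a pre-release format check using only the standard library. The `TOOL_CALL` contract and the error strings are hypothetical; a real deployment would more likely validate against a full JSON Schema.

```python
import json

def validate_response(raw: str, required: dict) -> list:
    """Return a list of contract violations; an empty list means the response may leave."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for key, expected_type in required.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif not isinstance(data[key], expected_type):
            errors.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return errors

# Hypothetical tool-call contract: a name plus an arguments object.
TOOL_CALL = {"name": str, "arguments": dict}
```

Returning every violation, rather than stopping at the first, keeps the result useful as an audit record as well as a gate.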

Audit + replay

Every guardrail decision, traceable.

One unified decision row per request · every detector verdict, the rule that fired, the action taken, the policy version, the latency budget consumed. Replayable from snapshot.

  • Per-request decision row joinable to the request ledger and the policy version.
  • Replay any decision against a different policy version to test changes safely.
  • Compliance report PDF includes a Guardrails section per tenant per period.
  • Customer-trust Guardrail Proof page shareable with auditors and procurement.
  • Drift detection on false-positive and false-negative rates · ML proposals gated by human review.
  • Single kill-switch flips guardrails to fail-open under incident response · logged, audited, time-bound.
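Replaying a decision against a different policy version, as listed above, could look like the sketch below. The `DecisionRow` fields are a reduced, assumed subset of the decision row, and both policy versions and their thresholds are invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class DecisionRow:
    request_id: str
    policy_version: str
    scores: Tuple[Tuple[str, float], ...]   # snapshot of (detector, score) pairs
    action: str

# A policy maps detector scores to an action. Both versions here are hypothetical.
Policy = Callable[[Dict[str, float]], str]
POLICIES: Dict[str, Policy] = {
    "v1": lambda s: "block" if s.get("toxicity", 0.0) >= 0.9 else "allow",
    "v2": lambda s: "block" if s.get("toxicity", 0.0) >= 0.7 else "allow",
}

def replay(row: DecisionRow, version: str) -> DecisionRow:
    """Re-evaluate a stored snapshot under another policy version; the original row is untouched."""
    action = POLICIES[version](dict(row.scores))
    return replace(row, policy_version=version, action=action)
```

Because the row stores the detector scores rather than only the final action, a stricter candidate policy can be tested against historical traffic without re-running any detector or touching live requests.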

Bidirectional content guardrails. Built into the network.

See the policy bundles, action vocabulary and decision feed in your environment. Talk to us about regulated workloads.