Every prompt and response, governed.
Guardrails enforce the rules of the TokenOne® scheme at network time, not after the fact. Inbound prompts are screened for injection, PII, secrets and policy violations before they touch a model. Outbound responses are screened again for leakage, toxicity and grounding before they leave the network. Every decision is logged, replayable and auditable · across every provider and TokenOne AI. Included as standard with every plan.
Included as standard
Layers 1 + 2 of Guardrails ship with every TokenOne® plan at no extra charge.
Layer 3 (LLM-judge) is opt-in and metered transparently through the wallet · triggered only when Layer 2 can’t decide confidently. Most workloads never need it: the deterministic and classifier layers (Layers 1 and 2) cover ~99% of real-world matches at steady state.
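The escalation between layers can be pictured with a small sketch. It is illustrative only, not the TokenOne implementation: the function names, the toy keyword classifier and the 0.8 confidence threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off below which Layer 2 is "not confident"

@dataclass
class Verdict:
    blocked: bool
    confidence: float  # 0.0 (undecided) to 1.0 (certain)
    layer: int         # which layer produced the verdict

def layer1_deterministic(text: str) -> Optional[Verdict]:
    """Hypothetical Layer 1: exact pattern match; None means no rule matched."""
    if "ignore previous instructions" in text.lower():
        return Verdict(blocked=True, confidence=1.0, layer=1)
    return None

def layer2_classifier(text: str) -> Verdict:
    """Hypothetical Layer 2: a keyword count standing in for a real classifier."""
    hits = sum(word in text.lower() for word in ("pretend", "jailbreak", "bypass"))
    risk = min(1.0, hits / 2)
    confidence = abs(risk - 0.5) * 2  # distance from the undecided midpoint
    return Verdict(blocked=risk >= 0.5, confidence=confidence, layer=2)

def evaluate(text: str, llm_judge: Optional[Callable[[str], bool]] = None) -> Verdict:
    """Run the layers in order; call the opt-in judge only when Layer 2 is unsure."""
    verdict = layer1_deterministic(text)
    if verdict is not None:
        return verdict
    verdict = layer2_classifier(text)
    if verdict.confidence >= CONFIDENCE_THRESHOLD or llm_judge is None:
        return verdict
    return Verdict(blocked=llm_judge(text), confidence=1.0, layer=3)
```

In this sketch a tenant that has not opted in simply passes no `llm_judge`, so an uncertain Layer 2 verdict is returned as-is and nothing is metered.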
What we screen before the model sees it.
Six categories of inbound risk, evaluated in parallel within a hard latency budget. Every detector returns a verdict, score and matched span · not a black-box yes/no.
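As a rough sketch of what parallel evaluation under a hard latency budget might look like, the example below runs detectors concurrently with `asyncio` and reports any detector that misses the budget as a timeout. The `DetectorResult` shape, the detector functions and the 50 ms budget are assumptions for illustration, not the product's actual interfaces.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

@dataclass
class DetectorResult:
    detector: str
    verdict: str                     # "allow", "block" or "timeout"
    score: float
    span: Optional[Tuple[int, int]]  # (start, end) offsets of the matched text

async def detect_injection(text: str) -> DetectorResult:
    needle = "ignore previous instructions"
    start = text.lower().find(needle)
    if start >= 0:
        return DetectorResult("injection", "block", 1.0, (start, start + len(needle)))
    return DetectorResult("injection", "allow", 0.0, None)

async def detect_slow(text: str) -> DetectorResult:
    await asyncio.sleep(1.0)  # simulates a detector that exceeds the budget
    return DetectorResult("slow", "allow", 0.0, None)

Detector = Callable[[str], Awaitable[DetectorResult]]

async def screen(text: str, detectors: Dict[str, Detector],
                 budget_s: float = 0.05) -> List[DetectorResult]:
    """Run all detectors concurrently; cancel whatever is unfinished at the deadline."""
    tasks = {asyncio.create_task(fn(text)): name for name, fn in detectors.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=budget_s)
    results = [task.result() for task in done]
    for task in pending:
        task.cancel()
        results.append(DetectorResult(tasks[task], "timeout", 0.0, None))
    return sorted(results, key=lambda r: r.detector)
```

How a timed-out detector should count towards the final decision (allow, block or escalate) is a policy choice this sketch leaves open.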
Prompt injection / jailbreak
Detect role-play attacks, encoded payloads and "ignore previous instructions" patterns before they reach the model.
PII
Email addresses, phone numbers, SSNs, NI numbers, card numbers, IP addresses and names. Detect, redact or block based on policy.
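To make the three actions concrete, here is a minimal sketch covering two PII kinds with simple regular expressions. The patterns, the `apply_pii_policy` name and the placeholder format are assumptions for illustration; production-grade PII detection needs far broader coverage than this.

```python
import re
from typing import List, Optional, Tuple

# Deliberately simple patterns for two PII kinds (illustrative only).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

Match = Tuple[str, Tuple[int, int]]  # (kind, (start, end))

def apply_pii_policy(text: str, action: str) -> Tuple[Optional[str], List[Match]]:
    """Return (text to forward, matches). A forwarded text of None means blocked."""
    if action not in ("detect", "redact", "block"):
        raise ValueError(f"unknown action: {action}")
    matches = [(kind, m.span())
               for kind, pattern in PII_PATTERNS.items()
               for m in pattern.finditer(text)]
    if not matches or action == "detect":
        return text, matches   # report only, forward unchanged
    if action == "block":
        return None, matches   # nothing is forwarded
    redacted = text
    for kind, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[{kind.upper()}]", redacted)
    return redacted, matches
```

The match spans always refer to offsets in the original text, so they can be logged alongside the decision even when the forwarded text has been redacted.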
Secrets
API keys, tokens, JWTs and high-entropy strings. Stop accidental credential exfiltration.
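One common way to flag high-entropy strings is Shannon entropy over the characters of each long token. The sketch below takes that approach; the 20-character minimum and the 4.0 bits-per-character threshold are assumed values chosen for the example, not documented product settings.

```python
import math
import re
from collections import Counter
from typing import List

# Candidate tokens: long runs of characters typical of keys and encoded blobs.
TOKEN = re.compile(r"[A-Za-z0-9_\-+/]{20,}")

def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def find_high_entropy(text: str, threshold: float = 4.0) -> List[str]:
    """Return candidate tokens whose entropy meets the (assumed) threshold."""
    return [m.group() for m in TOKEN.finditer(text)
            if shannon_entropy(m.group()) >= threshold]
```

Entropy alone produces false positives (hashes, UUID-like identifiers), which is why it usually sits alongside format-specific patterns for known key types.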
Topic / scope
Allow- and deny-lists with optional embedding similarity. Keep the conversation on-mission.
Toxicity
Hate, harassment, violence and self-harm patterns. Regex-based today, with classifier-grade detection to follow.
Custom YAML matchers
Tenant-defined rules in YAML for industry-specific policy · versioned, replayable.
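As an illustration of how tenant-defined rules might be evaluated, the sketch below shows a rule bundle as the dictionary a YAML loader could return, then applies it to a prompt. The schema (`version`, `rules`, `id`, `pattern`, `action`) and the example rules are hypothetical; the real matcher format is not described on this page.

```python
import re
from typing import Any, Dict, List

# Hypothetical rule bundle, shown as the dict a YAML loader might produce.
RULE_BUNDLE: Dict[str, Any] = {
    "version": "v1",
    "rules": [
        {"id": "no-account-numbers", "pattern": r"\bACC-\d{8}\b", "action": "redact"},
        {"id": "no-guaranteed-returns", "pattern": r"(?i)\bguaranteed returns?\b",
         "action": "block"},
    ],
}

def evaluate_rules(text: str, bundle: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one decision per rule that matches, stamped with the bundle version."""
    decisions = []
    for rule in bundle["rules"]:
        match = re.search(rule["pattern"], text)
        if match:
            decisions.append({
                "rule_id": rule["id"],
                "action": rule["action"],
                "span": match.span(),
                "policy_version": bundle["version"],
            })
    return decisions
```

Stamping each decision with the bundle version is what would let the same input be re-evaluated later against a different version of the rules.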
What we screen before the response leaves.
The model can produce sensitive data even when the prompt was clean. Outbound guardrails screen again · PII and secret leakage, system-prompt leakage, toxicity, hallucination, refusal calibration and format compliance.
PII / secret leakage
Catch sensitive data the model produced · even when it never saw it in the prompt.
System-prompt leakage
Detect responses that extract or echo your system prompt back to the caller.
Toxicity (outbound)
Catch harmful generations the model produced despite a clean prompt.
Grounding / hallucination
For RAG workloads, verify cited sources actually support the claim.
Refusal calibration
Flag over-refusal (the model was needlessly unhelpful) and jailbroken refusal (the model's safety failed open).
Format / schema
JSON, tool-call shape and contract compliance · every response validated before it leaves.
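A format check of this kind can be as simple as parsing the response and comparing it with an expected shape. The sketch below uses only the Python standard library; the `TOOL_CALL_CONTRACT` fields are an assumed example contract, not a documented TokenOne schema.

```python
import json
from typing import Dict, List, Tuple

# Assumed contract for a tool call: required field name -> expected Python type.
TOOL_CALL_CONTRACT: Dict[str, type] = {"name": str, "arguments": dict}

def validate_response(raw: str, contract: Dict[str, type]) -> Tuple[bool, List[str]]:
    """Return (is_valid, errors) for a raw model response checked against a contract."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return False, ["top-level value must be an object"]
    errors = []
    for field, expected in contract.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return not errors, errors
```

Returning the list of errors rather than a bare boolean keeps the check consistent with detectors that report what matched, not just whether something did.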
Every guardrail decision, traceable.
One unified decision row per request · every detector verdict, the rule that fired, the action taken, the policy version, the latency budget consumed. Replayable from snapshot.
- Per-request decision row joinable to the request ledger and the policy version.
- Replay any decision against a different policy version to test changes safely.
- Compliance report PDF includes a Guardrails section per tenant per period.
- Customer-trust Guardrail Proof page shareable with auditors and procurement.
- Drift detection on false-positive and false-negative rates · ML proposals gated by human review.
- A single kill switch flips guardrails to fail-open during incident response · logged, audited, time-bound.
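The decision row and the replay capability listed above can be sketched as a small record plus a function that re-runs a stored request under another policy version. Field names, the `POLICIES` table and the snapshot shape are assumptions made for this example.

```python
import re
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class DecisionRow:
    request_id: str
    policy_version: str
    rule_fired: Optional[str]
    action: str
    latency_ms: float

# Hypothetical policy versions: lists of (rule id, pattern, action).
POLICIES: Dict[str, List[Tuple[str, str, str]]] = {
    "v1": [("block-ssn", r"\b\d{3}-\d{2}-\d{4}\b", "block")],
    "v2": [("redact-ssn", r"\b\d{3}-\d{2}-\d{4}\b", "redact")],
}

def decide(request_id: str, text: str, version: str) -> DecisionRow:
    """Evaluate a request under one policy version and record the outcome."""
    start = time.perf_counter()
    rule_fired, action = None, "allow"
    for rule_id, pattern, rule_action in POLICIES[version]:
        if re.search(pattern, text):
            rule_fired, action = rule_id, rule_action
            break
    latency_ms = (time.perf_counter() - start) * 1000
    return DecisionRow(request_id, version, rule_fired, action, latency_ms)

def replay(snapshot: Dict[str, str], version: str) -> DecisionRow:
    """Re-evaluate a stored request snapshot against a different policy version."""
    return decide(snapshot["request_id"], snapshot["text"], version)
```

Comparing the original row with the replayed one shows how a proposed policy change would have altered past decisions without touching live traffic.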
Bidirectional content guardrails. Built into the network.
See the policy bundles, action vocabulary and decision feed in your environment. Talk to us about regulated workloads.