Authentication

All three proxies accept a to_live_* or to_dev_* key on either the Anthropic-style x-api-key header or the OpenAI-style Authorization: Bearer header. The Gemini proxy also accepts the Google convention x-goog-api-key.
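As a minimal sketch of the three header conventions described above, the helper below builds request headers for a given key. The function name auth_headers and the style labels are hypothetical, chosen only for illustration; the header names come from the text above.

```python
from typing import Dict


def auth_headers(api_key: str, style: str = "anthropic") -> Dict[str, str]:
    """Return request headers carrying the key in the chosen convention."""
    # The text above names two key prefixes; anything else is rejected early.
    if not api_key.startswith(("to_live_", "to_dev_")):
        raise ValueError("expected a to_live_* or to_dev_* key")
    if style == "anthropic":
        auth = {"x-api-key": api_key}
    elif style == "openai":
        auth = {"Authorization": f"Bearer {api_key}"}
    elif style == "google":
        # Per the text above, only the Gemini proxy accepts this header.
        auth = {"x-goog-api-key": api_key}
    else:
        raise ValueError(f"unknown style: {style!r}")
    return {"Content-Type": "application/json", **auth}
```

For example, auth_headers("to_dev_abc", "openai") produces an Authorization: Bearer to_dev_abc header alongside Content-Type.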

Anthropic shape

POST /v1/anthropic-proxy/v1/messages
Content-Type: application/json
x-api-key: to_live_...

{
  "model": "claude-sonnet-4",
  "messages": [{ "role": "user", "content": "..." }],
  "max_tokens": 1024,
  "stream": true
}

When stream is true, upstream SSE events are piped straight through to the client. Health check: GET /v1/anthropic-proxy/health.
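Because the stream is passed through as SSE, a client can read it with a generic SSE line parser. The sketch below is one way to do that over an iterable of text lines. The name iter_sse_events is hypothetical, and the sample event is an illustrative payload in the upstream provider's streaming shape, not captured proxy output.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Illustrative input only (assumed shape, not real proxy output).
sample = [
    "event: content_block_delta\n",
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}}\n',
    "\n",
]
for name, payload in iter_sse_events(sample):
    print(name, json.loads(payload)["delta"]["text"])
```

An event that is still incomplete when the input ends is dropped rather than yielded, which matches how SSE treats a stream that closes mid-event.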

OpenAI shape

POST /v1/openai-proxy/v1/chat/completions
Content-Type: application/json
Authorization: Bearer to_live_...

{
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "..." }],
  "stream": true,
  "stream_options": { "include_usage": true }
}

Also exposes GET /v1/openai-proxy/v1/models (curated TokenOne Delivery® list) and GET /v1/openai-proxy/health.
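For the streaming chat request above, stream_options.include_usage asks the upstream to emit a usage chunk before the terminating [DONE] marker. A rough sketch of collecting the text and usage from the data payloads of such a stream follows. The name collect_chat_stream is hypothetical, and the chunk fields assume the OpenAI chat-completions streaming format reaches the client unchanged.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_chat_stream(payloads: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join streamed content deltas and capture the usage chunk, if any."""
    parts, usage = [], None
    for payload in payloads:
        if payload.strip() == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        # The usage chunk carries an empty choices list.
        for choice in chunk.get("choices") or []:
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage
```

The payloads are the strings after "data: " on each SSE line; token counts are only available once the usage chunk has arrived, so callers that bill or log usage should read the stream to the end.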

Gemini shape

POST /v1/gemini-proxy/v1beta/models/gemini-1.5-pro:generateContent
Content-Type: application/json
x-goog-api-key: to_live_...

{
  "contents": [{ "role": "user", "parts": [{ "text": "..." }] }]
}

# For streaming:
POST /v1/gemini-proxy/v1beta/models/gemini-1.5-pro:streamGenerateContent?alt=sse
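The two Gemini paths above differ only in the method suffix and the alt=sse query parameter. A small sketch that builds either path and a single-turn request body is shown below. The helper names gemini_path and gemini_body are hypothetical.

```python
from urllib.parse import quote


def gemini_path(model: str, stream: bool = False) -> str:
    """Build the proxy path for a one-shot or streaming Gemini call."""
    method = "streamGenerateContent" if stream else "generateContent"
    path = f"/v1/gemini-proxy/v1beta/models/{quote(model, safe='')}:{method}"
    # Streaming uses the alt=sse query parameter, as in the example above.
    return path + "?alt=sse" if stream else path


def gemini_body(text: str) -> dict:
    """Wrap a single user message in the contents/parts request shape."""
    return {"contents": [{"role": "user", "parts": [{"text": text}]}]}
```

Calling gemini_path("gemini-1.5-pro", stream=True) reproduces the streaming path shown above.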

Rate limits

Rate limiting uses a per-tenant token bucket, backed by Redis so that state is shared across replicas. Defaults: Anthropic 60/min, OpenAI 120/min, Gemini 60/min. A 429 response includes the Retry-After, X-RateLimit-Limit, and X-RateLimit-Remaining headers.
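On the client side, the Retry-After header can drive a simple wait before retrying. The sketch below assumes Retry-After carries a number of seconds, which the text above does not specify, and falls back to a default otherwise. The name retry_delay and the 1-second default are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        # Assumption: Retry-After is given in seconds, not as an HTTP date.
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return default
```

A caller would typically sleep for the returned value and then resend the same request, capping the number of attempts.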

Error responses

Each proxy returns errors in its native provider's shape, so existing client error handling keeps working. Common status codes: 401 (invalid key), 429 (rate limited), 502 (upstream failure), 503 (upstream unconfigured).
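Since the error body differs per provider, a client that talks to more than one proxy may prefer to branch on the status code alone. The sketch below maps the four codes listed above to their meanings. The retry policy in should_retry is an assumption for illustration, not documented proxy behaviour, and both function names are hypothetical.

```python
COMMON_ERRORS = {
    401: "key invalid",
    429: "rate limit",
    502: "upstream failure",
    503: "upstream unconfigured",
}


def describe_error(status: int) -> str:
    """Return the documented meaning of a status code, if it is listed."""
    return COMMON_ERRORS.get(status, "unlisted status")


def should_retry(status: int) -> bool:
    """Assumed policy: retry rate limits and upstream failures only.

    401 and 503 (unconfigured upstream) are treated as needing a key or
    configuration change, which a retry alone would not fix.
    """
    return status in (429, 502)
```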