
Error Handling

The gateway sits between your SDK and the upstream LLM provider. When the upstream returns an error, the gateway translates it so that the response your SDK receives is consistent with the SDK’s expectations — even when the upstream’s native error shape is different from the SDK’s. This page covers what the gateway guarantees about error responses, how cross-provider translation works, and how to inspect the gateway’s classification headers for richer error handling.

Guarantees at a glance

| Surface | What you see |
| --- | --- |
| x-to11-error-code header | Stable cross-provider classification: same value regardless of upstream or SDK. |
| x-to11-upstream-provider header | Provider slug for the upstream that returned the error. |
| Response body (cross-provider routes) | Translated into the surface SDK’s native error envelope so SDK retry logic triggers. |
| Response body (same-provider routes) | Forwarded verbatim from the upstream. |
| Response body (native passthrough surfaces) | Forwarded byte-for-byte. |
| Response body (5xx errors) | Message redacted to "provider returned status N". |
| HTTP status | Verbatim from the upstream. |

Headers always classify

Every error response carries two stable headers regardless of which provider, which SDK surface, or which routing strategy produced it:
x-to11-error-code: rate_limited
x-to11-upstream-provider: anthropic
The x-to11-error-code taxonomy (TO11-787) is the canonical way to classify gateway errors. Values are stable across all combinations of upstream provider and surface SDK:
| x-to11-error-code | Meaning |
| --- | --- |
| auth | Upstream rejected credentials. |
| forbidden | Generic 403: permission denied. |
| bad_request | Malformed request, including context-length and other 4xx conditions. |
| quota_exceeded | Account-level quota exhausted; retry will not clear. |
| rate_limited | Throttled; SDK retry with backoff will likely succeed. |
| overloaded | Server under capacity pressure; retry with backoff. |
| content_policy_violation | Upstream content / safety filter blocked the response. |
| model_not_found | Model name did not resolve at the upstream. |
| organization_not_verified | Gated-model verification flow required. |
| upstream_error | Generic 5xx upstream condition (body redacted). |
Build retry, alerting, and metrics dashboards on these headers — not on response body text. The header value is the same whether you called the gateway through the OpenAI SDK against an Anthropic upstream, the Anthropic SDK against a Bedrock upstream, or any other cross-provider route.
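One way to act on the taxonomy is a small lookup from the header value to a client-side action. The sketch below is illustrative only: Action and action_for are hypothetical names, the retryable set follows the meanings in the table above, and treating unknown or missing codes as fail-fast is an assumption of this sketch rather than gateway policy.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    FAIL_FAST = "fail_fast"


# Codes whose table entry says a retry with backoff is likely to help.
_RETRYABLE = {"rate_limited", "overloaded"}


def action_for(error_code: Optional[str]) -> Action:
    """Map an x-to11-error-code header value to a client-side action.

    Anything outside the retryable set (including a missing header)
    fails fast; that default is an assumption of this sketch.
    """
    if error_code in _RETRYABLE:
        return Action.RETRY_WITH_BACKOFF
    return Action.FAIL_FAST
```

Because the lookup keys on the header rather than on body text, the same function can serve any surface SDK and any upstream.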

Bodies on cross-provider routes match the surface SDK

When your SDK’s wire format does not match the upstream provider (for example, OpenAI SDK → Anthropic upstream), the gateway lifts the upstream’s native error envelope into its internal domain model and lowers it into the SDK’s surface envelope. The net effect: your SDK’s built-in retry policy fires correctly on cross-provider errors that previously would have surfaced as a generic gateway-shaped envelope. The classic example is Anthropic’s 529 overloaded_error. Through an OpenAI SDK client, the gateway translates it to OpenAI’s rate_limit_error + rate_limit_exceeded body shape so the official OpenAI SDK’s automatic retry loop activates:
Upstream (Anthropic):
  HTTP/1.1 529
  { "type": "error",
    "error": { "type": "overloaded_error", "message": "Overloaded" } }

Gateway response (OpenAI surface):
  HTTP/1.1 529
  Retry-After: 30
  x-to11-error-code: overloaded
  x-to11-upstream-provider: anthropic
  { "error": { "type": "rate_limit_error",
               "code": "rate_limit_exceeded",
               "message": "Overloaded" } }
This works in both directions and across all providers in the registry. The full mapping table — every (provider, status, upstream error.type, error.code) → DomainError pairing and every DomainError → surface envelope pairing — lives at apps/gateway/crates/gateway-providers/docs/error-translation.md in the gateway-providers crate.
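The lift-then-lower flow can be sketched in a few lines of Python. This is an illustration, not the gateway's actual implementation: DomainError, lift_anthropic, and lower_openai are hypothetical stand-ins that cover only the 529 overloaded_error mapping shown above. The 5xx message redaction described elsewhere on this page is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainError:
    """Hypothetical stand-in for the gateway's internal error model."""
    kind: str
    message: str
    retry_after_s: Optional[int] = None


def lift_anthropic(body: object, retry_after: Optional[str] = None) -> Optional[DomainError]:
    """Lift an Anthropic error envelope; returns None when it is not recognised."""
    if not isinstance(body, dict) or body.get("type") != "error":
        return None
    err = body.get("error")
    if not isinstance(err, dict) or err.get("type") != "overloaded_error":
        return None  # every other upstream error kind is out of scope here
    seconds = int(retry_after) if retry_after and retry_after.isdigit() else None
    return DomainError("overloaded", str(err.get("message", "")), seconds)


def lower_openai(err: DomainError) -> dict:
    """Lower a DomainError into an OpenAI-shaped error body."""
    if err.kind != "overloaded":
        raise NotImplementedError(f"no mapping in this sketch for {err.kind!r}")
    return {"error": {"type": "rate_limit_error",
                      "code": "rate_limit_exceeded",
                      "message": err.message}}


upstream_body = {"type": "error",
                 "error": {"type": "overloaded_error", "message": "Overloaded"}}
domain = lift_anthropic(upstream_body, retry_after="30")
assert domain is not None
print(lower_openai(domain))
```

Keeping the domain model in the middle means each provider needs one lift and each surface needs one lower, instead of one translation per provider pair.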

Retry-After is threaded through

When the upstream response carries a Retry-After header (integer-seconds form), the gateway re-emits it on the translated response. SDKs that honour Retry-After (the official OpenAI and Anthropic SDKs both do) back off accordingly. When the upstream omits the header, the gateway does not synthesise a default — Retry-After is absent on the gateway response too, and the SDK falls back to its own backoff policy.
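For clients that do their own retries instead of relying on an SDK, the same rule can be sketched as follows: use Retry-After when it is present in integer-seconds form, otherwise fall back to a local policy. The helper name backoff_delay, its parameters, and the choice of exponential backoff with full jitter are assumptions of this sketch, not something the gateway prescribes.

```python
import random
from typing import Optional


def backoff_delay(retry_after: Optional[str], attempt: int,
                  base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The gateway re-emitted the upstream's integer-seconds value.
        return float(int(retry_after.strip()))
    # Header absent (or not integer seconds): exponential backoff with full jitter.
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))
```

A caller would pass resp.headers.get("Retry-After") and the current attempt number, then time.sleep() for the returned value before retrying.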

Same-provider routes remain verbatim

When the surface SDK and the upstream provider match (OpenAI SDK → OpenAI upstream, Anthropic SDK → Anthropic upstream), the gateway short-circuits before translation (TO11-802) and forwards the upstream error body verbatim. This preserves any provider-specific signal — header fields, param indicators, request IDs — that the SDK might inspect beyond the standard envelope shape. The TO11-787 headers (x-to11-error-code, x-to11-upstream-provider) are still attached. The HTTP status is still verbatim.

Native passthrough remains byte-for-byte

Surfaces explicitly opted into native passthrough (TO11-798) bypass normalisation entirely on the success path; the error path behaves the same way — the response is forwarded byte-for-byte from the upstream. Use this when you need bit-identical compatibility with a provider’s native API and accept that cross-provider routes are off the table.

5xx bodies are redacted

For any error with HTTP status >= 500, the response body’s message is rewritten to a conservative provider returned status N form (TO11-380). The upstream’s full message text never reaches the client, even if it would have carried useful debugging signal — that signal lives in your gateway traces and Datadog/Honeycomb spans, not in the response body. On-the-wire status (502, 503, 529, etc.) and the x-to11-error-code header still reach the client unchanged, so your SDK can classify and retry without depending on body text.
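The redaction rule itself is simple enough to state as code. The function below is a hypothetical sketch of the behaviour described above, with the replacement string taken from this page; it is not the gateway's source.

```python
def redact_message(status: int, upstream_message: str) -> str:
    """Return the message a client would see for a given upstream status.

    Sketch only: 5xx messages are replaced wholesale, everything else
    passes through unchanged.
    """
    if status >= 500:
        return f"provider returned status {status}"
    return upstream_message
```

The practical consequence for client code is that message text on a 5xx carries no more information than the status line, so branching should key on the status and the x-to11-error-code header.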

Examples

OpenAI SDK retrying through an Anthropic upstream overload

The official OpenAI Python SDK retries automatically on rate_limit_error bodies with Retry-After headers. With the gateway in front of an Anthropic upstream, an Anthropic 529 overloaded_error is translated to OpenAI’s rate_limit_exceeded body shape, so the retry loop just works:
from openai import OpenAI

# Gateway is configured to route this model to an Anthropic upstream.
# The SDK reads the API key from OPENAI_API_KEY unless api_key= is passed explicitly.
client = OpenAI(base_url="http://localhost:4000/v1")

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
    # The SDK will automatically retry on rate_limit_exceeded responses.
)
Behind the scenes the gateway is doing:
Anthropic 529 overloaded_error
  → DomainError::Overloaded { retry_after: Some(30s) }
  → OpenAI-shape { error: { type: "rate_limit_error",
                            code: "rate_limit_exceeded",
                            message: "Overloaded" } }
  + Retry-After: 30
  + x-to11-error-code: overloaded
  + x-to11-upstream-provider: anthropic
The OpenAI SDK sees rate_limit_exceeded + Retry-After, sleeps the requested interval, and retries — even though the upstream was Anthropic.

Inspecting x-to11-error-code for richer classification

When you need to differentiate quota-exhaustion (retry will not clear) from generic rate-limiting (retry with backoff clears) or from server-side overload, the response body’s error.code is intentionally lossy — it always maps to rate_limit_exceeded on the OpenAI surface so SDK retry behaviour works uniformly. The x-to11-error-code header is where the precise classification lives:
import httpx

resp = httpx.post(
    "http://localhost:4000/v1/chat/completions",
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    headers={"Authorization": "Bearer ..."},
)

if resp.status_code >= 400:
    code = resp.headers.get("x-to11-error-code")
    upstream = resp.headers.get("x-to11-upstream-provider")
    if code == "quota_exceeded":
        # Don't retry — account-level exhaustion.
        raise RuntimeError(f"Quota exhausted on {upstream}")
    elif code in ("rate_limited", "overloaded"):
        # Backoff and retry per Retry-After.
        ...
    elif code == "auth":
        # Credentials are wrong — fail fast.
        ...

Falling back on unrecognised envelopes

Each upstream adapter lifts its native error envelope into the gateway’s domain model via into_domain_error. When the envelope shape is unrecognised (an empty body, malformed JSON, or a new upstream error kind the adapter has not yet been taught), the gateway falls back to its legacy envelope parser. The x-to11-error-code header still classifies correctly; the response body is the gateway’s generic shape rather than a translated one. A few currently-known gaps live on the fallback path:
  • AWS Bedrock native envelopes ({"__type": "ThrottlingException", ...}) are not yet recognised by a native lift impl.
  • Google Vertex / Gemini native envelopes ({"error":{"code":429,"status":"RESOURCE_EXHAUSTED"}}) are not yet recognised.
These are tracked as TO11-803 follow-ups.

Mid-stream SSE error frames

When an upstream returns 200 OK for a streaming request and then emits an error envelope inside a data: frame partway through the stream, the gateway routes that frame through the same four-branch decision tree the status-time error pipeline uses:
| Branch | When it fires | What the client sees |
| --- | --- | --- |
| Native passthrough (TO11-798) | Anthropic SDK caller (x-api-key + anthropic-version) hitting an Anthropic upstream | The upstream event: error\ndata: {...} frame is forwarded byte-for-byte. |
| Same-provider verbatim (TO11-802) | Surface SDK and upstream are the same wire family (OpenAI → OpenAI) | The upstream data: {"error":...} frame is forwarded byte-for-byte. |
| Cross-provider domain translation (TO11-803) | Surface SDK and upstream differ (OpenAI SDK → Anthropic upstream, or vice-versa) | The upstream envelope is lifted into the gateway’s DomainError model, then lowered into the surface SDK’s native error envelope and re-wrapped as an SSE error frame. |
| Legacy fallback | The upstream adapter does not recognise the envelope shape | A minimal surface-shape error frame is emitted carrying the upstream message text. |
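Read top to bottom, the table behaves like a chain of guards. The sketch below encodes that reading; pick_branch and its parameters are hypothetical, the evaluation order is assumed from the row order, and comparing surface and upstream by string equality is a simplification of "same wire family".

```python
def pick_branch(surface: str, upstream: str,
                native_passthrough: bool, envelope_recognised: bool) -> str:
    """Choose which of the four branches handles an error frame (sketch)."""
    if native_passthrough:
        return "native_passthrough"
    if surface == upstream:
        return "same_provider_verbatim"
    if envelope_recognised:
        return "cross_provider_translation"
    # The adapter could not lift the envelope into the domain model.
    return "legacy_fallback"
```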
In all four branches the gateway emits a structured egress.error_translated_mid_stream tracing event with to11_code, error_translation, upstream_provider, surface, and endpoint attributes — the same attribute set as the status-time egress.error_translated event, so dashboards can pivot on a single taxonomy across status-time and mid-stream error rates. SSE framing is preserved per the surface SDK’s expected shape:
  • OpenAI surface uses plain data: {"error":{...}}\n\n (the OpenAI Python SDK parses the top-level error key into the appropriate exception class).
  • Anthropic surface uses event: error\ndata: {"type":"error","error":{...}}\n\n (the Anthropic SDK dispatches by SSE event name).
The HTTP status line stays at 200 OK for the mid-stream-error path; the error lives in the response body, not on the status line. The full reference mapping (every provider × status × (error.type, error.code) → DomainError mapping and every render direction) is published at apps/gateway/crates/gateway-providers/docs/error-translation.md.
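Since the status line stays at 200 OK, a client that reads the stream by hand has to spot error frames itself. The sketch below shows the two frame shapes described above and a parser that recognises both; render_error_frame and parse_error_frame are hypothetical helpers written for illustration and are not part of the gateway or of either SDK.

```python
import json
from typing import Optional


def render_error_frame(surface: str, envelope: dict) -> str:
    """Wrap an error envelope in the SSE framing each surface expects."""
    payload = json.dumps(envelope, separators=(",", ":"))
    if surface == "openai":
        return f"data: {payload}\n\n"
    if surface == "anthropic":
        return f"event: error\ndata: {payload}\n\n"
    raise ValueError(f"unknown surface: {surface}")


def parse_error_frame(frame: str) -> Optional[dict]:
    """Return the error object if this SSE frame is an error frame, else None."""
    event = None
    data_lines = []
    for line in frame.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if not data_lines:
        return None
    try:
        payload = json.loads("\n".join(data_lines))
    except json.JSONDecodeError:
        return None  # non-JSON sentinel frames such as "[DONE]"
    if not isinstance(payload, dict):
        return None
    if event == "error" or "error" in payload:
        return payload.get("error")
    return None
```

The official SDKs already perform this detection, so a parser like this is only relevant when consuming the stream with a plain HTTP client.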

Performance budget

The error-translation funnel runs on every error response — across all 28 providers, every endpoint, both same-provider and cross-provider routes. The gateway pins two budgets on this hot path so the funnel never becomes a meaningful contributor to client-perceived latency or to memory pressure on the gateway process:
| Budget | Limit | Why |
| --- | --- | --- |
| p99 funnel overhead vs raw forward | 50µs per translation | Keeps the funnel below the noise floor of the upstream round-trip (typically tens of milliseconds) so error responses are not noticeably slower than success responses. |
| Heap allocations per translation (regression guard) | 50 allocations | Today’s measured worst case (cross-provider 5xx Overloaded) sits at ~34 allocations; the guard catches a regression that adds ~15+ allocations to a single translation. |
| Heap allocations per translation (aspirational target) | 3 allocations | Reaching this requires eliminating the serde_json round-trip in from_domain_error. Tracked as a follow-up in the parity initiative’s backlog. |
These budgets are enforced by two test/bench surfaces:
  • The latency budget runs in a nightly criterion bench (apps/gateway/crates/gateway/benches/error_translation_overhead.rs) that compares the funnel path against a raw axum forward across 12 representative cells (same-provider 4xx + 5xx, cross-provider 4xx + 5xx, OpenAI-compat 4xx + 5xx). The bench runs as a dedicated CI job (.github/workflows/gateway-perf-budget.yml) on a fixed-tier runner so measurements aren’t skewed by noisy-neighbour scheduling; default PR CI is unaffected.
  • The allocation regression guard runs on every PR as a dedicated integration test (apps/gateway/crates/gateway/tests/error_translation_allocation_budget.rs) that installs #[global_allocator] dhat::Alloc and samples HeapStats::total_blocks across a hot loop. dhat’s per-allocation backtrace bookkeeping would invalidate latency measurements, so the two contracts live in separate test binaries.
A budget breach fails the corresponding run and surfaces the offending cell via notifications, so the regression isolates to a specific provider adapter rather than landing as an opaque cross-cutting slowdown.
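The pass/fail rule behind the latency budget amounts to a percentile comparison. The Python below is a rough sketch of that idea, not the criterion bench itself: the helper names, the nearest-rank p99, and the choice to compare the two p99 values (rather than take a p99 of per-request differences) are all assumptions.

```python
from typing import Sequence


def p99(samples_us: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of latency samples in microseconds."""
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def within_budget(funnel_us: Sequence[float], raw_us: Sequence[float],
                  budget_us: float = 50.0) -> bool:
    """True when the funnel's p99 exceeds the raw forward's p99 by at most the budget."""
    return p99(funnel_us) - p99(raw_us) <= budget_us
```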
  • API Reference — endpoints, status codes, request/response shapes.
  • Telemetry — how error classification shows up in spans and metrics.
  • Streaming — error-handling on the streaming path.