Error Handling
The gateway sits between your SDK and the upstream LLM provider. When the upstream returns an error, the gateway translates it so that the response your SDK receives matches what the SDK expects, even when the upstream’s native error shape differs from the SDK’s. This page covers what the gateway guarantees about error responses, how cross-provider translation works, and how to inspect the gateway’s classification headers for richer error handling.
Guarantees at a glance
| Surface | What you see |
|---|---|
| `x-to11-error-code` header | Stable cross-provider classification; the same value regardless of upstream or SDK. |
| `x-to11-upstream-provider` header | Slug of the upstream provider that returned the error. |
| Response body (cross-provider routes) | Translated into the surface SDK’s native error envelope so the SDK’s retry logic triggers. |
| Response body (same-provider routes) | Forwarded verbatim from the upstream. |
| Response body (native passthrough surfaces) | Forwarded byte-for-byte. |
| Response body (5xx errors) | Message redacted to `provider returned status N`. |
| HTTP status | Verbatim from the upstream. |
Headers always classify
Every error response carries two stable headers, `x-to11-error-code` and `x-to11-upstream-provider`, regardless of which provider, which SDK surface, or which routing strategy produced it. The `x-to11-error-code` taxonomy (TO11-787) is the canonical way to classify gateway errors. Its values are stable across all combinations of upstream provider and surface SDK:
| `x-to11-error-code` | Meaning |
|---|---|
| `auth` | Upstream rejected credentials. |
| `forbidden` | Generic 403; permission denied. |
| `bad_request` | Malformed request, including context-length and other 4xx conditions. |
| `quota_exceeded` | Account-level quota exhausted; retrying will not clear it. |
| `rate_limited` | Throttled; an SDK retry with backoff will likely succeed. |
| `overloaded` | Server under capacity pressure; retry with backoff. |
| `content_policy_violation` | Upstream content or safety filter blocked the response. |
| `model_not_found` | Model name did not resolve at the upstream. |
| `organization_not_verified` | Gated-model verification flow required. |
| `upstream_error` | Generic 5xx upstream condition (body redacted). |
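A client can encode this taxonomy once and drive its retry decisions from the header rather than the body. The sketch below is a minimal Python illustration: `To11ErrorCode`, `RETRYABLE`, `parse_error_code` and `is_retryable` are hypothetical names, and treating `upstream_error` as retryable is an assumption (the table only calls it a generic 5xx condition).

```python
from enum import Enum
from typing import Optional


class To11ErrorCode(str, Enum):
    """Values of the x-to11-error-code header, as listed in the table above."""
    AUTH = "auth"
    FORBIDDEN = "forbidden"
    BAD_REQUEST = "bad_request"
    QUOTA_EXCEEDED = "quota_exceeded"
    RATE_LIMITED = "rate_limited"
    OVERLOADED = "overloaded"
    CONTENT_POLICY_VIOLATION = "content_policy_violation"
    MODEL_NOT_FOUND = "model_not_found"
    ORGANIZATION_NOT_VERIFIED = "organization_not_verified"
    UPSTREAM_ERROR = "upstream_error"


# Codes where a retry with backoff may plausibly succeed.
# Assumption: upstream_error (generic 5xx) is treated as transient.
RETRYABLE = {
    To11ErrorCode.RATE_LIMITED,
    To11ErrorCode.OVERLOADED,
    To11ErrorCode.UPSTREAM_ERROR,
}


def parse_error_code(raw: Optional[str]) -> Optional[To11ErrorCode]:
    """Map a raw header value to the enum; unknown or missing values yield None."""
    if raw is None:
        return None
    try:
        return To11ErrorCode(raw.strip())
    except ValueError:
        return None


def is_retryable(raw: Optional[str]) -> bool:
    """True only for known codes in RETRYABLE; None is never in the set."""
    return parse_error_code(raw) in RETRYABLE
```

Returning `None` for unrecognised values keeps the client conservative (no retry) if it ever sees a code it was not written for.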
Bodies on cross-provider routes match the surface SDK
When your SDK’s wire format does not match the upstream provider (for example, OpenAI SDK → Anthropic upstream), the gateway lifts the upstream’s native error envelope into its internal domain model (`DomainError`) and lowers it into the SDK’s surface envelope. The net effect: your SDK’s built-in retry policy fires correctly on cross-provider errors that previously would have surfaced as a generic gateway-shaped envelope. The classic example is Anthropic’s `529 overloaded_error`. Through an OpenAI SDK client, the gateway translates it into OpenAI’s rate-limit body shape (`type: rate_limit_error`, `code: rate_limit_exceeded`), so the official OpenAI SDK’s automatic retry loop activates.
The full mapping, every `(provider, status, upstream error.type, error.code) → DomainError` pairing and every `DomainError → surface envelope` pairing, lives at `apps/gateway/crates/gateway-providers/docs/error-translation.md` in the `gateway-providers` crate.
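To make the lift/lower idea concrete, here is a minimal Python sketch of the Anthropic `529 overloaded_error` → OpenAI `rate_limit_error` path. It is not the gateway’s Rust implementation: the `DomainError` dataclass, its `kind` values, and both functions are hypothetical, and only the rate-limit family is modelled. Because 529 is a 5xx, the sketch also applies the `provider returned status N` redaction, while the status itself is forwarded verbatim.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DomainError:
    # Hypothetical provider-neutral error; field names are illustrative only.
    kind: str      # e.g. "overloaded"
    status: int    # upstream HTTP status, forwarded verbatim
    message: str


def lift_anthropic(status: int, body: dict) -> Optional[DomainError]:
    """Lift an Anthropic-shaped error envelope into the domain model."""
    err = body.get("error") if body.get("type") == "error" else None
    if not isinstance(err, dict):
        return None  # unrecognised shape: the legacy fallback would apply
    if err.get("type") == "overloaded_error":
        return DomainError("overloaded", status, err.get("message", ""))
    return None  # other Anthropic error types are not modelled here


def lower_to_openai(e: DomainError) -> Tuple[int, dict]:
    """Lower a DomainError into an OpenAI-shaped (status, body) pair."""
    if e.kind not in ("overloaded", "rate_limited", "quota_exceeded"):
        raise ValueError(f"mapping for {e.kind!r} is not modelled in this sketch")
    message = e.message
    if e.status >= 500:
        message = f"provider returned status {e.status}"  # 5xx redaction
    body = {"error": {"message": message, "type": "rate_limit_error",
                      "param": None, "code": "rate_limit_exceeded"}}
    return e.status, body
```

Folding `overloaded`, `rate_limited` and `quota_exceeded` into the same OpenAI body is what makes the body lossy; the precise class survives only in the `x-to11-error-code` header.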
Retry-After is threaded through
When the upstream response carries a `Retry-After` header (integer-seconds form), the gateway re-emits it on the translated response. SDKs that honour `Retry-After` (the official OpenAI and Anthropic SDKs both do) back off accordingly. When the upstream omits the header, the gateway does not synthesise a default: `Retry-After` is absent on the gateway response too, and the SDK falls back to its own backoff policy.
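The rule is small enough to state as a function. In this hypothetical Python sketch, `thread_retry_after` copies an integer-seconds `Retry-After` from the upstream headers and otherwise adds nothing. Dropping non-integer forms (such as an HTTP-date) is an assumption, since the text above only commits to the integer-seconds form.

```python
from typing import Mapping


def thread_retry_after(upstream_headers: Mapping[str, str]) -> dict:
    """Return the Retry-After header (if any) to re-emit on the translated response.

    HTTP header names are case-insensitive, so the lookup ignores case.
    No default is synthesised when the upstream omits the header.
    """
    for name, value in upstream_headers.items():
        if name.lower() == "retry-after":
            value = value.strip()
            if value.isascii() and value.isdigit():  # integer-seconds form only
                return {"Retry-After": value}
            return {}  # assumption: other forms are not re-emitted
    return {}
```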
Same-provider routes remain verbatim
When the surface SDK and the upstream provider match (OpenAI SDK → OpenAI upstream, Anthropic SDK → Anthropic upstream), the gateway short-circuits before translation (TO11-802) and forwards the upstream error body verbatim. This preserves any provider-specific signal (header fields, `param` indicators, request IDs) that the SDK might inspect beyond the standard envelope shape.
The TO11-787 headers (`x-to11-error-code`, `x-to11-upstream-provider`) are still attached, and the HTTP status is still verbatim.
Native passthrough remains byte-for-byte
Surfaces explicitly opted into native passthrough (TO11-798) bypass normalisation entirely on the success path, and the error path behaves the same way: the response is forwarded byte-for-byte from the upstream. Use this when you need bit-identical compatibility with a provider’s native API and accept that cross-provider routes are off the table.
5xx bodies are redacted
For any error with HTTP status `>= 500`, the message in the response body is rewritten to a conservative `provider returned status N` form (TO11-380). The upstream’s full message text never reaches the client, even when it would have carried useful debugging signal; that signal lives in your gateway traces and Datadog/Honeycomb spans, not in the response body.
The on-the-wire status (502, 503, 529, etc.) and the `x-to11-error-code` header still reach the client unchanged, so your SDK can classify and retry without depending on body text.
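A minimal Python sketch of the redaction rule, assuming an OpenAI-style `{"error": {"message": ...}}` body. `redact_5xx` is a hypothetical helper; it copies the body so the original upstream message stays available for tracing, and leaves other fields and the status untouched.

```python
import copy


def redact_5xx(status: int, body: dict) -> dict:
    """Return a copy of body whose error message is redacted when status >= 500."""
    if status < 500:
        return body  # 4xx bodies are not redacted by this rule
    redacted = copy.deepcopy(body)
    error = redacted.get("error")
    if isinstance(error, dict):
        error["message"] = f"provider returned status {status}"
    return redacted
```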
Examples
OpenAI SDK retrying through an Anthropic upstream overload
The official OpenAI Python SDK retries automatically on `rate_limit_error` bodies with `Retry-After` headers. With the gateway in front of an Anthropic upstream, an Anthropic `529 overloaded_error` is translated to OpenAI’s `rate_limit_exceeded` body shape, so the retry loop works without any client changes.
The SDK sees `rate_limit_exceeded` plus `Retry-After`, sleeps for the requested interval, and retries, even though the upstream was Anthropic.
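The client-side loop can be approximated in a few lines. This Python sketch is a simplified stand-in, not the OpenAI SDK’s actual retry code: `send` is a hypothetical callable returning `(status, headers, body)`, `sleep` is injected so the example runs instantly, and the fallback of exponential backoff starting at 0.5 s is an assumed policy.

```python
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]


def call_with_retries(send: Callable[[], Response],
                      sleep: Callable[[float], None],
                      max_retries: int = 2) -> Response:
    """Retry on rate_limit_error bodies, honouring an integer Retry-After."""
    attempt = 0
    while True:
        status, headers, body = send()
        err = body.get("error") or {}
        if err.get("type") != "rate_limit_error" or attempt >= max_retries:
            return status, headers, body
        retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)      # server-requested interval
        else:
            delay = 0.5 * (2 ** attempt)    # assumed fallback backoff
        sleep(delay)
        attempt += 1
```

Because the gateway has already rewritten the Anthropic overload into this body shape, the loop never needs to know which upstream produced it.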
Inspecting x-to11-error-code for richer classification
Sometimes you need to tell quota exhaustion (a retry will not clear it) apart from generic rate limiting (a retry with backoff clears it) or from server-side overload. The response body’s `error.code` is intentionally lossy here: on the OpenAI surface all three map to `rate_limit_exceeded`, so SDK retry behaviour works uniformly. The precise classification lives in the `x-to11-error-code` header (`quota_exceeded`, `rate_limited`, or `overloaded`).
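A minimal Python sketch of that split: the body’s `error.code` is ignored and the decision keys off the header instead. `decide` and its return labels are hypothetical, and failing over on `overloaded` is a client-side choice rather than gateway behaviour. In the OpenAI Python SDK (v1), errors raised for non-2xx responses expose the underlying `httpx` response as `err.response`, so `err.response.headers` could be passed straight in.

```python
from typing import Mapping


def decide(headers: Mapping[str, str]) -> str:
    """Choose a client action from x-to11-error-code, ignoring the lossy body."""
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    code = lowered.get("x-to11-error-code")
    if code == "quota_exceeded":
        return "stop"               # retrying will not clear an exhausted quota
    if code == "rate_limited":
        return "retry"              # throttled: backoff is likely to succeed
    if code == "overloaded":
        return "retry_or_failover"  # capacity pressure: back off, or route elsewhere
    return "surface"                # anything else: hand the error to the caller
```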
Falling back on unrecognised envelopes
Each upstream adapter lifts its native error envelope into the gateway’s domain model via `into_domain_error`. When the envelope shape is unrecognised (an empty body, malformed JSON, or a new upstream error kind the adapter has not yet been taught), the gateway falls back to its legacy envelope parser. The `x-to11-error-code` header still classifies correctly; the response body is the gateway’s generic shape rather than a translated one.
A few currently known gaps live on the fallback path:
- AWS Bedrock native envelopes (`{"__type": "ThrottlingException", ...}`) are not yet recognised by a native lift impl.
- Google Vertex / Gemini native envelopes (`{"error":{"code":429,"status":"RESOURCE_EXHAUSTED"}}`) are not yet recognised.
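The routes described so far (native passthrough, same-provider verbatim, cross-provider translation, legacy fallback) form a single decision tree, which the mid-stream path below reuses. A hypothetical Python sketch of the branch order, assuming wire families are compared by simple string equality and that "recognised" means the adapter’s lift succeeded:

```python
from enum import Enum


class Branch(Enum):
    NATIVE_PASSTHROUGH = "native_passthrough"          # TO11-798
    SAME_PROVIDER_VERBATIM = "same_provider_verbatim"  # TO11-802
    CROSS_PROVIDER_TRANSLATION = "cross_provider"      # TO11-803: lift + lower
    LEGACY_FALLBACK = "legacy_fallback"


def choose_branch(surface: str, upstream: str,
                  passthrough_opt_in: bool, envelope_recognised: bool) -> Branch:
    """Pick how an upstream error body is rendered to the client."""
    if passthrough_opt_in:
        return Branch.NATIVE_PASSTHROUGH      # byte-for-byte, no parsing
    if surface == upstream:
        return Branch.SAME_PROVIDER_VERBATIM  # short-circuit before translation
    if envelope_recognised:
        return Branch.CROSS_PROVIDER_TRANSLATION
    return Branch.LEGACY_FALLBACK             # generic gateway-shaped body
```

Checking the cheap, parse-free branches first mirrors the text: translation only runs when the surface and upstream actually differ.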
Mid-stream SSE error frames
When an upstream returns `200 OK` for a streaming request and then emits an error envelope inside a `data:` frame partway through the stream, the gateway routes that frame through the same four-branch decision tree that the status-time error pipeline uses:
| Branch | When it fires | What the client sees |
|---|---|---|
| Native passthrough (TO11-798) | Anthropic SDK caller (x-api-key + anthropic-version) hitting an Anthropic upstream | The upstream event: error\ndata: {...} frame is forwarded byte-for-byte. |
| Same-provider verbatim (TO11-802) | Surface SDK and upstream are the same wire family (OpenAI → OpenAI) | The upstream data: {"error":...} frame is forwarded byte-for-byte. |
| Cross-provider domain translation (TO11-803) | Surface SDK and upstream differ (OpenAI SDK → Anthropic upstream, or vice-versa) | The upstream envelope is lifted into the gateway’s DomainError model, then lowered into the surface SDK’s native error envelope and re-wrapped as an SSE error frame. |
| Legacy fallback | The upstream adapter does not recognise the envelope shape | A minimal surface-shape error frame is emitted carrying the upstream message text. |
Mid-stream errors emit an `egress.error_translated_mid_stream` tracing event with `to11_code`, `error_translation`, `upstream_provider`, `surface`, and `endpoint` attributes. This is the same attribute set as the status-time `egress.error_translated` event, so dashboards can pivot on a single taxonomy across status-time and mid-stream error rates.
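Because both events carry the same `to11_code` attribute, one query can merge them. A minimal Python sketch over event records represented as plain dicts; the record shape (`name` plus an `attrs` mapping) is an assumption, and in practice these records come from your tracing backend.

```python
from collections import Counter
from typing import Iterable, Mapping

EVENT_NAMES = {"egress.error_translated", "egress.error_translated_mid_stream"}


def errors_by_code(events: Iterable[Mapping]) -> Counter:
    """Count status-time and mid-stream translated errors by to11_code."""
    counts = Counter()
    for event in events:
        if event.get("name") in EVENT_NAMES:
            counts[event["attrs"]["to11_code"]] += 1
    return counts
```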
SSE framing is preserved per the surface SDK’s expected shape:
- OpenAI surface: plain `data: {"error":{...}}\n\n` (the OpenAI Python SDK parses the top-level `error` key into the appropriate exception class).
- Anthropic surface: `event: error\ndata: {"type":"error","error":{...}}\n\n` (the Anthropic SDK dispatches by SSE event name).
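A hypothetical Python helper that renders the two frame shapes listed above from an already-lowered `error` object. The function name and the `surface` strings are illustrative, and the compact JSON separators are a formatting assumption rather than a requirement of either SDK.

```python
import json


def render_sse_error(surface: str, error: dict) -> str:
    """Render a mid-stream error as an SSE frame in the surface SDK's shape."""
    if surface == "openai":
        payload = {"error": error}
        return "data: " + json.dumps(payload, separators=(",", ":")) + "\n\n"
    if surface == "anthropic":
        payload = {"type": "error", "error": error}
        return ("event: error\n"
                "data: " + json.dumps(payload, separators=(",", ":")) + "\n\n")
    raise ValueError(f"unknown surface: {surface!r}")
```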
The HTTP status stays `200 OK` on the mid-stream-error path: the error lives in the response body, not on the status line. The full reference mapping, covering every provider × status × (`error.type`, `error.code`) → `DomainError` mapping and every render direction, is published at `apps/gateway/crates/gateway-providers/docs/error-translation.md`.
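Since the status line says `200 OK`, a client consuming the raw stream has to look inside each frame. A minimal Python sketch that splits SSE text into events and returns the first error object for either surface shape; `find_stream_error` is hypothetical, and it handles only single-line `data:` fields, which is enough for the frame shapes this page describes.

```python
import json
from typing import Optional


def find_stream_error(sse_text: str) -> Optional[dict]:
    """Return the error object from the first error frame, or None."""
    for block in sse_text.split("\n\n"):
        event_name, data = None, None
        for line in block.split("\n"):
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if data is None or data == "[DONE]":
            continue
        payload = json.loads(data)
        # Anthropic surface: named "error" event; OpenAI surface: top-level "error" key.
        if event_name == "error" or "error" in payload:
            return payload.get("error", payload)
    return None
```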
Performance budget
The error-translation funnel runs on every error response, across all 28 providers, every endpoint, and both same-provider and cross-provider routes. The gateway pins two budgets on this hot path so the funnel never becomes a meaningful contributor to client-perceived latency or to memory pressure on the gateway process:
| Budget | Limit | Why |
|---|---|---|
| p99 funnel overhead vs raw forward | ≤ 50µs per translation | Keeps the funnel below the noise floor of the upstream round-trip (typically tens of milliseconds) so error responses are not noticeably slower than success responses. |
| Heap allocations per translation (regression guard) | ≤ 50 allocations | Today’s measured worst case (cross-provider 5xx `Overloaded`) sits at ~34 allocations; the guard catches a regression that adds ~15+ allocations to a single translation. |
| Heap allocations per translation (aspirational target) | ≤ 3 allocations | Reaching this requires eliminating the `serde_json` round-trip in `from_domain_error`. Tracked as a follow-up in the parity initiative’s backlog. |
- The latency budget runs in a nightly criterion bench (`apps/gateway/crates/gateway/benches/error_translation_overhead.rs`) that compares the funnel path against a raw axum forward across 12 representative cells (same-provider 4xx + 5xx, cross-provider 4xx + 5xx, OpenAI-compat 4xx + 5xx). The bench runs as a dedicated CI job (`.github/workflows/gateway-perf-budget.yml`) on a fixed-tier runner so measurements aren’t skewed by noisy-neighbour scheduling; default PR CI is unaffected.
- The allocation regression guard runs on every PR as a dedicated integration test (`apps/gateway/crates/gateway/tests/error_translation_allocation_budget.rs`) that installs `#[global_allocator] dhat::Alloc` and samples `HeapStats::total_blocks` across a hot loop. dhat’s per-allocation backtrace bookkeeping would invalidate latency measurements, so the two contracts live in separate test binaries.
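The latency budget compares two p99s, and the allocation guard is a simple ceiling. As an illustration of the arithmetic only (the real checks are a criterion bench and a dhat test in Rust, and how the bench derives its p99 is not described here), a Python sketch using the nearest-rank percentile over per-request timings in microseconds; `p99` and `within_budget` are hypothetical helpers.

```python
from typing import Sequence

P99_OVERHEAD_BUDGET_US = 50.0   # ≤ 50µs per translation
ALLOCATION_GUARD = 50           # regression guard, allocations per translation


def p99(samples_us: Sequence[float]) -> float:
    """Nearest-rank 99th percentile (samples must be non-empty)."""
    ordered = sorted(samples_us)
    rank = (99 * len(ordered) + 99) // 100   # integer ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def within_budget(funnel_us: Sequence[float], raw_us: Sequence[float],
                  allocations: int) -> bool:
    """Both contracts must hold: p99 overhead and allocation count."""
    overhead_us = p99(funnel_us) - p99(raw_us)
    return overhead_us <= P99_OVERHEAD_BUDGET_US and allocations <= ALLOCATION_GUARD
```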
Related
- API Reference — endpoints, status codes, request/response shapes.
- Telemetry — how error classification shows up in spans and metrics.
- Streaming — error-handling on the streaming path.