To get automatic failover, create a routing rule with the Fallback chain strategy. to11 tries the targets in the order you declare them and moves to the next one when a target fails with a connection error or a 5xx response.

Before you start

  • Two or more targets for the same model — different providers, or different stored credentials for the same provider.
  • Each provider connected under Project → AI Gateway → Providers, with the model in its model list. See Connecting a provider.

Create a Fallback chain rule

  1. Open Project → AI Gateway → Routing.
  2. Start a new rule with the create-rule flow.
  3. Choose the Fallback chain strategy.
  4. Set the model the rule applies to (for example gpt-4o).
  5. Add the targets in priority order — the first is the primary, the rest are tried in turn. Each target names a provider and a stored credential.
  6. Save the rule.

How fallback works

to11 tries the targets in declared order. A target counts as failed when it returns a connection error or an HTTP 5xx response and its retries are exhausted; to11 then moves straight to the next target in the chain.
Request arrives for gpt-4o
  |
  v
Try primary target
  |
  +-- Success --> return response
  |
  +-- Failure (5xx / connection error)
        |
        v
      Try next target
        |
        +-- Success --> return response
        +-- Failure --> continue down the chain; if every target fails, return an error to the caller
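
The flow above can be sketched as a small shell loop. This is an illustration of the ordering logic only, not to11's implementation: `try_target` is a hypothetical stand-in that prints an HTTP status (with `000` modelling a connection error), and the target names are invented. How to11 treats statuses other than 5xx is not stated here, so the sketch simply returns them to the caller.

```shell
#!/bin/sh
# Hypothetical stand-in for sending the request to one target.
# Prints an HTTP status; "000" models a connection error.
try_target() {
  case "$1" in
    openai-primary)  echo 503 ;;
    azure-secondary) echo 000 ;;
    *)               echo 200 ;;
  esac
}

# Walk the targets in declared order and stop at the first one
# that does not fail with a connection error or a 5xx status.
run_chain() {
  for target in "$@"; do
    status=$(try_target "$target")
    case "$status" in
      000|5??) echo "attempt failed: $target ($status)" >&2 ;;
      *)       echo "served by $target ($status)"; return 0 ;;
    esac
  done
  echo "all targets failed" >&2
  return 1
}

# Declared priority order: primary first, then the fallbacks.
run_chain openai-primary azure-secondary openai-backup
```

With the invented statuses above, the first two targets are logged as failed on stderr and the third serves the request. If every target fails, `run_chain` returns a non-zero status, which corresponds to the error returned to the caller.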

Retries within a target

Each target attempt is retried with exponential backoff before to11 moves on to the next target. The default policy is up to 2 retries with a 500 ms base backoff, that is, at most 3 attempts per target. A routing rule can override this default when a model needs different behavior.
Retries handle a transient blip on a single target; the fallback chain handles a target that stays down. Together they give you per-target resilience plus failover across targets.
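
To make the default policy concrete, here is a minimal sketch of the retry delays. It assumes the delay doubles on each retry; the text only says the backoff is exponential, so the factor of 2 (and the absence of jitter) is an assumption, and `backoff_ms` is a hypothetical helper name.

```shell
#!/bin/sh
# Delay in milliseconds before retry number $1 (1-based).
# $2 is the base backoff, defaulting to the documented 500 ms.
# Assumption: the delay doubles on every retry.
backoff_ms() {
  retry=$1
  base_ms=${2:-500}
  delay=$base_ms
  i=1
  while [ "$i" -lt "$retry" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  echo "$delay"
}

# Default policy from the text: up to 2 retries per target.
max_retries=2
n=1
while [ "$n" -le "$max_retries" ]; do
  echo "retry $n after $(backoff_ms "$n") ms"
  n=$((n + 1))
done
```

Under that assumption the defaults give waits of 500 ms and 1000 ms on a single target. A chain of three targets could therefore make up to 9 upstream attempts (3 per target) before the caller sees an error.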

Send a request

Call the model by name. to11 handles the failover transparently:
curl https://gw.to11.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-to11-authorization: Bearer $TO11_API_KEY" \
  -H "x-to11-project-id: $TO11_PROJECT_ID" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
When the primary target is down, the response comes from the next target in the chain. To confirm which target served the request — and to see the failover attempts — open Projects → Traces in Observe. Each attempt is recorded with its target and attempt number, so you can reconstruct the full chain.
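
When scripting against the gateway, it can help to capture the final HTTP status alongside the body, since that status is what the caller sees once the chain has either succeeded or been exhausted. The sketch below wraps the request above in two hypothetical helpers, `build_payload` and `send_chat`; it assumes the model name and prompt contain no characters that need JSON escaping, and the `response.json` output file is an arbitrary choice.

```shell
#!/bin/sh
# Build the JSON body for a single-message chat completion.
# Assumes $1 (model) and $2 (prompt) need no JSON escaping.
build_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

# Send the request, save the body to response.json, and print
# the final HTTP status code on stdout.
send_chat() {
  curl -sS -o response.json -w '%{http_code}\n' \
    https://gw.to11.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-to11-authorization: Bearer $TO11_API_KEY" \
    -H "x-to11-project-id: $TO11_PROJECT_ID" \
    -d "$(build_payload "$1" "$2")"
}
```

Calling `send_chat gpt-4o "Hello"` prints only the status code. The status does not reveal which target answered; the traces remain the place to check that.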

Next steps

  • Weighted split: split traffic across targets by weight.
  • Direct routing: route a model to a single managed target.
  • Routing overview: the three-layer model and top-down resolution.