Fallback Routing

To configure automatic failover, use the fallback strategy. The gateway tries a route's targets in the order they are listed in its targets array and moves to the next one when it encounters a connection error or an HTTP 5xx response.

Prerequisites

  • At least two targets for the same model, each backed by a different provider or API key.

Full configuration example

The following config sets OpenAI as the primary target and Azure OpenAI as the backup for gpt-4o.
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o"]

[providers.azure-openai]
base_url = "https://myresource.openai.azure.com/openai"
credential = "env::AZURE_OPENAI_API_KEY"
auth_type = "api_key_header"
models = ["gpt-4o"]

[targets.openai-primary]
model = "gpt-4o"

[targets.azure-fallback]
model = "gpt-4o"

[routes.gpt4o-failover]
models = ["gpt-4o"]
strategy = "fallback"
targets = ["openai-primary", "azure-fallback"]

How fallback works

The gateway tries targets in the order they appear in the targets list. If the current target fails (connection error or HTTP 5xx), the next target is attempted immediately.
Request arrives for gpt-4o
  |
  v
Try openai-primary
  |
  +-- Success --> return response
  |
  +-- Failure (5xx / connection error)
        |
        v
      Try azure-fallback
        |
        +-- Success --> return response
        |
        +-- Failure
              |
              v
            Retry openai-primary (degraded-mode fallback)
              |
              +-- Success --> return response
              +-- Failure --> return error to caller
When all targets have been tried and failed, the gateway retries the first target one additional time. This degraded-mode fallback avoids returning an error when a transient issue on the primary has already resolved by the time the chain completes.
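The attempt order in the diagram above can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's implementation: TargetError, route_with_fallback, and flaky_send are hypothetical names, and the send callable stands in for a real upstream request.

```python
from typing import Callable, Sequence


class TargetError(Exception):
    """Hypothetical stand-in for a connection error or HTTP 5xx from a target."""


def route_with_fallback(targets: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each target in listed order, then retry the first target once more."""
    if not targets:
        raise ValueError("a fallback route needs at least one target")
    # Listed order, plus one extra attempt on the first target
    # (the degraded-mode fallback).
    attempts = list(targets) + [targets[0]]
    last_error = None
    for name in attempts:
        try:
            return send(name)
        except TargetError as exc:
            last_error = exc  # remember the failure and move to the next attempt
    raise last_error  # every attempt failed: surface the last error to the caller


# Simulated outage: the first call fails, and azure-fallback always fails.
calls = []


def flaky_send(name: str) -> str:
    calls.append(name)
    if name == "azure-fallback" or len(calls) == 1:
        raise TargetError(f"simulated outage on {name}")
    return f"response from {name}"


result = route_with_fallback(["openai-primary", "azure-fallback"], flaky_send)
print(result)  # response from openai-primary
print(calls)   # ['openai-primary', 'azure-fallback', 'openai-primary']
```

In this run the primary recovers by the time the chain completes, so the degraded-mode retry returns a response instead of an error.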

Retry configuration

Add exponential backoff to each attempt with [routing.retry]. Retry wraps every managed-route attempt, so combining it with the fallback strategy gives automatic failover with retries on each target.
[routing.retry]
max_retries = 3
backoff_base_ms = 250
With this configuration, each target attempt is retried up to 3 times with exponential backoff (250 ms, 500 ms, 1000 ms) before the gateway moves to the next target in the chain.
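The delay schedule quoted above follows from doubling the base delay on each retry. A minimal Python sketch, assuming a fixed multiplier of 2 with no jitter or upper cap (the source only shows the 250, 500, 1000 ms sequence, so any other behaviour is not covered here); backoff_delays_ms is a hypothetical helper name:

```python
def backoff_delays_ms(max_retries: int, backoff_base_ms: int) -> list[int]:
    """Delay before each retry, assuming the base delay doubles every time."""
    return [backoff_base_ms * 2**attempt for attempt in range(max_retries)]


delays = backoff_delays_ms(max_retries=3, backoff_base_ms=250)
print(delays)       # [250, 500, 1000]
print(sum(delays))  # 1750
```

Under these assumptions, a target that keeps failing adds 1750 ms of backoff in total before the gateway moves to the next target, not counting the time the requests themselves take.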

Per-route retry override

Override the global retry settings for a specific route when you need different behaviour.
[routes.gpt4o-failover.retry]
max_retries = 5
backoff_base_ms = 1000
This override applies only to the gpt4o-failover route. All other routes continue to use the global [routing.retry] settings.
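The effect of the override can be modelled as a lookup that prefers route-level settings over the global ones. The sketch below is an assumption-laden illustration: it merges the two tables key by key (the source does not say whether a partial override inherits unset keys), it assumes the doubling schedule from the global example, and some-other-route is a hypothetical route name.

```python
# Settings as they might look after parsing the TOML shown in this guide.
GLOBAL_RETRY = {"max_retries": 3, "backoff_base_ms": 250}
ROUTE_RETRY = {"gpt4o-failover": {"max_retries": 5, "backoff_base_ms": 1000}}


def effective_retry(route: str) -> dict:
    """Route-level keys win; anything unset falls back to the global value (assumed)."""
    return {**GLOBAL_RETRY, **ROUTE_RETRY.get(route, {})}


def backoff_delays_ms(settings: dict) -> list[int]:
    """Delays before each retry, assuming the base doubles on every retry."""
    return [settings["backoff_base_ms"] * 2**i for i in range(settings["max_retries"])]


for route in ("gpt4o-failover", "some-other-route"):
    settings = effective_retry(route)
    print(route, settings, backoff_delays_ms(settings))
```

With the override values above and the doubling assumption, a single failing target on gpt4o-failover can accumulate up to 31 seconds of backoff, so larger values are worth weighing against the latency callers can tolerate.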

Multi-step fallback

For complex scenarios where each step in the fallback chain uses its own routing strategy, use steps instead of targets.
[routes.gpt4o-multi]
models = ["gpt-4o"]
strategy = "fallback"

[[routes.gpt4o-multi.steps]]
strategy = "weighted"
targets = ["openai-primary", "openai-secondary"]

[[routes.gpt4o-multi.steps]]
strategy = "single"
targets = ["azure-fallback"]
Step 1 uses weighted selection across two OpenAI keys; openai-secondary is a second OpenAI target that the full example above does not define. If both fail, step 2 falls back to Azure. Steps execute as an ordered fallback chain: the gateway moves to the next step only when the current step's targets are exhausted.
When steps is present, the top-level targets field on the route is ignored. Each step defines its own strategy and targets.
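The step-by-step behaviour can be sketched as a nested loop: an outer loop over steps and an inner loop over the targets each step selects. This is a rough model under several assumptions: the weights (not shown in the config above) are supplied in a separate mapping, a weighted step tries its remaining targets after the first pick fails, and the degraded-mode retry is left out because the source does not say how it applies to steps. All function names are hypothetical.

```python
import random
from typing import Callable


class TargetError(Exception):
    """Hypothetical stand-in for a connection error or HTTP 5xx from a target."""


def weighted_order(targets: list, weights: dict, rng: random.Random) -> list:
    """Order targets by weighted sampling without replacement."""
    remaining = list(targets)
    order = []
    while remaining:
        pick = rng.choices(remaining, weights=[weights.get(t, 1) for t in remaining])[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def run_steps(steps: list, send: Callable[[str], str], weights: dict, rng: random.Random) -> str:
    """Walk the steps in order; leave a step only when all its targets have failed."""
    last_error = None
    for step in steps:
        if step["strategy"] == "weighted":
            order = weighted_order(step["targets"], weights, rng)
        else:  # "single": use the step's targets as listed
            order = list(step["targets"])
        for name in order:
            try:
                return send(name)
            except TargetError as exc:
                last_error = exc
    if last_error is None:
        raise ValueError("route has no targets")
    raise last_error


steps = [
    {"strategy": "weighted", "targets": ["openai-primary", "openai-secondary"]},
    {"strategy": "single", "targets": ["azure-fallback"]},
]
calls = []


def openai_down(name: str) -> str:
    calls.append(name)
    if name.startswith("openai-"):
        raise TargetError(f"simulated outage on {name}")
    return f"response from {name}"


weights = {"openai-primary": 3, "openai-secondary": 1}  # illustrative values
result = run_steps(steps, openai_down, weights, random.Random(0))
print(result)  # response from azure-fallback
```

Both OpenAI targets are tried (in an order that depends on the weighted draw) before the Azure step runs, which matches the "exhausted" condition described above.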

Deprecation notice

The [routing.circuit_breaker] configuration section is deprecated and ignored at runtime. It is accepted for backward compatibility only. If enabled = true is set, a deprecation warning is logged at startup. Use [routing.retry] with the fallback strategy instead.

Testing it

Simulate a provider outage by pointing the primary provider's base_url at an unreachable URL, then send a request and verify that the gateway falls back to the next target.
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
When the primary target is down, the response comes from the fallback target. Check the gateway logs to confirm which target served the request.

Next steps

  • Function Routing: Decouple task names from model names with named functions.
  • Weighted Routing: Split traffic across providers by weight.
  • Configuration: Full TOML reference for all gateway settings.