Routes

A route is a named entry in [routes.*] that maps one or more model names to a set of targets with a selection strategy. When a caller sends a request for a model that matches a route, the gateway selects a target according to the strategy and proxies the request with gateway-owned credentials.

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | Endpoint type: chat, embeddings, audio_speech, audio_transcription, or image_generation. |
| models | array | Yes | Model names this route handles. A request whose model field matches any entry is routed here. |
| strategy | string | Yes | Selection strategy: single, weighted, or fallback. |
| targets | array | Yes* | Target names for simple (single-step) routes. Mutually exclusive with steps. |
| steps | array | No | Multi-step fallback chain. Mutually exclusive with targets; only one may be specified. |
| retry | table | No | Per-route retry configuration. Overrides the global [routing.retry] setting. |
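
The field rules above can be expressed as a small validation sketch. This is not the gateway's own validator: the function name `validate_route`, the error messages, and the requirement that `models` be non-empty are assumptions made for illustration. It takes a route that has already been parsed into a Python dict.

```python
# Hypothetical validator for one parsed [routes.<name>] table.
# Allowed values are taken from the Fields table; everything else
# (names, messages, the non-empty check on models) is an assumption.
ENDPOINTS = {
    "chat",
    "embeddings",
    "audio_speech",
    "audio_transcription",
    "image_generation",
}
STRATEGIES = {"single", "weighted", "fallback"}


def validate_route(route: dict) -> list[str]:
    """Return a list of problems; an empty list means the route looks valid."""
    problems = []
    if route.get("endpoint") not in ENDPOINTS:
        problems.append("endpoint must be one of: " + ", ".join(sorted(ENDPOINTS)))
    if not route.get("models"):
        problems.append("models must be a non-empty array")
    if route.get("strategy") not in STRATEGIES:
        problems.append("strategy must be single, weighted, or fallback")
    # targets and steps are mutually exclusive, and one of them is needed.
    has_targets = bool(route.get("targets"))
    has_steps = bool(route.get("steps"))
    if has_targets == has_steps:
        problems.append("specify exactly one of targets or steps")
    return problems
```

A route with both `targets` and `steps` set, or with neither, would be reported by the last check.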

Strategies

Each route declares a strategy that controls how the gateway picks a target.

Single

[routes.primary-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "single"
targets  = ["openai-primary"]
One target, deterministic. Every request for the route’s models goes to the same upstream. This is the default when a route has exactly one target, and is what the gateway creates automatically for bare targets.

Weighted

[routes.balanced-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "weighted"
targets  = ["openai-primary", "azure-secondary"]
Selects a target at random, proportional to each target’s weight. Useful for gradual migrations (shift 10% of traffic to a new provider) or cost optimisation (route most traffic to a cheaper endpoint). Weights are relative integers — 70 and 30 produce the same distribution as 7 and 3.
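
To illustrate why only the ratio between weights matters, here is a minimal sketch of weighted random selection. The weight values and the way they are passed in are assumptions (the example above does not show where weights are configured), and `pick_weighted` and `shares` are hypothetical helper names, not gateway APIs.

```python
import random
from fractions import Fraction


def shares(weights: dict[str, int]) -> dict[str, Fraction]:
    """Exact fraction of traffic each target would receive on average."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


def pick_weighted(weights: dict[str, int], rng: random.Random) -> str:
    """Pick one target at random, proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Assumed weights for the two targets in the example above.
assumed = {"openai-primary": 70, "azure-secondary": 30}
scaled = {"openai-primary": 7, "azure-secondary": 3}

# Both weightings reduce to the same 7/10 and 3/10 split.
same_distribution = shares(assumed) == shares(scaled)
```

Because the shares are computed as exact fractions, the comparison does not depend on floating-point rounding.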

Fallback

[routes.resilient-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "fallback"
targets  = ["openai-primary", "azure-backup"]
Tries targets in declaration order. If the first target fails (connection error or 5xx response), the next target is attempted immediately. When all targets have been tried and failed, the first target is retried once more — this handles transient issues that may have resolved during the fallback chain.
The fallback strategy does not maintain health state between requests. Each request starts from the first target in the list. For persistent provider outages, this means the first target is attempted (and fails) on every request before the fallback is reached. This trade-off keeps the routing layer stateless and predictable.
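
The attempt order described above can be sketched as follows. This is an illustration under stated assumptions, not the gateway's implementation: `UpstreamError` stands in for a connection error or 5xx response, and `send` is a hypothetical callable that proxies a request to one target.

```python
class UpstreamError(Exception):
    """Stands in for a connection error or a 5xx response from a target."""


def attempt_order(targets: list[str]) -> list[str]:
    """Declaration order, followed by one more attempt at the first target."""
    return list(targets) + list(targets[:1])


def send_with_fallback(targets: list[str], send):
    """Try each target in order; return the first successful response."""
    if not targets:
        raise ValueError("a fallback route needs at least one target")
    last_error = None
    # No health state is kept: every call starts again from the first target.
    for target in attempt_order(targets):
        try:
            return send(target)
        except UpstreamError as err:
            last_error = err
    raise last_error
```

For `["openai-primary", "azure-backup"]` the attempts would be openai-primary, azure-backup, then openai-primary once more.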

Multi-step routes

For more sophisticated failover, routes support steps — an ordered list where each step defines its own strategy and targets. Steps execute as a fallback chain: if all targets in step 1 are exhausted, step 2 is tried.
[routes.multi-step-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "fallback"

[[routes.multi-step-gpt4o.steps]]
strategy = "weighted"
targets  = ["openai-east", "openai-west"]

[[routes.multi-step-gpt4o.steps]]
strategy = "single"
targets  = ["azure-fallback"]
In this example, the gateway first distributes traffic across openai-east and openai-west by weight. If both fail, it falls back to azure-fallback.
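
A rough sketch of how such a step chain might be walked is shown below. Several details are assumptions, since the text does not specify them: a weighted step is modelled as a weighted random ordering of its targets, weights default to 1, and `send`, `UpstreamError`, `step_order`, and `run_steps` are hypothetical names.

```python
import random


class UpstreamError(Exception):
    """Stands in for a connection error or a 5xx response from a target."""


def step_order(step: dict, weights: dict, rng: random.Random) -> list[str]:
    """Order in which one step's targets are attempted (assumed behaviour)."""
    remaining = list(step["targets"])
    if step["strategy"] != "weighted":
        return remaining  # single and fallback: declaration order
    order = []
    while remaining:
        # Weighted draw without replacement; missing weights count as 1.
        pick = rng.choices(
            remaining, weights=[weights.get(t, 1) for t in remaining], k=1
        )[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def run_steps(steps: list[dict], send, weights=None, rng=None):
    """Walk the steps in order; move on once a step's targets are exhausted."""
    weights = weights or {}
    rng = rng or random.Random()
    last_error = None
    for step in steps:
        for target in step_order(step, weights, rng):
            try:
                return send(target)
            except UpstreamError as err:
                last_error = err
    raise last_error or UpstreamError("no targets configured")
```

With the two steps from the example, azure-fallback would only be contacted after both openai-east and openai-west have failed.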

Per-route retry

Each route can override the global retry configuration:
[routes.critical-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "fallback"
targets  = ["openai-primary", "azure-backup"]

[routes.critical-gpt4o.retry]
max_retries     = 5
backoff_base_ms = 100
When retry is absent from a route, the global [routing.retry] configuration applies. When that is also absent, the defaults are max_retries = 2 and backoff_base_ms = 500. Retries use exponential backoff: the delay before retry n (counting from 1) is backoff_base_ms * 2^(n-1) milliseconds. With the defaults, the first retry waits 500 ms and the second waits 1000 ms.
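
The precedence and backoff formula above can be worked through in a few lines. This sketch assumes that n counts retries starting from 1 and that a route-level retry table replaces the global one as a whole; how partially specified tables are merged is not covered here. The function names are hypothetical.

```python
# Defaults stated in the text, used when neither level configures retry.
DEFAULT_RETRY = {"max_retries": 2, "backoff_base_ms": 500}


def effective_retry(route_retry=None, global_retry=None) -> dict:
    """Route-level retry wins, then [routing.retry], then the defaults."""
    if route_retry is not None:
        return route_retry
    if global_retry is not None:
        return global_retry
    return DEFAULT_RETRY


def retry_delays_ms(max_retries: int, backoff_base_ms: int) -> list[int]:
    """Delay before each retry n = 1..max_retries: base * 2^(n-1)."""
    return [backoff_base_ms * 2 ** (n - 1) for n in range(1, max_retries + 1)]


# The critical-gpt4o route above: max_retries = 5, backoff_base_ms = 100.
critical = effective_retry({"max_retries": 5, "backoff_base_ms": 100})
critical_delays = retry_delays_ms(**critical)  # [100, 200, 400, 800, 1600]

# No retry configured anywhere: the defaults apply.
default_delays = retry_delays_ms(**effective_retry())  # [500, 1000]
```

Under these assumptions the critical-gpt4o route would spend at most 3100 ms waiting between attempts, compared with 1500 ms for the defaults.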

How routes fit into the routing hierarchy

Routes are the second layer (L2) in the gateway’s routing system:
Request (model = "gpt-4o")
  |
  v
L3: Function match?  --> [functions.*]   (explicit task alias)
  |  no
  v
L2: Route match?     --> [routes.*]      (managed, gateway-owned key)
  |  no
  v
L1: Provider lookup  --> [providers.*]   (passthrough, caller's key)
  |  no
  v
404 Unknown Model
The dispatcher checks functions first, then routes, then providers. A model can appear at multiple layers — the highest-priority match wins.
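
The lookup order in the diagram can be sketched as a simple layered search. The three dicts below are hypothetical stand-ins for the parsed [functions.*], [routes.*], and [providers.*] tables, each mapping a requested model name to the entry that handles it; `resolve` is not a gateway API.

```python
def resolve(model: str, functions: dict, routes: dict, providers: dict):
    """Return (layer, entry) for the highest-priority layer that matches."""
    layers = (
        ("L3 function", functions),
        ("L2 route", routes),
        ("L1 provider", providers),
    )
    for layer, table in layers:
        if model in table:
            return layer, table[model]
    # Nothing matched at any layer.
    raise LookupError("404 Unknown Model: " + model)


# gpt-4o appears at both L2 and L1; the route wins because it is checked first.
example = resolve(
    "gpt-4o",
    functions={},
    routes={"gpt-4o": "primary-gpt4o"},
    providers={"gpt-4o": "openai"},
)
```

Here `example` is `("L2 route", "primary-gpt4o")`, so the request would be proxied with the gateway-owned key and not the caller's.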

Multi-endpoint routes

Routes support all endpoint types via the endpoint field:
[routes.managed-embeddings]
endpoint = "embeddings"
models = ["text-embedding-3-small"]
strategy = "single"
targets = ["embed-primary"]

[targets.embed-primary]
model = "text-embedding-3-small"
credential = "env::MANAGED_OPENAI_KEY"
When a request for text-embedding-3-small hits /v1/embeddings, the route matches and uses the gateway-managed credential instead of the caller’s key.
Routes match by both model name AND endpoint type. A chat route for gpt-4o does not match embedding requests for the same model.
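
One way to picture this matching rule is an index keyed by the pair (endpoint, model). The sketch below is illustrative only: `build_route_index` is a hypothetical helper, and keeping the first declared route when two routes claim the same pair is an assumption, since the text does not say how such conflicts are handled.

```python
def build_route_index(routes: dict[str, dict]) -> dict[tuple[str, str], str]:
    """Map (endpoint, model) pairs to the name of the route that handles them."""
    index: dict[tuple[str, str], str] = {}
    for name, route in routes.items():
        for model in route["models"]:
            # Assumption: the first declared route keeps a contested pair.
            index.setdefault((route["endpoint"], model), name)
    return index


routes = {
    "primary-gpt4o": {"endpoint": "chat", "models": ["gpt-4o"]},
    "managed-embeddings": {
        "endpoint": "embeddings",
        "models": ["text-embedding-3-small"],
    },
}
index = build_route_index(routes)

# A request to /v1/embeddings is looked up with endpoint "embeddings".
embedding_route = index.get(("embeddings", "text-embedding-3-small"))
# The chat route for gpt-4o does not cover embedding requests for gpt-4o.
no_match = index.get(("embeddings", "gpt-4o"))
```

In this sketch `embedding_route` is `"managed-embeddings"` and `no_match` is `None`, which mirrors the rule that a route only applies when both the model name and the endpoint type agree.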

Next steps

Functions

Named task aliases that decouple caller intent from model choice.

Targets

How targets pair models with gateway-owned credentials.

Routing Overview

Full routing system design: managed vs passthrough, resolution flow.