Routes

A route is a named entry in `[routes.*]` that maps one or more model names to a set of targets with a selection strategy. When a caller sends a request for a model that matches a route, the gateway selects a target according to the strategy and proxies the request with gateway-owned credentials.

Fields

Field	Type	Required	Description
`endpoint`	string	Yes	Endpoint type: `chat`, `embeddings`, `audio_speech`, `audio_transcription`, or `image_generation`.
`models`	array	Yes	Model names this route handles. A request whose `model` field matches any entry is routed here.
`strategy`	string	Yes	Selection strategy: `single`, `weighted`, or `fallback`.
`targets`	array	Yes*	Target names for simple (single-step) routes. Mutually exclusive with `steps`. *Required unless `steps` is set.
`steps`	array	No	Multi-step fallback chain. Mutually exclusive with `targets`; only one of the two may be specified.
`retry`	table	No	Per-route retry configuration. Overrides the global `[routing.retry]` setting.
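
The constraints in the table can be checked mechanically. The following is a minimal Python sketch, not the gateway's own validator: `validate_route` is a hypothetical helper, and it assumes that exactly one of `targets` or `steps` must be present, which is how the table's "Yes*" is read here.

```python
ENDPOINTS = {"chat", "embeddings", "audio_speech", "audio_transcription", "image_generation"}
STRATEGIES = {"single", "weighted", "fallback"}


def validate_route(route: dict) -> list:
    """Return a list of problems found in one route table (hypothetical helper)."""
    errors = []
    if route.get("endpoint") not in ENDPOINTS:
        errors.append("endpoint must be one of " + ", ".join(sorted(ENDPOINTS)))
    if not route.get("models"):
        errors.append("models must list at least one model name")
    if route.get("strategy") not in STRATEGIES:
        errors.append("strategy must be single, weighted, or fallback")
    # Assumption: a route needs exactly one of `targets` or `steps`.
    if ("targets" in route) == ("steps" in route):
        errors.append("specify exactly one of targets or steps")
    return errors


route = {
    "endpoint": "chat",
    "models": ["gpt-4o"],
    "strategy": "single",
    "targets": ["openai-primary"],
}
print(validate_route(route))  # → []
```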

Strategies

Each route declares a strategy that controls how the gateway picks a target.

Single

[routes.primary-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "single"
targets  = ["openai-primary"]

One target, deterministic. Every request for the route’s models goes to the same upstream. This is the default when a route has exactly one target, and is what the gateway creates automatically for bare targets.

Weighted

[routes.balanced-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "weighted"
targets  = ["openai-primary", "azure-secondary"]

Selects a target at random, proportional to each target’s weight. Useful for gradual migrations (shift 10% of traffic to a new provider) or cost optimisation (route most traffic to a cheaper endpoint). Weights are relative integers — 70 and 30 produce the same distribution as 7 and 3.
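
To illustrate why only the ratio between weights matters, here is a small Python sketch of proportional random selection. It is not the gateway's implementation: `pick_weighted` and the weight mapping are hypothetical, since the example above does not show where weights are configured.

```python
import random
from collections import Counter


def pick_weighted(weights: dict, rng: random.Random) -> str:
    """Pick one target name at random, proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
for weights in ({"openai-primary": 70, "azure-secondary": 30},
                {"openai-primary": 7, "azure-secondary": 3}):
    counts = Counter(pick_weighted(weights, rng) for _ in range(10_000))
    share = counts["openai-primary"] / 10_000
    # Both weight sets should give openai-primary a share close to 0.70.
    print(weights, round(share, 2))
```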

Fallback

[routes.resilient-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "fallback"
targets  = ["openai-primary", "azure-backup"]

Tries targets in declaration order. If the first target fails (connection error or 5xx response), the next target is attempted immediately. When all targets have been tried and failed, the first target is retried once more — this handles transient issues that may have resolved during the fallback chain.

The fallback strategy does not maintain health state between requests. Each request starts from the first target in the list. For persistent provider outages, this means the first target is attempted (and fails) on every request before the fallback is reached. This trade-off keeps the routing layer stateless and predictable.
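
The attempt order described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the gateway's code: `UpstreamError`, `call_with_fallback`, and `send` are hypothetical names, and the sketch ignores any interaction with the retry settings.

```python
from typing import Callable


class UpstreamError(Exception):
    """Stands in for a connection error or a 5xx response."""


def call_with_fallback(targets: list, send: Callable[[str], str]) -> str:
    """Try targets in declaration order, then retry the first target once."""
    attempt_order = targets + targets[:1]
    last_error = None
    for name in attempt_order:
        try:
            return send(name)
        except UpstreamError as err:
            last_error = err  # move on to the next target immediately
    raise UpstreamError(f"all attempts failed: {attempt_order}") from last_error


attempts = []

def flaky_send(name: str) -> str:
    # Hypothetical upstream call: the first target is down, the backup works.
    attempts.append(name)
    if name == "openai-primary":
        raise UpstreamError("503 from " + name)
    return "response from " + name


print(call_with_fallback(["openai-primary", "azure-backup"], flaky_send))
print(attempts)  # → ['openai-primary', 'azure-backup']
```

Because no health state is kept, every call starts again at `openai-primary`, which is the stateless trade-off the section describes.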

Multi-step routes

For more sophisticated failover, routes support `steps`: an ordered list where each step defines its own strategy and targets. Steps execute as a fallback chain: if all targets in step 1 are exhausted, step 2 is tried.

[routes.multi-step-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "fallback"

[[routes.multi-step-gpt4o.steps]]
strategy = "weighted"
targets  = ["openai-east", "openai-west"]

[[routes.multi-step-gpt4o.steps]]
strategy = "single"
targets  = ["azure-fallback"]

In this example, the gateway first distributes traffic across `openai-east` and `openai-west` by weight. If both fail, it falls back to `azure-fallback`.
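
A rough Python sketch of how such a step chain could be walked follows. The names `step_order`, `run_steps`, and the `weights` key are hypothetical, and one point is an assumption rather than documented behaviour: within a weighted step, the sketch tries targets in a weighted random order without replacement until the step is exhausted.

```python
import random
from typing import Callable


class UpstreamError(Exception):
    """Stands in for a connection error or a 5xx response."""


def step_order(step: dict, rng: random.Random) -> list:
    """Return the order in which one step's targets are attempted."""
    targets = list(step["targets"])
    if step["strategy"] != "weighted":
        return targets  # declaration order
    # Assumption: weighted random order without replacement; equal weights by default.
    weights = list(step.get("weights", [1] * len(targets)))
    order = []
    while targets:
        i = rng.choices(range(len(targets)), weights=weights, k=1)[0]
        order.append(targets.pop(i))
        weights.pop(i)
    return order


def run_steps(steps: list, send: Callable[[str], str], rng: random.Random) -> str:
    """Walk the steps as a fallback chain: next step only when one is exhausted."""
    for step in steps:
        for name in step_order(step, rng):
            try:
                return send(name)
            except UpstreamError:
                continue
    raise UpstreamError("all steps exhausted")


steps = [
    {"strategy": "weighted", "targets": ["openai-east", "openai-west"]},
    {"strategy": "single", "targets": ["azure-fallback"]},
]
```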

Per-route retry

Each route can override the global retry configuration:

[routes.critical-gpt4o]
endpoint = "chat"
models   = ["gpt-4o"]
strategy = "fallback"
targets  = ["openai-primary", "azure-backup"]

[routes.critical-gpt4o.retry]
max_retries     = 5
backoff_base_ms = 100

When `retry` is absent from a route, the global `[routing.retry]` configuration applies. When that is also absent, the defaults are `max_retries = 2` and `backoff_base_ms = 500`. Retries use exponential backoff: the delay before attempt n is `backoff_base_ms * 2^(n-1)` milliseconds.
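
As a worked example of the backoff formula, the Python sketch below lists the delays for a given configuration. `backoff_delays_ms` is a hypothetical helper, and it assumes n counts retries from 1 up to `max_retries`.

```python
def backoff_delays_ms(max_retries: int = 2, backoff_base_ms: int = 500) -> list:
    """Delay in milliseconds before each retry n = 1..max_retries."""
    return [backoff_base_ms * 2 ** (n - 1) for n in range(1, max_retries + 1)]


print(backoff_delays_ms())        # defaults → [500, 1000]
print(backoff_delays_ms(5, 100))  # critical-gpt4o → [100, 200, 400, 800, 1600]
```

Under that assumption, the `critical-gpt4o` route above would wait 3100 ms in total across its five retries, compared with 1500 ms for the defaults.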

How routes fit into the routing hierarchy

Routes are the second layer (L2) in the gateway’s routing system:

Request (model = "gpt-4o")
  |
  v
L3: Function match?  --> [functions.*]   (explicit task alias)
  |  no
  v
L2: Route match?     --> [routes.*]      (managed, gateway-owned key)
  |  no
  v
L1: Provider lookup  --> [providers.*]   (passthrough, caller's key)
  |  no
  v
404 Unknown Model

The dispatcher checks functions first, then routes, then providers. A model can appear at multiple layers — the highest-priority match wins.
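
The lookup order can be sketched as a simple chain of checks. The Python below is an illustration only: `resolve_layer` and the three name sets are hypothetical, and how each layer actually matches a model name (for example, the provider lookup) is simplified to set membership.

```python
from typing import Optional


def resolve_layer(model: str, functions: set, routes: set, providers: set) -> Optional[str]:
    """Return the first layer that knows the model, checked from L3 down to L1."""
    if model in functions:
        return "L3 function"
    if model in routes:
        return "L2 route"
    if model in providers:
        return "L1 provider"
    return None  # the gateway would answer 404 Unknown Model


functions = {"summarise"}
routes = {"gpt-4o"}
providers = {"gpt-4o", "gpt-4o-mini"}

print(resolve_layer("gpt-4o", functions, routes, providers))       # → L2 route
print(resolve_layer("gpt-4o-mini", functions, routes, providers))  # → L1 provider
print(resolve_layer("unknown", functions, routes, providers))      # → None
```

Here `gpt-4o` appears at both L2 and L1, and the route wins because it is checked first.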

Multi-endpoint routes

Routes support all endpoint types via the `endpoint` field:

[routes.managed-embeddings]
endpoint = "embeddings"
models = ["text-embedding-3-small"]
strategy = "single"
targets = ["embed-primary"]

[targets.embed-primary]
model = "text-embedding-3-small"
credential = "env::MANAGED_OPENAI_KEY"

When a request for `text-embedding-3-small` hits `/v1/embeddings`, the route matches and uses the gateway-managed credential instead of the caller’s key.

Routes match by both model name AND endpoint type. A chat route for gpt-4o does not match embedding requests for the same model.
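
The two-part match can be sketched as follows. This is a minimal Python illustration with hypothetical names (`ROUTES`, `find_route`); the dictionary mirrors two of the route tables shown on this page.

```python
from typing import Optional

ROUTES = {
    "primary-gpt4o": {"endpoint": "chat", "models": ["gpt-4o"]},
    "managed-embeddings": {"endpoint": "embeddings", "models": ["text-embedding-3-small"]},
}


def find_route(routes: dict, endpoint: str, model: str) -> Optional[str]:
    """Return the name of the first route matching both endpoint type and model."""
    for name, route in routes.items():
        if route["endpoint"] == endpoint and model in route["models"]:
            return name
    return None


print(find_route(ROUTES, "chat", "gpt-4o"))        # → primary-gpt4o
print(find_route(ROUTES, "embeddings", "gpt-4o"))  # → None
```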

Next steps

Functions

Named task aliases that decouple caller intent from model choice.

Targets

How targets pair models with gateway-owned credentials.

Routing Overview

Full routing system design: managed vs passthrough, resolution flow.
