Routing Overview

The gateway resolves every inbound request through a three-layer routing model. Each layer is independently useful and backward compatible — you opt in to higher layers as your deployment needs them.

Three-layer routing model

Layer	Routing key	Config	Gateway owns
L1: Passthrough	Model name	`[providers.*]` only	Nothing — proxies directly
L2: Managed routing	Model name	`[routes.*]` + `[targets.*]`	Credentials + routing strategy
L3: Function routing	Function name	`[functions.*]`	Credentials + routing + task abstraction

L1 is the default. Every model registered in a provider’s models array can be used immediately — the caller supplies their own API key and the gateway forwards it. L2 adds gateway-owned credentials and traffic distribution strategies (single, weighted, fallback) for specific models. L3 introduces a task-level indirection layer where callers request a function name (e.g. summarize) instead of a model name.

Resolution flow

When a request arrives, the gateway checks for an explicit namespace prefix first. If none is present, it resolves top-down through the three layers.

Request (model field)
  |
  v
Explicit prefix? (function::, route::, provider::)
  +-- Yes --> dispatch to named layer
  +-- No  --> top-down resolution:
       |
       v
     L3: Function match? --> function dispatch
       |  No
       v
     L2: Route match? --> managed route dispatch
       |  No
       v
     L1: Provider match? --> passthrough
       |  No
       v
     404 UnknownModel

The ordering means a model name can appear in both [providers.*] and [routes.*] — managed routing takes precedence when configured. Similarly, a function name that collides with a model name resolves to the function first.
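The top-down order above can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's implementation: `RoutingTable`, `resolve_top_down`, and the `UnknownModel` exception are hypothetical names, and the sketch assumes each layer can be checked with a simple name lookup.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RoutingTable:
    """Hypothetical in-memory view of the three config sections."""
    functions: Set[str] = field(default_factory=set)        # names under [functions.*]
    routes: Set[str] = field(default_factory=set)           # names under [routes.*]
    provider_models: Set[str] = field(default_factory=set)  # models listed under [providers.*]


class UnknownModel(LookupError):
    """Stands in for the gateway's 404 UnknownModel response."""


def resolve_top_down(name: str, table: RoutingTable) -> str:
    """Return the layer that handles an unprefixed model field."""
    if name in table.functions:        # L3 is checked first
        return "L3"
    if name in table.routes:           # then L2
        return "L2"
    if name in table.provider_models:  # then L1
        return "L1"
    raise UnknownModel(name)


table = RoutingTable(
    functions={"summarize"},
    routes={"gpt-4o"},
    provider_models={"gpt-4o", "summarize", "whisper-1"},
)
print(resolve_top_down("summarize", table))  # L3: the function shadows the provider model
print(resolve_top_down("gpt-4o", table))     # L2: the managed route takes precedence
print(resolve_top_down("whisper-1", table))  # L1: passthrough
```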

Namespace prefixes

Explicit prefixes let callers target a specific routing layer, bypassing top-down resolution.

Prefix	Layer	Example	Resolution
`function::name`	L3	`function::summarize`	Direct function lookup
`route::name`	L2	`route::balanced-gpt4o`	Direct route lookup
`provider::model`	L1	`openai::gpt-4o`	Direct provider lookup
(no prefix)	Auto	`gpt-4o` or `summarize`	Top-down resolution

Explicit prefixes prevent accidental shadowing. Without a prefix, adding a function named summarize would shadow a provider model with the same name. Prefixes make the routing layer visible to the caller.
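As a rough sketch of how a model field might be split into a layer and a name, consider the following. `Target` and `parse_model_field` are hypothetical names. Based on the `openai::gpt-4o` example in the table, the sketch assumes that any prefix other than `function` or `route` is a provider name; the gateway's actual parsing rules may differ.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    layer: str               # "L1", "L2", "L3", or "auto"
    name: str                # function, route, or model name
    provider: Optional[str]  # set only for provider-prefixed values


def parse_model_field(value: str) -> Target:
    """Split a model field on the first '::' (assumed to be the only separator)."""
    head, sep, rest = value.partition("::")
    if not sep:
        return Target("auto", value, None)  # no prefix: top-down resolution
    if head == "function":
        return Target("L3", rest, None)
    if head == "route":
        return Target("L2", rest, None)
    return Target("L1", rest, head)         # assumption: head is a provider name


for field_value in ("function::summarize", "route::balanced-gpt4o",
                    "openai::gpt-4o", "summarize"):
    print(field_value, "->", parse_model_field(field_value))
```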

Endpoint support

All three routing layers support every endpoint type except Count tokens and Files, which are available through L1 passthrough only.

Endpoint	L1 Passthrough	L2 Managed Route	L3 Function	Prefix Parsing
Chat completions	✅	✅	✅	✅
Responses / Messages	✅	✅	✅	✅
Embeddings	✅	✅	✅	✅
Audio speech	✅	✅	✅	✅
Audio transcription	✅	✅	✅	✅
Image generation	✅	✅	✅	✅
Count tokens	✅	—	—	—
Files	✅	—	—	—

Both functions and routes require an endpoint field that declares which endpoint type they serve. A request that reaches a function or route through any other endpoint is rejected with a 400 response.

Endpoint types

Functions and routes declare which endpoint they serve via the endpoint field:

Value	HTTP Endpoint	Example models
`chat`	`/v1/chat/completions`	gpt-4o, claude-sonnet-4-6
`embeddings`	`/v1/embeddings`	text-embedding-3-small
`audio_speech`	`/v1/audio/speech`	tts-1, tts-1-hd
`audio_transcription`	`/v1/audio/transcriptions`	whisper-1
`image_generation`	`/v1/images/generations`	dall-e-3

The endpoint field is required. The gateway rejects functions and routes without it at startup.
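The two checks described here (rejection at startup when the field is missing, and a 400 when a request arrives on the wrong endpoint) might look roughly like this. The path mapping is copied from the table above; `validate_entry` and `mismatch_status` are hypothetical helpers, and rejecting unknown endpoint values is an assumption that goes beyond what this page states.

```python
from typing import Mapping, Optional

# Endpoint values and paths as listed in the table above.
ENDPOINT_PATHS = {
    "chat": "/v1/chat/completions",
    "embeddings": "/v1/embeddings",
    "audio_speech": "/v1/audio/speech",
    "audio_transcription": "/v1/audio/transcriptions",
    "image_generation": "/v1/images/generations",
}


def validate_entry(name: str, entry: Mapping[str, object]) -> str:
    """Startup check: return the declared endpoint, or raise if it is unusable."""
    endpoint = entry.get("endpoint")
    # Assumption: unknown values are rejected the same way as a missing field.
    if endpoint not in ENDPOINT_PATHS:
        raise ValueError(f"{name}: missing or unknown endpoint {endpoint!r}")
    return str(endpoint)


def mismatch_status(declared: str, request_path: str) -> Optional[int]:
    """Request-time check: 400 if the path does not match the declared endpoint."""
    return None if ENDPOINT_PATHS[declared] == request_path else 400


summarize = {"endpoint": "chat", "strategy": "fallback"}
declared = validate_entry("functions.summarize", summarize)
print(mismatch_status(declared, "/v1/chat/completions"))  # None
print(mismatch_status(declared, "/v1/embeddings"))        # 400
```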

Routing strategies

Strategy	Description	Available on
`single`	Route to one target	Routes, Functions
`weighted`	Random selection by weight	Routes, Functions
`fallback`	Sequential failover with retry	Routes, Functions
`experiment`	A/B test with weighted variants	Functions only

Utility — no routing

Endpoint	Purpose
`GET /v1/models`	Synthetic model list from config
`GET /health`	Health check

Failover and retry

The gateway combines fast sequential failover with retries and exponential backoff. The fallback strategy tries targets in declaration order: if a target fails with a connection error or a 5xx response, the next target is attempted immediately. Once every target has been tried, the first target is retried one more time as a degraded-mode fallback. Configure [routing.retry] to enable automatic retries with exponential backoff:

[routing.retry]
max_retries     = 3
backoff_base_ms = 250
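For intuition about what these two values imply, the sketch below computes a delay schedule. It assumes the delay doubles on each retry with no jitter and no upper cap; this page only says "exponential backoff", so the gateway's exact formula may differ. `backoff_delays_ms` is a hypothetical helper.

```python
from typing import List


def backoff_delays_ms(max_retries: int, backoff_base_ms: int) -> List[int]:
    """Delay before each retry, assuming base * 2**attempt (an assumption)."""
    return [backoff_base_ms * 2 ** attempt for attempt in range(max_retries)]


print(backoff_delays_ms(3, 250))   # [250, 500, 1000]
print(backoff_delays_ms(4, 1000))  # [1000, 2000, 4000, 8000]
```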

Functions can override the global retry configuration with per-function settings:

[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models   = ["gpt-4o", "claude-sonnet-4-6"]

[functions.summarize.retry]
max_retries     = 4
backoff_base_ms = 1000

The [routing.circuit_breaker] configuration section is deprecated and ignored at runtime. Use [routing.retry] with the fallback strategy instead.
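The attempt order used by the fallback strategy (each target once in declaration order, then the first target one more time) can be sketched as follows. `attempt_order`, `call_with_fallback`, and `AllTargetsFailed` are hypothetical names, `ConnectionError` stands in for "connection error or 5xx", and the sketch deliberately leaves out how [routing.retry] backoff interleaves with failover, since this page does not specify that.

```python
from typing import Callable, List, Optional, Sequence


class AllTargetsFailed(RuntimeError):
    """Raised when every attempt, including the final retry, has failed."""


def attempt_order(targets: Sequence[str]) -> List[str]:
    """Declaration order, followed by the first target once more."""
    return list(targets) + list(targets[:1])


def call_with_fallback(targets: Sequence[str], send: Callable[[str], str]) -> str:
    last_error: Optional[Exception] = None
    for target in attempt_order(targets):
        try:
            return send(target)          # first success wins
        except ConnectionError as exc:   # stand-in for connection error or 5xx
            last_error = exc             # move on to the next target immediately
    raise AllTargetsFailed(f"no target succeeded: {list(targets)}") from last_error


def send(target: str) -> str:
    """Fake upstream call: the first target is down, the second answers."""
    if target == "gpt-4o":
        raise ConnectionError("upstream unavailable")
    return f"ok from {target}"


print(attempt_order(["gpt-4o", "claude-sonnet-4-6"]))
print(call_with_fallback(["gpt-4o", "claude-sonnet-4-6"], send))  # ok from claude-sonnet-4-6
```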

Scope and limitations

No model remapping. The model field in the request body is forwarded as-is to the upstream provider. If your target uses a different model name upstream, you must use the upstream name in the target configuration.
No health-aware weighted routing. The weighted strategy does not consult target health. An unhealthy target continues to receive its proportional share of traffic. Use the fallback strategy for automatic failover.

Next steps

Passthrough

How L1 passthrough mode works.

Simple Routing

Set up gateway-owned credentials for a model.

Providers

Supported providers and format translation.

Configuration

Full TOML reference for all gateway settings.
