Routing Overview

The gateway resolves every inbound request through a three-layer routing model. Each layer is independently useful and backward compatible — you opt in to higher layers as your deployment needs them.

Three-layer routing model

| Layer | Routing key | Config | Gateway owns |
| --- | --- | --- | --- |
| L1: Passthrough | Model name | [providers.*] only | Nothing (proxies directly) |
| L2: Managed routing | Model name | [routes.*] + [targets.*] | Credentials + routing strategy |
| L3: Function routing | Function name | [functions.*] | Credentials + routing + task abstraction |

L1 is the default. Every model registered in a provider’s models array can be used immediately — the caller supplies their own API key and the gateway forwards it. L2 adds gateway-owned credentials and traffic distribution strategies (single, weighted, fallback) for specific models. L3 introduces a task-level indirection layer where callers request a function name (e.g. summarize) instead of a model name.

Resolution flow

When a request arrives, the gateway checks for an explicit namespace prefix first. If none is present, it resolves top-down through the three layers.

```text
Request (model field)
  |
  v
Explicit prefix? (function::, route::, provider::)
  +-- Yes --> dispatch to named layer
  +-- No  --> top-down resolution:
       |
       v
     L3: Function match? --> function dispatch
       |  No
       v
     L2: Route match? --> managed route dispatch
       |  No
       v
     L1: Provider match? --> passthrough
       |  No
       v
     404 UnknownModel
```

The ordering means a model name can appear in both [providers.*] and [routes.*] — managed routing takes precedence when configured. Similarly, a function name that collides with a model name resolves to the function first.
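
The top-down branch of the flow can be sketched in a few lines of Python. This is only an illustration of the precedence order described above, not the gateway's implementation: `RoutingTable`, `resolve_top_down`, and the `UnknownModel` exception class are hypothetical names, and the sets stand in for whatever the gateway builds from [functions.*], [routes.*], and the models arrays under [providers.*].

```python
from dataclasses import dataclass, field


class UnknownModel(LookupError):
    """No layer matched; the gateway would answer 404 UnknownModel."""


@dataclass
class RoutingTable:
    # Hypothetical in-memory view of the three config sections.
    functions: set[str] = field(default_factory=set)
    routes: set[str] = field(default_factory=set)
    provider_models: set[str] = field(default_factory=set)


def resolve_top_down(name: str, table: RoutingTable) -> str:
    """Return the layer that handles an unprefixed name."""
    if name in table.functions:        # L3 is checked first
        return "function"
    if name in table.routes:           # then L2
        return "route"
    if name in table.provider_models:  # then L1 passthrough
        return "passthrough"
    raise UnknownModel(name)


table = RoutingTable(
    functions={"summarize"},
    routes={"gpt-4o"},
    provider_models={"gpt-4o", "summarize", "whisper-1"},
)
print(resolve_top_down("summarize", table))  # function
print(resolve_top_down("gpt-4o", table))     # route
print(resolve_top_down("whisper-1", table))  # passthrough
```

Here `summarize` and `gpt-4o` are both registered as provider models, yet the higher layers win, which is the shadowing behaviour the paragraph above describes.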

Namespace prefixes

Explicit prefixes let callers target a specific routing layer, bypassing top-down resolution.

| Prefix | Layer | Example | Resolution |
| --- | --- | --- | --- |
| function::name | L3 | function::summarize | Direct function lookup |
| route::name | L2 | route::balanced-gpt4o | Direct route lookup |
| provider::model | L1 | openai::gpt-4o | Direct provider lookup |
| (no prefix) | Auto | gpt-4o or summarize | Top-down resolution |

Explicit prefixes prevent accidental shadowing. Without a prefix, adding a function named summarize would shadow a provider model with the same name. Prefixes make the routing layer visible to the caller.
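
As a rough sketch of how a model field might be split into layer and name, the snippet below partitions on the first `::`. It assumes, based on the `openai::gpt-4o` example, that any prefix other than `function` or `route` is a provider name; that reading, along with the `Target` type and `parse_model_field` function, is an assumption made for illustration rather than documented gateway behaviour.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    layer: Optional[str]     # "function", "route", "provider", or None for auto
    provider: Optional[str]  # set only for provider-prefixed names
    name: str                # function, route, or model name


def parse_model_field(value: str) -> Target:
    """Split an optional namespace prefix off the request's model field."""
    prefix, sep, rest = value.partition("::")
    if not sep:
        # No prefix: the caller gets top-down resolution.
        return Target(None, None, value)
    if prefix in ("function", "route"):
        return Target(prefix, None, rest)
    # Assumption: any other prefix names a provider (e.g. "openai").
    return Target("provider", prefix, rest)


for field_value in ("function::summarize", "route::balanced-gpt4o",
                    "openai::gpt-4o", "gpt-4o"):
    print(parse_model_field(field_value))
```

A caller that wants the provider model named `summarize`, even after a function with the same name has been added, would send the provider-prefixed form instead of the bare name.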

Endpoint support

All three routing layers support all endpoint types.
The support matrix has one column per layer (L1 Passthrough, L2 Managed Route, L3 Function) plus Prefix Parsing, and one row per endpoint:

  • Chat completions
  • Responses / Messages
  • Embeddings
  • Audio speech
  • Audio transcription
  • Image generation
  • Count tokens
  • Files

Both functions and routes require an endpoint field that declares which endpoint type they serve. Sending a request to the wrong endpoint returns 400.

Endpoint types

Functions and routes declare which endpoint they serve via the endpoint field:

| Value | HTTP Endpoint | Example models |
| --- | --- | --- |
| chat | /v1/chat/completions | gpt-4o, claude-sonnet-4-6 |
| embeddings | /v1/embeddings | text-embedding-3-small |
| audio_speech | /v1/audio/speech | tts-1, tts-1-hd |
| audio_transcription | /v1/audio/transcriptions | whisper-1 |
| image_generation | /v1/images/generations | dall-e-3 |

The endpoint field is required. The gateway rejects functions and routes without it at startup.
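
The two checks described in this section (a startup check that every function and route declares an endpoint, and a request-time check that the request path matches it) might look like the sketch below. The mapping is copied from the table above; `validate_entries` and `endpoint_matches` are hypothetical helpers, and the real gateway may accept values beyond the five listed here.

```python
# Endpoint values and paths as listed in the table above.
ENDPOINT_PATHS = {
    "chat": "/v1/chat/completions",
    "embeddings": "/v1/embeddings",
    "audio_speech": "/v1/audio/speech",
    "audio_transcription": "/v1/audio/transcriptions",
    "image_generation": "/v1/images/generations",
}


def validate_entries(section: str, entries: dict[str, dict]) -> None:
    """Startup check: reject entries with a missing or unknown endpoint."""
    for name, cfg in entries.items():
        endpoint = cfg.get("endpoint")
        if endpoint not in ENDPOINT_PATHS:
            raise ValueError(
                f"[{section}.{name}] has a missing or unknown endpoint: {endpoint!r}"
            )


def endpoint_matches(request_path: str, cfg: dict) -> bool:
    """Request-time check: False is the case where the gateway returns 400."""
    return ENDPOINT_PATHS[cfg["endpoint"]] == request_path


summarize = {"endpoint": "chat"}
validate_entries("functions", {"summarize": summarize})
print(endpoint_matches("/v1/chat/completions", summarize))  # True
print(endpoint_matches("/v1/embeddings", summarize))        # False
```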

Routing strategies

| Strategy | Description | Available on |
| --- | --- | --- |
| single | Route to one target | Routes, Functions |
| weighted | Random selection by weight | Routes, Functions |
| fallback | Sequential failover with retry | Routes, Functions |
| experiment | A/B test with weighted variants | Functions only |
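
To make "random selection by weight" concrete, here is a minimal sketch using Python's standard `random.choices`. The target names and the 80/20 weights are made up for illustration, and the gateway's actual selection code is not shown in this document.

```python
import random
from collections import Counter


def pick_weighted(targets: dict[str, float], rng: random.Random) -> str:
    """Pick one target name with probability proportional to its weight."""
    names = list(targets)
    weights = [targets[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical targets with an 80/20 split.
weights = {"primary": 80, "canary": 20}
rng = random.Random(0)  # seeded so repeated runs give the same counts
counts = Counter(pick_weighted(weights, rng) for _ in range(10_000))
print(counts)
```

Nothing in this selection looks at whether a target is currently healthy, so a failing target keeps drawing its share of requests.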

Utility — no routing

| Endpoint | Purpose |
| --- | --- |
| GET /v1/models | Synthetic model list from config |
| GET /health | Health check |

Failover and retry

The gateway combines fast sequential failover with retry and exponential backoff.

The fallback strategy tries targets in declaration order: if the first target fails (connection error or 5xx), the next target is attempted immediately. When all targets have been tried, the first target is retried once more as a degraded-mode fallback.

Configure [routing.retry] for automatic retries with exponential backoff:

```toml
[routing.retry]
max_retries     = 3
backoff_base_ms = 250
```

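
The attempt order and the retry delays can be sketched as two small helpers. The order follows the description above (declaration order, then the first target once more). The delay formula is an assumption: this page does not state it, so the sketch uses plain doubling from `backoff_base_ms` with no jitter or cap, and both function names are hypothetical.

```python
def fallback_order(targets: list[str]) -> list[str]:
    """Declaration order, then the first target once more (degraded mode)."""
    if not targets:
        return []
    return [*targets, targets[0]]


def backoff_delays_ms(max_retries: int, backoff_base_ms: int) -> list[int]:
    """Assumed schedule: the delay doubles on each retry, starting at the base."""
    return [backoff_base_ms * 2**attempt for attempt in range(max_retries)]


print(fallback_order(["gpt-4o", "claude-sonnet-4-6"]))
# ['gpt-4o', 'claude-sonnet-4-6', 'gpt-4o']
print(backoff_delays_ms(max_retries=3, backoff_base_ms=250))
# [250, 500, 1000]
```

How the two interact (for example, whether a backoff delay is applied between targets or only between full passes) is not specified here, so the helpers are kept separate.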
Functions can override the global retry configuration with per-function settings:

```toml
[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models   = ["gpt-4o", "claude-sonnet-4-6"]

[functions.summarize.retry]
max_retries     = 4
backoff_base_ms = 1000
```

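
One way to picture the override is as a per-key merge of the function's retry table over the global one. Whether the gateway merges key by key or replaces the whole table is not stated on this page, so treat the merge below as an assumption; `effective_retry` and the `classify` function are hypothetical, and the dict mirrors the TOML shown above.

```python
def effective_retry(config: dict, function_name: str) -> dict:
    """Global [routing.retry] values, overridden by [functions.<name>.retry]."""
    global_retry = config.get("routing", {}).get("retry", {})
    function_cfg = config.get("functions", {}).get(function_name, {})
    # Assumption: keys set on the function win, missing keys fall back to global.
    return {**global_retry, **function_cfg.get("retry", {})}


config = {
    "routing": {"retry": {"max_retries": 3, "backoff_base_ms": 250}},
    "functions": {
        "summarize": {
            "endpoint": "chat",
            "strategy": "fallback",
            "models": ["gpt-4o", "claude-sonnet-4-6"],
            "retry": {"max_retries": 4, "backoff_base_ms": 1000},
        },
        # Hypothetical function with no retry override.
        "classify": {"endpoint": "chat", "strategy": "single", "models": ["gpt-4o"]},
    },
}

print(effective_retry(config, "summarize"))  # {'max_retries': 4, 'backoff_base_ms': 1000}
print(effective_retry(config, "classify"))   # {'max_retries': 3, 'backoff_base_ms': 250}
```
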
The [routing.circuit_breaker] configuration section is deprecated and ignored at runtime. Use [routing.retry] with the fallback strategy instead.

Scope and limitations

  • No model remapping. The model field in the request body is forwarded as-is to the upstream provider. If your target uses a different model name upstream, you must use the upstream name in the target configuration.
  • No health-aware weighted routing. The weighted strategy does not consult target health. An unhealthy target continues to receive its proportional share of traffic. Use the fallback strategy for automatic failover.

Next steps

  • Passthrough: How L1 passthrough mode works.
  • Simple Routing: Set up gateway-owned credentials for a model.
  • Providers: Supported providers and format translation.
  • Configuration: Full TOML reference for all gateway settings.