

Function Routing

To decouple task names from model names, define functions in your gateway configuration. Your application sends model: "function::summarize" instead of model: "gpt-4o", and the gateway resolves the function to the configured targets. Swapping models becomes a config change — no application code needs to change.

Prerequisites

  • At least one provider configured with one or more models.

Option A — Inline models shorthand

The simplest approach references model names directly. The gateway creates internal targets for each model automatically.
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o", "gpt-4o-mini"]

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6"]

[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models = ["gpt-4o", "claude-sonnet-4-6"]
The gateway creates ephemeral targets from the models list. With the fallback strategy, it tries gpt-4o first. If that fails, it falls back to claude-sonnet-4-6. No [targets.*] blocks are needed.
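The fallback behaviour above can be sketched as a loop over targets: try each in order, return the first success, and surface an error only once every target has failed. This is an illustrative sketch, not the gateway's actual internals; the `Target` shape and `dispatchFallback` name are hypothetical.

```typescript
// Hypothetical sketch of the "fallback" strategy: try each target in
// order and return the first successful result.
type Target = { model: string; call: (prompt: string) => string };

function dispatchFallback(targets: Target[], prompt: string): string {
  const errors: string[] = [];
  for (const target of targets) {
    try {
      return target.call(prompt); // first success wins
    } catch (err) {
      errors.push(`${target.model}: ${err}`); // record failure, try the next target
    }
  }
  // All targets exhausted: return an error instead of falling through
  throw new Error(`all targets failed: ${errors.join("; ")}`);
}
```

With the config above, a failure from gpt-4o would mean claude-sonnet-4-6 is tried next.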

Option B — Named targets

When you need per-target weights, custom credentials, or specific timeout overrides, reference named targets instead.
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o"]

[targets.openai-primary]
model = "gpt-4o"
weight = 70
credential = "env::MANAGED_OPENAI_KEY"

[targets.openai-secondary]
model = "gpt-4o"
weight = 30
credential = "env::MANAGED_OPENAI_KEY_2"

[functions.extract]
endpoint = "chat"
strategy = "weighted"
targets = ["openai-primary", "openai-secondary"]
This distributes 70% of extract traffic to one API key and 30% to another.
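Weighted selection of this kind is typically implemented by scaling a uniform random draw by the total weight and walking the target list. A minimal sketch, with hypothetical names and the random draw passed in explicitly to keep it deterministic:

```typescript
// Illustrative weighted pick: with weights 70 and 30, a uniform draw
// r in [0, 1) lands on the first target roughly 70% of the time.
type Weighted = { name: string; weight: number };

function pickWeighted(targets: Weighted[], r: number): string {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let threshold = r * total; // scale the draw to the total weight
  for (const t of targets) {
    threshold -= t.weight;
    if (threshold < 0) return t.name;
  }
  return targets[targets.length - 1].name; // guard against rounding at r -> 1
}
```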

Calling a function

Use the function:: prefix in the model field to route to a function.
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "function::summarize",
    "messages": [{"role": "user", "content": "Summarise this article..."}]
  }'
Unprefixed names also work: the gateway resolves them top-down, checking functions first, then routes, then providers. If a function named summarize exists, it matches before any route or provider model with the same name. The explicit function:: prefix is recommended because it makes the routing layer visible and avoids surprises.

Resolution order

When the gateway receives an unprefixed model name, it resolves top-down through three layers.
model = "summarize" (unprefixed)
  |
  v
Layer 3: Does a function named "summarize" exist?
  +-- Yes --> function dispatch
  |
  +-- No
        |
        v
      Layer 2: Does a route match "summarize"?
        +-- Yes --> managed route dispatch
        |
        +-- No
              |
              v
            Layer 1: Is "summarize" in a provider's models list?
              +-- Yes --> passthrough
              +-- No  --> 404 Unknown Model
Explicit prefixes bypass the cascade: function::summarize goes directly to function dispatch without checking routes or providers.
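The cascade can be expressed as a small lookup function. This sketch mirrors the documented order only; the set-based lookups and names are illustrative, not the gateway's real data structures.

```typescript
// Sketch of the three-layer resolution cascade: function, then route,
// then provider passthrough, with explicit prefixes bypassing the walk.
type Resolution = "function" | "route" | "passthrough" | "unknown";

function resolveModel(
  name: string,
  functions: Set<string>,
  routes: Set<string>,
  providerModels: Set<string>,
): Resolution {
  // Explicit prefix bypasses the cascade entirely
  if (name.startsWith("function::")) {
    return functions.has(name.slice("function::".length)) ? "function" : "unknown";
  }
  if (functions.has(name)) return "function";          // Layer 3
  if (routes.has(name)) return "route";                // Layer 2
  if (providerModels.has(name)) return "passthrough";  // Layer 1
  return "unknown"; // 404 Unknown Model
}
```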

Per-function retry

Override the global retry settings for a specific function.
[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models = ["gpt-4o", "claude-sonnet-4-6"]

[functions.summarize.retry]
max_retries = 4
backoff_base_ms = 1000
This applies only to the summarize function. Other functions and routes use the global [routing.retry] settings.
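With a doubling backoff curve, a common scheme for settings like these, max_retries = 4 and backoff_base_ms = 1000 would yield delays of 1, 2, 4, and 8 seconds. The sketch below assumes that doubling scheme; the gateway's exact backoff curve may differ.

```typescript
// Hypothetical exponential-backoff schedule for the retry settings
// above (max_retries = 4, backoff_base_ms = 1000), assuming doubling.
function backoffSchedule(maxRetries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * 2 ** attempt); // 1000, 2000, 4000, 8000
  }
  return delays;
}
```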

Validation rules

A function must use exactly one of models, targets, or steps. The gateway rejects the configuration at load time if multiple fields are present. Each inline model name in the models list must exist in at least one provider’s models array. Functions also support multi-step fallback chains via steps — see Functions (concept) for details.

No fallthrough on failure

If a function matches but all its targets are exhausted (every target has failed and retries are spent), the gateway returns an error to the caller. It does not fall through to Layer 2 routes or Layer 1 passthrough. This behaviour is consistent with managed route handling.

Multi-endpoint functions

Functions work for all endpoint types. Set the endpoint field to declare which HTTP endpoint the function serves.

Embedding function

[functions.embed]
endpoint = "embeddings"
strategy = "single"
models = ["text-embedding-3-small"]
const embedding = await client.embeddings.create({
  model: "function::embed",
  input: "Search query text",
});

Image generation function

[functions.generate-image]
endpoint = "image_generation"
strategy = "single"
models = ["dall-e-3"]
const image = await client.images.generate({
  model: "function::generate-image",
  prompt: "A sunset over mountains",
});

Audio speech function

[functions.speak]
endpoint = "audio_speech"
strategy = "single"
models = ["tts-1"]
const speech = await client.audio.speech.create({
  model: "function::speak",
  input: "Hello, welcome to our platform.",
  voice: "alloy",
});

Audio transcription function

[functions.transcribe]
endpoint = "audio_transcription"
strategy = "single"
models = ["whisper-1"]
Audio transcription functions support model routing and model rewrite, but not variant param overrides. The multipart/form-data body format makes param injection fragile with binary audio data. Operators who need different language or response_format values should create separate functions.

Endpoint mismatch

If you send a request to the wrong endpoint, the gateway returns 400:
function "embed": endpoint mismatch — declared as embeddings, called from chat
This happens when model: "function::embed" is sent to /v1/chat/completions but the function has endpoint = "embeddings". Use the correct HTTP endpoint for the function’s declared type.
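Conceptually, the check compares the endpoint implied by the HTTP path against the function's declared endpoint. The path-to-endpoint mapping below is an assumption based on the OpenAI-style paths shown in this page's examples; the validation itself is a sketch.

```typescript
// Illustrative endpoint-mismatch check: the HTTP path a request arrives
// on must match the function's declared endpoint.
const endpointForPath: Record<string, string> = {
  "/v1/chat/completions": "chat",
  "/v1/embeddings": "embeddings",
  "/v1/images/generations": "image_generation",
  "/v1/audio/speech": "audio_speech",
  "/v1/audio/transcriptions": "audio_transcription",
};

function checkEndpoint(fnName: string, declared: string, path: string): void {
  const called = endpointForPath[path];
  if (called !== declared) {
    // Mirrors the documented error format
    throw new Error(
      `function "${fnName}": endpoint mismatch — declared as ${declared}, called from ${called}`,
    );
  }
}
```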

Variant params

When using strategy = "experiment", each variant can override request parameters. Params are flat key-value pairs at the same level as model and weight:
[functions.summarize.variants.fast]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2
max_tokens = 500
See Experiment Routing for full details and examples.
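Because params sit flat alongside model and weight, applying a variant amounts to treating every key except weight as a request override. A sketch under that assumption (the merge logic and names are illustrative, not the gateway's code):

```typescript
// Illustrative variant application: weight only steers selection; the
// model and any remaining flat params override the incoming request.
type Variant = { model: string; weight: number; [param: string]: unknown };

function applyVariant(request: Record<string, unknown>, variant: Variant) {
  const { weight, ...overrides } = variant; // weight never reaches the provider
  return { ...request, ...overrides };      // model + params override the request
}
```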

Known fields per endpoint

Endpoint            Validated params
chat                temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, stop, response_format, n
embeddings          dimensions, encoding_format
image_generation    size, quality, style, n, response_format
audio_speech        voice, speed, response_format
Unknown params are passed through to the provider with a startup warning. Cross-endpoint params (e.g., temperature on an embedding function) are rejected at startup.
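The warn-versus-reject distinction can be sketched as a lookup against the table above: a param known to the function's own endpoint is fine, a param known only to a different endpoint is rejected, and anything else passes through with a warning. The table contents mirror this page; the code itself is illustrative.

```typescript
// Sketch of the startup param check: "ok" for known params,
// "reject" for cross-endpoint params, "warn" for unknown ones.
const knownParams: Record<string, Set<string>> = {
  chat: new Set(["temperature", "max_tokens", "top_p", "frequency_penalty",
                 "presence_penalty", "seed", "stop", "response_format", "n"]),
  embeddings: new Set(["dimensions", "encoding_format"]),
  image_generation: new Set(["size", "quality", "style", "n", "response_format"]),
  audio_speech: new Set(["voice", "speed", "response_format"]),
};

function classifyParam(endpoint: string, param: string): "ok" | "warn" | "reject" {
  if (knownParams[endpoint].has(param)) return "ok";
  // Known to some other endpoint -> cross-endpoint param -> reject
  const knownElsewhere = Object.entries(knownParams)
    .some(([ep, set]) => ep !== endpoint && set.has(param));
  return knownElsewhere ? "reject" : "warn";
}
```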

Next steps

Routing Overview

Managed vs passthrough routing and the full resolution flow.

Experiment Routing

A/B test models and parameters with weighted variants.

Fallback Routing

Automatic failover between providers.

Configuration

Full TOML reference for all gateway settings.