

Experiment Routing

Run controlled experiments to compare models, providers, and parameter configurations for the same task. On each request, the gateway picks a variant by weighted random selection, tags telemetry with the variant name, and returns the selection in the X-Gateway-Variant response header.

Use case: Compare models for summarization

You want to know whether GPT-4o-mini or Claude produces better summaries at lower cost. Instead of splitting traffic at the application level, configure an experiment function:
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o", "gpt-4o-mini"]

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6"]

[functions.summarize]
endpoint = "chat"
strategy = "experiment"

[functions.summarize.variants.fast]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2
max_tokens = 500

[functions.summarize.variants.quality]
model = "claude-sonnet-4-6"
weight = 50
temperature = 0.7
max_tokens = 1500
Send requests using the function name:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "anything", // gateway owns credentials
});

const response = await client.chat.completions.create({
  model: "function::summarize",
  messages: [{ role: "user", content: "Summarize this quarterly report..." }],
});
The gateway selects fast or quality by weight on each request. The caller’s code never changes — you adjust weights or swap models in config.

How variant selection works

  1. Request arrives with model: "function::summarize"
  2. Gateway looks up the function and finds strategy = "experiment"
  3. Weighted random selects a variant (50/50 in this case)
  4. The variant’s model and params are applied to the upstream request
  5. OTel span includes gateway.variant.name, gateway.variant.weight
  6. Response includes X-Gateway-Variant: fast (or quality) header
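The weighted selection in step 3 can be sketched as follows. This is a minimal illustration under assumed names (`pickVariant` and the injectable `rand` parameter are invented here), not the gateway's actual implementation:

```typescript
interface Variant {
  name: string;
  weight: number;
}

// Weighted random selection: draw a point in [0, totalWeight) and walk
// the cumulative weights until the draw falls inside a variant's slice.
// With two variants at weight 50 each, every request is a 50/50 split.
function pickVariant(variants: Variant[], rand: () => number = Math.random): Variant {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let point = rand() * total;
  for (const v of variants) {
    point -= v.weight;
    if (point < 0) return v;
  }
  return variants[variants.length - 1]; // guard against floating-point edge cases
}
```

Weights don't need to sum to 100; only their ratio matters, so 1/1 behaves the same as 50/50.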

Variant params (explicit-wins)

Each variant can override inference parameters. If the variant sets a param, it overrides the caller’s value. If the variant omits a param, the caller’s value passes through.
[functions.summarize.variants.conservative]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2       # overrides caller's temperature
max_tokens = 500        # overrides caller's max_tokens
# top_p not set → caller's value passes through

[functions.summarize.variants.creative]
model = "gpt-4o"
weight = 50
temperature = 0.9       # different temperature for this variant
max_tokens = 2000
Params are validated per endpoint type. Setting temperature on an embedding function is a config error. Unknown params pass through with a startup warning.
Protected fields (model, messages, input, file, prompt, stream) cannot be set as variant params. They are routing or content fields managed by the gateway.
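The explicit-wins rule plus the protected-field check can be sketched as a simple merge. `mergeParams` and `PROTECTED` are names invented for this sketch, not the gateway's internals:

```typescript
type Params = Record<string, unknown>;

// Routing/content fields the gateway owns; variants may not set these.
const PROTECTED = new Set(["model", "messages", "input", "file", "prompt", "stream"]);

// Explicit-wins: the variant's params override the caller's where set;
// anything the variant omits passes through from the caller untouched.
function mergeParams(caller: Params, variant: Params): Params {
  for (const key of Object.keys(variant)) {
    if (PROTECTED.has(key)) throw new Error(`protected field in variant params: ${key}`);
  }
  return { ...caller, ...variant };
}
```

So a caller sending `temperature: 1.0, top_p: 0.9` to the `conservative` variant above would run upstream with `temperature: 0.2` (variant wins) and `top_p: 0.9` (caller passes through).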

Use case: A/B test embedding models

Experiments work on all endpoint types, not just chat:
[functions.embed]
endpoint = "embeddings"
strategy = "experiment"

[functions.embed.variants.small]
model = "text-embedding-3-small"
weight = 50
dimensions = 1536

[functions.embed.variants.large]
model = "text-embedding-3-large"
weight = 50
dimensions = 3072
const embedding = await client.embeddings.create({
  model: "function::embed",
  input: "The quick brown fox",
});
// X-Gateway-Variant header tells you which model was used

Use case: Cross-provider A/B test

Compare the same task across providers:
[functions.classify]
endpoint = "chat"
strategy = "experiment"

[functions.classify.variants.openai]
model = "gpt-4o-mini"
weight = 50

[functions.classify.variants.anthropic]
model = "claude-haiku-4-5-20251001"
weight = 50
No params needed — this is a pure model comparison. The gateway handles credential routing automatically.

Use case: Audio speech voice comparison

[functions.speak]
endpoint = "audio_speech"
strategy = "experiment"

[functions.speak.variants.alloy]
model = "tts-1"
weight = 50
voice = "alloy"
speed = 1.0

[functions.speak.variants.nova]
model = "tts-1-hd"
weight = 50
voice = "nova"
speed = 0.9

Analyzing experiments

Query your OTel data (ClickHouse, Grafana, etc.) to compare variant performance:
SELECT
    SpanAttributes['gateway.variant.name'] AS variant,
    count() AS requests,
    avg(Duration) / 1e6 AS avg_latency_ms,
    quantile(0.95)(Duration) / 1e6 AS p95_latency_ms,
    avg(SpanAttributes['gen_ai.usage.input_tokens']::Int64) AS avg_input_tokens,
    avg(SpanAttributes['gen_ai.usage.output_tokens']::Int64) AS avg_output_tokens
FROM otel_traces
WHERE SpanAttributes['gateway.function.name'] = 'summarize'
  AND Timestamp > now() - INTERVAL 24 HOUR
GROUP BY variant
ORDER BY variant
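If your spans land somewhere without SQL, the same per-variant rollup can be computed in application code. A sketch assuming a simplified span shape (the `Span` type and its field names are invented for illustration; real OTel exports differ):

```typescript
interface Span {
  variant: string;
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
}

interface VariantStats {
  requests: number;
  avgLatencyMs: number;
  avgInputTokens: number;
  avgOutputTokens: number;
}

// Group spans by variant and compute simple averages, mirroring the SQL above.
function summarize(spans: Span[]): Map<string, VariantStats> {
  const groups = new Map<string, Span[]>();
  for (const s of spans) {
    const g = groups.get(s.variant) ?? [];
    g.push(s);
    groups.set(s.variant, g);
  }
  const stats = new Map<string, VariantStats>();
  for (const [variant, g] of groups) {
    const n = g.length;
    stats.set(variant, {
      requests: n,
      avgLatencyMs: g.reduce((a, s) => a + s.durationMs, 0) / n,
      avgInputTokens: g.reduce((a, s) => a + s.inputTokens, 0) / n,
      avgOutputTokens: g.reduce((a, s) => a + s.outputTokens, 0) / n,
    });
  }
  return stats;
}
```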

Telemetry attributes

Attribute                          Type    Description
gateway.variant.name               string  Selected variant name
gateway.variant.weight             int     Variant’s configured weight
gateway.experiment.total_variants  int     Number of variants in the function
gateway.function.name              string  Function name
gateway.function.endpoint          string  Endpoint type (chat, embeddings, etc.)

Requirements

  • At least 2 variants per experiment function
  • All variant weights must be > 0
  • Each variant specifies either model (inline) or target (named reference), not both
  • strategy = "experiment" is only valid on functions, not routes
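The requirements above can be expressed as a small validation routine. This is a sketch only; `validateExperiment` and the `VariantConfig` shape are invented here, not the gateway's config loader:

```typescript
interface VariantConfig {
  weight: number;
  model?: string;  // inline model reference
  target?: string; // named target reference
}

// Mirrors the requirements: at least two variants, positive weights,
// and exactly one of `model` or `target` per variant.
function validateExperiment(variants: Record<string, VariantConfig>): string[] {
  const errors: string[] = [];
  const names = Object.keys(variants);
  if (names.length < 2) errors.push("experiment needs at least 2 variants");
  for (const name of names) {
    const v = variants[name];
    if (v.weight <= 0) errors.push(`variant ${name}: weight must be > 0`);
    if ((v.model === undefined) === (v.target === undefined)) {
      errors.push(`variant ${name}: set exactly one of model or target`);
    }
  }
  return errors;
}
```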

What experiments don’t do

  • No adaptive optimization — weights are static. Automatic adjustment based on metrics is planned.
  • No cross-variant fallback — if a variant’s endpoint fails, the request fails. Use strategy = "fallback" for reliability.
  • No prompt management — variants don’t own prompts. The caller sends messages; the gateway routes.