Experiment Routing
Run controlled experiments to compare models, providers, and parameter configurations for the same task. The gateway selects a variant by weighted random on each request, tags telemetry with the variant name, and returns the selection in the X-Gateway-Variant response header.
Use case: Compare models for summarization
You want to know whether GPT-4o-mini or Claude produces better summaries at lower cost. Instead of splitting traffic at the application level, configure an experiment function:
```toml
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o", "gpt-4o-mini"]

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6"]

[functions.summarize]
endpoint = "chat"
strategy = "experiment"

[functions.summarize.variants.fast]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2
max_tokens = 500

[functions.summarize.variants.quality]
model = "claude-sonnet-4-6"
weight = 50
temperature = 0.7
max_tokens = 1500
```
Send requests using the function name:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "anything", // gateway owns credentials
});

const response = await client.chat.completions.create({
  model: "function::summarize",
  messages: [{ role: "user", content: "Summarize this quarterly report..." }],
});
```
The gateway selects fast or quality by weight on each request. The caller’s code never changes — you adjust weights or swap models in config.
How variant selection works
- Request arrives with `model: "function::summarize"`
- Gateway looks up the function and finds `strategy = "experiment"`
- Weighted random selects a variant (50/50 in this case)
- The variant's model and params are applied to the upstream request
- OTel span includes `gateway.variant.name` and `gateway.variant.weight`
- Response includes the `X-Gateway-Variant: fast` (or `quality`) header
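The weighted draw in step 3 can be sketched as follows. This is a minimal illustration of how weighted random selection works, not the gateway's actual implementation; `pickVariant` and the `Variant` shape are hypothetical names. Weights need not sum to 100 — each variant's chance is its weight divided by the total.

```typescript
interface Variant {
  name: string;
  weight: number;
}

// `r` is a uniform random number in [0, 1); injected here for testability.
function pickVariant(variants: Variant[], r: number = Math.random()): Variant {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let threshold = r * total;
  for (const v of variants) {
    threshold -= v.weight;
    if (threshold < 0) return v;
  }
  return variants[variants.length - 1]; // guard against floating-point edge
}

const variants = [
  { name: "fast", weight: 50 },
  { name: "quality", weight: 50 },
];
console.log(pickVariant(variants, 0.25).name); // "fast"
console.log(pickVariant(variants, 0.75).name); // "quality"
```

With a 50/50 split, any `r` below 0.5 lands on the first variant; skewing the weights (say 90/10) shifts that boundary without any caller-side changes.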
Variant params (explicit-wins)
Each variant can override inference parameters. If the variant sets a param, it overrides the caller’s value. If the variant omits a param, the caller’s value passes through.
```toml
[functions.summarize.variants.conservative]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2 # overrides caller's temperature
max_tokens = 500  # overrides caller's max_tokens
# top_p not set → caller's value passes through

[functions.summarize.variants.creative]
model = "gpt-4o"
weight = 50
temperature = 0.9 # different temperature for this variant
max_tokens = 2000
```
Params are validated per endpoint type. Setting temperature on an embedding function is a config error. Unknown params pass through with a startup warning.
Protected fields (model, messages, input, file, prompt, stream) cannot be set as variant params. They are routing or content fields managed by the gateway.
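Under these rules, the merge can be sketched in a few lines. This is our own illustration of explicit-wins semantics, not the gateway's code; `mergeParams` is a hypothetical name, and we model the protected-field check as a thrown error (the gateway reports it as a config error at load time).

```typescript
// Fields the gateway manages itself; a variant may never set these.
const PROTECTED = new Set(["model", "messages", "input", "file", "prompt", "stream"]);

type Params = Record<string, unknown>;

function mergeParams(callerParams: Params, variantParams: Params): Params {
  for (const key of Object.keys(variantParams)) {
    if (PROTECTED.has(key)) {
      throw new Error(`protected field "${key}" cannot be a variant param`);
    }
  }
  // Spread order implements explicit-wins: variant values land last,
  // so anything the variant omits passes through from the caller.
  return { ...callerParams, ...variantParams };
}

const merged = mergeParams(
  { temperature: 1.0, top_p: 0.9 },      // caller's request
  { temperature: 0.2, max_tokens: 500 }, // variant "conservative"
);
console.log(merged); // { temperature: 0.2, top_p: 0.9, max_tokens: 500 }
```

The caller's `top_p` survives because the variant never mentions it, while `temperature` is replaced by the variant's value.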
Use case: A/B test embedding models
Experiments work on all endpoint types, not just chat:
```toml
[functions.embed]
endpoint = "embeddings"
strategy = "experiment"

[functions.embed.variants.small]
model = "text-embedding-3-small"
weight = 50
dimensions = 1536

[functions.embed.variants.large]
model = "text-embedding-3-large"
weight = 50
dimensions = 3072
```
```typescript
const embedding = await client.embeddings.create({
  model: "function::embed",
  input: "The quick brown fox",
});
// X-Gateway-Variant header tells you which model was used
```
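The OpenAI SDK does not surface response headers on the plain `create()` call, so one option is to call the gateway with `fetch` and read the header from the `Response` directly. A minimal sketch, assuming the gateway runs at the same `localhost:4000` base URL as above; the helper name is ours:

```typescript
// Header names are case-insensitive in the Fetch API, so the lowercase
// lookup matches the X-Gateway-Variant header the gateway sets.
function variantFromHeaders(headers: Headers): string | null {
  return headers.get("x-gateway-variant");
}

// Example with a constructed Headers object; in practice you would pass
// `response.headers` from a fetch() against the gateway, e.g.:
//   const response = await fetch("http://localhost:4000/v1/embeddings", { ... });
//   const variant = variantFromHeaders(response.headers);
const headers = new Headers({ "X-Gateway-Variant": "small" });
console.log(variantFromHeaders(headers)); // "small"
```

Logging this value alongside your own quality metrics lets you join application-level outcomes back to the variant that produced them.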
Use case: Cross-provider A/B test
Compare the same task across providers:
```toml
[functions.classify]
endpoint = "chat"
strategy = "experiment"

[functions.classify.variants.openai]
model = "gpt-4o-mini"
weight = 50

[functions.classify.variants.anthropic]
model = "claude-haiku-4-5-20251001"
weight = 50
```
No params needed — this is a pure model comparison. The gateway handles credential routing automatically.
Use case: Audio speech voice comparison
```toml
[functions.speak]
endpoint = "audio_speech"
strategy = "experiment"

[functions.speak.variants.alloy]
model = "tts-1"
weight = 50
voice = "alloy"
speed = 1.0

[functions.speak.variants.nova]
model = "tts-1-hd"
weight = 50
voice = "nova"
speed = 0.9
```
Analyzing experiments
Query your OTel data (ClickHouse, Grafana, etc.) to compare variant performance:
```sql
SELECT
  SpanAttributes['gateway.variant.name'] AS variant,
  count() AS requests,
  avg(Duration) / 1e6 AS avg_latency_ms,
  quantile(0.95)(Duration) / 1e6 AS p95_latency_ms,
  avg(SpanAttributes['gen_ai.usage.input_tokens']::Int64) AS avg_input_tokens,
  avg(SpanAttributes['gen_ai.usage.output_tokens']::Int64) AS avg_output_tokens
FROM otel_traces
WHERE SpanAttributes['gateway.function.name'] = 'summarize'
  AND Timestamp > now() - INTERVAL 24 HOUR
GROUP BY variant
ORDER BY variant
```
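If you export spans as JSON rather than querying ClickHouse, the same per-variant aggregation can be sketched in process. The `SpanRecord` shape below is our own shorthand for the relevant span attributes, not an OTel SDK type:

```typescript
interface SpanRecord {
  variant: string;      // gateway.variant.name
  durationMs: number;   // span duration
  outputTokens: number; // gen_ai.usage.output_tokens
}

// Group spans by variant and compute request count and averages,
// mirroring the ClickHouse query above.
function summarize(spans: SpanRecord[]) {
  const grouped = new Map<string, SpanRecord[]>();
  for (const s of spans) {
    const bucket = grouped.get(s.variant) ?? [];
    bucket.push(s);
    grouped.set(s.variant, bucket);
  }
  const out = new Map<string, { requests: number; avgLatencyMs: number; avgOutputTokens: number }>();
  for (const [variant, rows] of grouped) {
    const n = rows.length;
    out.set(variant, {
      requests: n,
      avgLatencyMs: rows.reduce((a, r) => a + r.durationMs, 0) / n,
      avgOutputTokens: rows.reduce((a, r) => a + r.outputTokens, 0) / n,
    });
  }
  return out;
}

const stats = summarize([
  { variant: "fast", durationMs: 800, outputTokens: 120 },
  { variant: "fast", durationMs: 1200, outputTokens: 180 },
  { variant: "quality", durationMs: 2400, outputTokens: 400 },
]);
console.log(stats.get("fast")); // { requests: 2, avgLatencyMs: 1000, avgOutputTokens: 150 }
```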
Telemetry attributes
| Attribute | Type | Description |
|---|---|---|
| `gateway.variant.name` | string | Selected variant name |
| `gateway.variant.weight` | int | Variant's configured weight |
| `gateway.experiment.total_variants` | int | Number of variants in the function |
| `gateway.function.name` | string | Function name |
| `gateway.function.endpoint` | string | Endpoint type (chat, embeddings, etc.) |
Requirements
- At least 2 variants per experiment function
- All variant weights must be > 0
- Each variant specifies either `model` (inline) or `target` (named reference), not both
- `strategy = "experiment"` is only valid on functions, not routes
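The model-vs-target rule means the two variant forms below are each valid on their own but may not be combined in one variant. This is a hedged sketch: we assume `target` points at a named target defined elsewhere in the config, and `"anthropic-sonnet"` is a hypothetical target name.

```toml
# Either an inline model...
[functions.summarize.variants.fast]
model = "gpt-4o-mini"
weight = 50

# ...or a named target reference (hypothetical target name), never both.
[functions.summarize.variants.quality]
target = "anthropic-sonnet"
weight = 50
```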
What experiments don’t do
- No adaptive optimization — weights are static. Automatic adjustment based on metrics is planned.
- No cross-variant fallback — if a variant's endpoint fails, the request fails. Use `strategy = "fallback"` for reliability.
- No prompt management — variants don’t own prompts. The caller sends messages; the gateway routes.