Experiment Routing

Run controlled experiments to compare models, providers, and parameter configurations for the same task. On each request, the gateway picks a variant at random in proportion to its weight, tags telemetry with the variant name, and returns the selection in the X-Gateway-Variant response header.

Use case: Compare models for summarization

You want to know whether GPT-4o-mini or Claude produces better summaries at lower cost. Instead of splitting traffic at the application level, configure an experiment function:

[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o", "gpt-4o-mini"]

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6"]

[functions.summarize]
endpoint = "chat"
strategy = "experiment"

[functions.summarize.variants.fast]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2
max_tokens = 500

[functions.summarize.variants.quality]
model = "claude-sonnet-4-6"
weight = 50
temperature = 0.7
max_tokens = 1500

Send requests using the function name:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "anything", // gateway owns credentials
});

const response = await client.chat.completions.create({
  model: "function::summarize",
  messages: [{ role: "user", content: "Summarize this quarterly report..." }],
});

The gateway selects fast or quality by weight on each request. The caller’s code never changes; to change the experiment, you adjust weights or swap models in config.
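To see which variant served a given call, read the X-Gateway-Variant response header. The sketch below uses the built-in fetch API (Node 18 or later) and assumes the gateway serves the OpenAI-compatible /v1/chat/completions path at the localhost:4000 base URL used above; `variantOf` and `summarize` are hypothetical helper names. With the openai Node SDK, `.withResponse()` returns the raw `Response` alongside the parsed data, so the same header lookup applies there.

```typescript
// Hypothetical helper: the variant name the gateway reported, or undefined
// if the header is absent. Headers.get() is case-insensitive.
function variantOf(headers: Headers): string | undefined {
  return headers.get("x-gateway-variant") ?? undefined;
}

// Sketch: call function::summarize and return the summary plus the variant.
// Assumes the gateway from the example above is listening on localhost:4000.
async function summarize(
  text: string,
  baseURL = "http://localhost:4000/v1",
): Promise<{ summary: string; variant: string | undefined }> {
  const res = await fetch(`${baseURL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer anything", // gateway owns credentials
    },
    body: JSON.stringify({
      model: "function::summarize",
      messages: [{ role: "user", content: text }],
    }),
  });
  if (!res.ok) {
    throw new Error(`gateway returned HTTP ${res.status}`);
  }
  // Only the fields this sketch reads are typed.
  const body = (await res.json()) as {
    choices?: { message?: { content?: string | null } }[];
  };
  return {
    summary: body.choices?.[0]?.message?.content ?? "",
    variant: variantOf(res.headers),
  };
}
```

Logging the returned variant next to your own quality judgments lets you join them with the gateway’s telemetry later.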

How variant selection works

A request arrives with model: "function::summarize"
The gateway looks up the function and finds strategy = "experiment"
A weighted random draw selects a variant (50/50 in this case)
The variant’s model and params are applied to the upstream request
The OTel span records gateway.variant.name and gateway.variant.weight
The response carries the header X-Gateway-Variant: fast (or quality)
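The weighted draw can be sketched as a walk over cumulative weights: scale a uniform random number by the total weight and return the first variant whose weight range contains it. This illustrates the technique and is not the gateway’s source; `Variant` and `pickVariant` are hypothetical names, and the random source is a parameter so the draw can be tested.

```typescript
interface Variant {
  name: string;
  weight: number; // must be > 0
}

// Picks a variant with probability weight / totalWeight.
// `rand` must return a number in [0, 1); it defaults to Math.random.
function pickVariant(variants: Variant[], rand: () => number = Math.random): Variant {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || total <= 0) {
    throw new Error("experiment needs variants with positive weights");
  }
  // Scale the draw to [0, total) and walk the cumulative weight ranges.
  let point = rand() * total;
  for (const v of variants) {
    if (point < v.weight) return v;
    point -= v.weight;
  }
  // Guard against floating-point drift at the upper edge.
  return variants[variants.length - 1];
}
```

With weights 50 and 50, a draw of 0.25 falls in fast and 0.75 in quality; with weights 80 and 20, the boundary moves to 0.8. Because each request draws independently, the observed split only approaches the configured weights over many requests.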

Variant params (explicit-wins)

Each variant can override inference parameters. If the variant sets a param, it overrides the caller’s value. If the variant omits a param, the caller’s value passes through.

[functions.summarize.variants.conservative]
model = "gpt-4o-mini"
weight = 50
temperature = 0.2       # overrides caller's temperature
max_tokens = 500        # overrides caller's max_tokens
# top_p not set → caller's value passes through

[functions.summarize.variants.creative]
model = "gpt-4o"
weight = 50
temperature = 0.9       # different temperature for this variant
max_tokens = 2000
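The explicit-wins rule behaves like a shallow merge in which the variant’s keys take precedence over the caller’s. A minimal sketch; `applyVariantParams` and the caller values are hypothetical.

```typescript
type Params = Record<string, unknown>;

// Explicit-wins: any param the variant sets replaces the caller's value;
// params the variant omits (such as top_p here) pass through unchanged.
function applyVariantParams(callerParams: Params, variantParams: Params): Params {
  return { ...callerParams, ...variantParams };
}

// Hypothetical caller request params and the "conservative" variant above.
const caller: Params = { temperature: 0.5, max_tokens: 4000, top_p: 0.9 };
const conservative: Params = { temperature: 0.2, max_tokens: 500 };

const upstream = applyVariantParams(caller, conservative);
// upstream is { temperature: 0.2, max_tokens: 500, top_p: 0.9 }
```

The spread builds a new object, so the caller’s params are left untouched.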

Params are validated per endpoint type. Setting temperature on an embedding function is a config error. Unknown params pass through with a startup warning.

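Protected fields (model, messages, input, file, prompt, stream) cannot be set as variant params. They are routing or content fields, not inference parameters: a variant selects its upstream model through its own model (or target) key, and content comes from the caller.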
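The validation rules above can be sketched as a startup check: protected fields and params that belong to a different endpoint type are errors, and unrecognized params only produce warnings. The protected-field list comes from this page; the per-endpoint allowlists are hypothetical and abbreviated to the params used here, so the gateway’s real lists will differ. `checkVariantParams` is a hypothetical name.

```typescript
type Endpoint = "chat" | "embeddings" | "audio_speech";

// Protected fields listed on this page: never settable as variant params.
const PROTECTED = new Set(["model", "messages", "input", "file", "prompt", "stream"]);

// Hypothetical, abbreviated allowlists: only the params used on this page.
const KNOWN_PARAMS: Record<Endpoint, Set<string>> = {
  chat: new Set(["temperature", "max_tokens", "top_p"]),
  embeddings: new Set(["dimensions"]),
  audio_speech: new Set(["voice", "speed"]),
};

// Every param known to at least one endpoint type.
const KNOWN_ANYWHERE = new Set<string>();
for (const allowed of Object.values(KNOWN_PARAMS)) {
  allowed.forEach((p) => KNOWN_ANYWHERE.add(p));
}

interface ParamCheck {
  errors: string[];   // fail startup
  warnings: string[]; // logged; the param still passes through
}

// `params` holds a variant's inference params (not its model/target/weight keys).
function checkVariantParams(endpoint: Endpoint, params: Record<string, unknown>): ParamCheck {
  const result: ParamCheck = { errors: [], warnings: [] };
  for (const key of Object.keys(params)) {
    if (PROTECTED.has(key)) {
      result.errors.push(`${key} is a protected field`);
    } else if (KNOWN_PARAMS[endpoint].has(key)) {
      // Valid for this endpoint type.
    } else if (KNOWN_ANYWHERE.has(key)) {
      result.errors.push(`${key} is not valid on ${endpoint} functions`);
    } else {
      result.warnings.push(`${key} is unknown; passing through`);
    }
  }
  return result;
}
```

Under this sketch, temperature on an embeddings function is an error because temperature is known for chat, while a made-up key is merely warned about and forwarded.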

Use case: A/B test embedding models

Experiments work on all endpoint types, not just chat:

[functions.embed]
endpoint = "embeddings"
strategy = "experiment"

[functions.embed.variants.small]
model = "text-embedding-3-small"
weight = 50
dimensions = 1536

[functions.embed.variants.large]
model = "text-embedding-3-large"
weight = 50
dimensions = 3072

const embedding = await client.embeddings.create({
  model: "function::embed",
  input: "The quick brown fox",
});
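// The X-Gateway-Variant response header names the selected variant (small or large).
// This call returns only the parsed body; read headers from the raw response (e.g. via .withResponse()).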
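The two embedding variants return vectors of different lengths (1536 and 3072 in the config above), and the two models embed text in different vector spaces, so vectors from different variants should never be compared with each other. A minimal sketch that files each vector under the variant reported in X-Gateway-Variant and checks its length; `EXPECTED_DIMENSIONS` mirrors the config above, and `VectorStore` is a hypothetical in-memory stand-in for your vector index.

```typescript
// Mirrors the dimensions configured for each variant of function::embed.
const EXPECTED_DIMENSIONS: Record<string, number> = { small: 1536, large: 3072 };

// Hypothetical in-memory store with one bucket per variant, so similarity
// search only ever compares vectors produced by the same embedding model.
class VectorStore {
  private buckets = new Map<string, { id: string; vector: number[] }[]>();

  add(variant: string, id: string, vector: number[]): void {
    const expected = EXPECTED_DIMENSIONS[variant];
    if (expected === undefined) {
      throw new Error(`unknown variant: ${variant}`);
    }
    if (vector.length !== expected) {
      throw new Error(`variant ${variant} expects ${expected} dims, got ${vector.length}`);
    }
    const bucket = this.buckets.get(variant) ?? [];
    bucket.push({ id, vector });
    this.buckets.set(variant, bucket);
  }

  count(variant: string): number {
    return this.buckets.get(variant)?.length ?? 0;
  }
}
```

In practice the vector is `data[0].embedding` from the embeddings response and the variant is the header value for the same response.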

Use case: Cross-provider A/B test

Compare the same task across providers:

[functions.classify]
endpoint = "chat"
strategy = "experiment"

[functions.classify.variants.openai]
model = "gpt-4o-mini"
weight = 50

[functions.classify.variants.anthropic]
model = "claude-haiku-4-5-20251001"
weight = 50

No params are needed; this is a pure model comparison. The gateway routes each variant to its provider with the right credentials automatically.
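One way to picture that credential routing is a lookup from model name to the provider whose models list contains it, followed by resolving the provider’s env:: credential reference. This is a hypothetical sketch of the idea, not the gateway’s documented algorithm. Under this assumption, claude-haiku-4-5-20251001 has to appear in some provider’s models list; the providers block in the first example lists only claude-sonnet-4-6, so the mirror below adds it.

```typescript
interface Provider {
  baseUrl: string;
  credential: string; // e.g. "env::OPENAI_API_KEY"
  models: string[];
}

// Mirror of the [providers.*] tables from the first example, with
// claude-haiku-4-5-20251001 added so both classify variants resolve.
const PROVIDERS: Record<string, Provider> = {
  openai: {
    baseUrl: "https://api.openai.com/v1",
    credential: "env::OPENAI_API_KEY",
    models: ["gpt-4o", "gpt-4o-mini"],
  },
  anthropic: {
    baseUrl: "https://api.anthropic.com/v1",
    credential: "env::ANTHROPIC_API_KEY",
    models: ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
  },
};

// Hypothetical lookup: the provider whose models list contains the model.
function providerFor(model: string, providers: Record<string, Provider>): string {
  for (const [providerName, p] of Object.entries(providers)) {
    if (p.models.includes(model)) return providerName;
  }
  throw new Error(`no provider lists model ${model}`);
}

// Resolves an "env::NAME" reference against an environment map
// (process.env in Node). Other reference schemes are out of scope here.
function resolveCredential(ref: string, env: Record<string, string | undefined>): string {
  const prefix = "env::";
  if (!ref.startsWith(prefix)) {
    throw new Error(`unsupported credential reference: ${ref}`);
  }
  const value = env[ref.slice(prefix.length)];
  if (!value) throw new Error(`${ref} is not set`);
  return value;
}
```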

Use case: Audio speech voice comparison

Compare text-to-speech voices, models, and speaking speeds:

[functions.speak]
endpoint = "audio_speech"
strategy = "experiment"

[functions.speak.variants.alloy]
model = "tts-1"
weight = 50
voice = "alloy"
speed = 1.0

[functions.speak.variants.nova]
model = "tts-1-hd"
weight = 50
voice = "nova"
speed = 0.9

Analyzing experiments

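Query your OTel trace store to compare variant performance. This example is ClickHouse SQL against the otel_traces table written by the OpenTelemetry Collector’s ClickHouse exporter; Grafana and similar tools can chart the same query: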

SELECT
    SpanAttributes['gateway.variant.name'] AS variant,
    count() AS requests,
    avg(Duration) / 1e6 AS avg_latency_ms,
    quantile(0.95)(Duration) / 1e6 AS p95_latency_ms,
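    -- toInt64OrNull yields NULL (ignored by avg) for spans missing the attribute,
    -- where a plain ::Int64 cast of the empty string would fail the query
    avg(toInt64OrNull(SpanAttributes['gen_ai.usage.input_tokens'])) AS avg_input_tokens,
    avg(toInt64OrNull(SpanAttributes['gen_ai.usage.output_tokens'])) AS avg_output_tokens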
FROM otel_traces
WHERE SpanAttributes['gateway.function.name'] = 'summarize'
  AND Timestamp > now() - INTERVAL 24 HOUR
GROUP BY variant
ORDER BY variant
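The query reports average token counts, not cost. To answer the “lower cost” half of the summarization question, multiply each variant’s averages by its model’s per-token prices; because cost is linear in tokens, pricing the averages gives the average cost per request. A minimal sketch: `estimateCostPerRequest` is a hypothetical helper, and every number below is an illustrative placeholder, not a measurement or a real price.

```typescript
interface Pricing {
  inputPerMillion: number;  // USD per 1M input tokens (placeholder)
  outputPerMillion: number; // USD per 1M output tokens (placeholder)
}

interface VariantStats {
  variant: string;
  avgInputTokens: number;  // avg_input_tokens from the query
  avgOutputTokens: number; // avg_output_tokens from the query
}

// Average cost of one request: tokens x price, summed over input and output.
function estimateCostPerRequest(stats: VariantStats, pricing: Pricing): number {
  return (
    (stats.avgInputTokens * pricing.inputPerMillion +
      stats.avgOutputTokens * pricing.outputPerMillion) /
    1e6
  );
}

// Placeholder rows and prices: substitute your query results and real rates.
const fastStats: VariantStats = { variant: "fast", avgInputTokens: 2000, avgOutputTokens: 300 };
const qualityStats: VariantStats = { variant: "quality", avgInputTokens: 2000, avgOutputTokens: 900 };

const fastCost = estimateCostPerRequest(fastStats, { inputPerMillion: 1, outputPerMillion: 2 });
// (2000 * 1 + 300 * 2) / 1e6 = 0.0026
const qualityCost = estimateCostPerRequest(qualityStats, { inputPerMillion: 5, outputPerMillion: 10 });
// (2000 * 5 + 900 * 10) / 1e6 = 0.019
```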

Telemetry attributes

Attribute	Type	Description
`gateway.variant.name`	string	Selected variant name
`gateway.variant.weight`	int	Variant’s configured weight
`gateway.experiment.total_variants`	int	Number of variants in the function
`gateway.function.name`	string	Function name
`gateway.function.endpoint`	string	Endpoint type (chat, embeddings, etc.)

Requirements

At least 2 variants per experiment function
All variant weights must be > 0
Each variant specifies either model (inline) or target (named reference), not both
strategy = "experiment" is only valid on functions, not routes
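The first three requirements translate into a simple config check. A sketch under assumptions: `VariantConfig`, `FunctionConfig`, and `validateExperiment` are hypothetical shapes that mirror the TOML keys on this page, and the functions-versus-routes rule is not modeled.

```typescript
interface VariantConfig {
  model?: string;  // inline model
  target?: string; // named reference
  weight: number;
}

interface FunctionConfig {
  strategy: string;
  variants: Record<string, VariantConfig>;
}

// Returns a list of problems; an empty list means the experiment is valid.
function validateExperiment(fnName: string, fn: FunctionConfig): string[] {
  const problems: string[] = [];
  if (fn.strategy !== "experiment") return problems; // rules apply only to experiments
  const entries = Object.entries(fn.variants);
  if (entries.length < 2) {
    problems.push(`${fnName}: needs at least 2 variants, found ${entries.length}`);
  }
  for (const [variant, v] of entries) {
    // !(x > 0) also rejects NaN.
    if (!(v.weight > 0)) {
      problems.push(`${fnName}.${variant}: weight must be > 0`);
    }
    const hasModel = v.model !== undefined;
    const hasTarget = v.target !== undefined;
    if (hasModel === hasTarget) {
      problems.push(`${fnName}.${variant}: set exactly one of model or target`);
    }
  }
  return problems;
}
```

Collecting every problem instead of stopping at the first lets a single startup run report all config errors at once.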

What experiments don’t do

No adaptive optimization: weights are static. Automatic adjustment based on metrics is planned.
No cross-variant fallback: if a variant’s endpoint fails, the request fails. Use strategy = "fallback" for reliability.
No prompt management: variants don’t own prompts. The caller sends messages; the gateway routes.
