Weighted Routing

To distribute traffic across providers or API keys, configure a route with the weighted strategy. For each request, the gateway picks one target at random, with probability proportional to that target’s weight.

Use cases

  • Cost optimisation — send most traffic to a cheaper endpoint, overflow to a premium one.
  • Gradual migration — shift 10 % of traffic to a new provider and increase the share over time.
  • Load distribution — spread requests across multiple API keys for the same model to avoid rate limits.

Prerequisites

  • At least two providers or two API keys for the same model.
  • Each provider declared in a [providers.*] block with the model in its models list.

Full configuration example

The following config splits gpt-4o traffic 70/30 between OpenAI and Azure OpenAI.
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o"]

[providers.azure-openai]
base_url = "https://myresource.openai.azure.com/openai"
credential = "env::AZURE_OPENAI_API_KEY"
auth_type = "api_key_header"
models = ["gpt-4o"]

[targets.openai-primary]
model = "gpt-4o"
weight = 70

[targets.azure-secondary]
model = "gpt-4o"
weight = 30

[routes.gpt4o-weighted]
models = ["gpt-4o"]
strategy = "weighted"
targets = ["openai-primary", "azure-secondary"]

How weights work

The gateway uses weighted random selection. Each request picks a target with probability proportional to its weight relative to the total.
Request arrives for gpt-4o
  |
  v
Total weight = 70 + 30 = 100
  |
  v
r = random(0..99)   (one draw per request)
  |
  +-- r < 70   --> openai-primary    (~70 % of requests)
  +-- r >= 70  --> azure-secondary   (~30 % of requests)
Weights are relative, not percentages. A 7/3 split behaves identically to 70/30 or 700/300.
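
The selection step above can be sketched in a few lines of Python. This is an illustration of weighted random selection in general, not the gateway's actual implementation; the function name pick_target and the optional draw parameter are hypothetical and exist only to make the logic easy to follow and test.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Weights taken from the configuration example (target name -> weight).
WEIGHTS = {"openai-primary": 70, "azure-secondary": 30}


def pick_target(weights, draw=None):
    """Return a target name with probability weight / total weight.

    `draw` is an integer in 0..total-1. It is drawn at random when omitted;
    passing it explicitly makes the function deterministic (hypothetical
    parameter, for illustration only).
    """
    names = list(weights)
    cumulative = list(accumulate(weights[name] for name in names))  # [70, 100]
    total = cumulative[-1]
    if draw is None:
        draw = random.randrange(total)  # uniform integer in 0..total-1
    # Select the first target whose cumulative weight exceeds the draw.
    return names[bisect_right(cumulative, draw)]


print(pick_target(WEIGHTS))  # one random pick
```

Because only the ratio between a weight and the total matters, calling pick_target with weights 7 and 3 gives the same distribution as 70 and 30.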

Default weight

When weight is omitted from a target, it defaults to 1. Two targets without explicit weights receive a 50/50 distribution.
[targets.key-a]
model = "gpt-4o"

[targets.key-b]
model = "gpt-4o"

# Equivalent to weight = 1 each --> 50/50 split
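
To check the arithmetic behind a set of targets, the expected share of each one can be computed as its weight divided by the total. The sketch below assumes the targets have already been loaded into a plain dictionary; the helper name traffic_shares is hypothetical and not part of the gateway.

```python
def traffic_shares(targets):
    """Return each target's expected share of traffic.

    `targets` maps target name -> settings dict. A missing "weight"
    falls back to 1, matching the default described above.
    """
    weights = {name: cfg.get("weight", 1) for name, cfg in targets.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# The two targets from the example above, with no explicit weight.
targets = {
    "key-a": {"model": "gpt-4o"},
    "key-b": {"model": "gpt-4o"},
}
print(traffic_shares(targets))  # {'key-a': 0.5, 'key-b': 0.5}
```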

Testing it

Send several requests and check the gateway logs to see which target is selected for each one.
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Over a large number of requests, the distribution converges on the configured weights.
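
How many requests count as "large" can be estimated offline. The following sketch simulates weighted picks with Python's standard library rather than calling the gateway, so it only models the statistics; the helper name simulate_shares and the fixed seed are assumptions made for reproducibility.

```python
import random
from collections import Counter

# Weights taken from the configuration example (target name -> weight).
ROUTE_WEIGHTS = {"openai-primary": 70, "azure-secondary": 30}


def simulate_shares(weights, requests, seed=0):
    """Simulate `requests` weighted picks and return the observed shares."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=requests)
    counts = Counter(picks)
    return {name: counts[name] / requests for name in names}


# Small samples can deviate noticeably; larger ones approach 70/30.
for n in (10, 100, 10_000):
    print(n, simulate_shares(ROUTE_WEIGHTS, n))
```

With only a handful of requests, an observed split far from 70/30 is normal and does not indicate a misconfiguration.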

Limitations

Weighted routing does not consult target health. If a target is returning errors, it continues to receive its proportional share of traffic. Use the fallback strategy for automatic failover, or combine weighted routing with fallback using multi-step routes.
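
The impact of an unhealthy target follows directly from the weights: each target contributes its traffic share multiplied by its own error rate. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up error rates:

```python
def expected_error_rate(weights, error_rates):
    """Overall error rate when each target keeps its weighted share.

    `weights` maps target name -> weight; `error_rates` maps target name ->
    fraction of failing requests (targets not listed are assumed healthy).
    """
    total = sum(weights.values())
    return sum(
        weights[name] / total * error_rates.get(name, 0.0) for name in weights
    )


# Illustrative scenario: azure-secondary fails every request.
rate = expected_error_rate(
    {"openai-primary": 70, "azure-secondary": 30},
    {"azure-secondary": 1.0},
)
print(f"{rate:.0%} of requests fail")  # 30% of requests fail
```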

Next steps

  • Fallback Routing: automatic failover when a provider is down.
  • Routing Overview: managed vs passthrough routing and the resolution flow.
  • Configuration: full TOML reference for all gateway settings.