Weighted Routing
To distribute traffic across providers or API keys, configure a route with the weighted strategy. For each request, the gateway selects a target at random, with probability proportional to that target's weight.
Use cases
Cost optimisation: send most traffic to a cheaper endpoint and the remainder to a premium one.
Gradual migration: shift 10% of traffic to a new provider and increase the share over time.
Load distribution: spread requests across multiple API keys for the same model to avoid rate limits.
Prerequisites
At least two providers or two API keys for the same model.
Each provider declared in a [providers.*] block with the model in its models list.
Full configuration example
The following config splits gpt-4o traffic 70/30 between OpenAI and Azure OpenAI.
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o"]

[providers.azure-openai]
base_url = "https://myresource.openai.azure.com/openai"
credential = "env::AZURE_OPENAI_API_KEY"
auth_type = "api_key_header"
models = ["gpt-4o"]

[targets.openai-primary]
model = "gpt-4o"
weight = 70

[targets.azure-secondary]
model = "gpt-4o"
weight = 30

[routes.gpt4o-weighted]
models = ["gpt-4o"]
strategy = "weighted"
targets = ["openai-primary", "azure-secondary"]
How weights work
The gateway uses weighted random selection. Each request picks a target with probability proportional to its weight relative to the total.
Request arrives for gpt-4o
        |
        v
Total weight = 70 + 30 = 100
        |
        +-- random(0..99) <  70 --> openai-primary   (~70% of requests)
        +-- random(0..99) >= 70 --> azure-secondary  (~30% of requests)
Weights are relative, not percentages. A 7/3 split behaves identically to 70/30 or 700/300.
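The selection step in the diagram above can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's actual implementation: the function names `pick_target` and `observed_shares` are hypothetical, and the sketch assumes positive integer weights.

```python
import random
from collections import Counter


def pick_target(weights, rng=random):
    """Pick one target name with probability weight / total weight.

    `weights` maps target name -> positive integer weight (an assumption
    of this sketch, not a documented gateway constraint).
    """
    total = sum(weights.values())
    roll = rng.randrange(total)  # integer in [0, total), like random(0..99)
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if roll < cumulative:
            return name
    raise AssertionError("unreachable when all weights are positive")


def observed_shares(weights, draws, seed=0):
    """Simulate `draws` selections and return each target's observed share."""
    rng = random.Random(seed)
    counts = Counter(pick_target(weights, rng) for _ in range(draws))
    return {name: counts[name] / draws for name in weights}


if __name__ == "__main__":
    # Both calls should land near a 70/30 split, because only the ratio
    # between the weights matters.
    print(observed_shares({"openai-primary": 70, "azure-secondary": 30}, 10_000))
    print(observed_shares({"openai-primary": 7, "azure-secondary": 3}, 10_000))
```

Because the roll is compared against cumulative weights, scaling every weight by the same factor leaves the selection probabilities unchanged.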
Default weight
When weight is omitted from a target, it defaults to 1. Two targets without explicit weights receive a 50/50 distribution.
[targets.key-a]
model = "gpt-4o"

[targets.key-b]
model = "gpt-4o"
# Equivalent to weight = 1 each --> 50/50 split
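To check what split a set of targets should produce, including targets that omit weight, the expected shares can be computed directly. The sketch below is a standalone illustration: `expected_shares` is a hypothetical helper, and the dictionaries simply mirror the [targets.*] blocks rather than any format the gateway itself exposes.

```python
DEFAULT_WEIGHT = 1  # per the text above: an omitted weight defaults to 1


def expected_shares(targets):
    """Map each target name to its expected fraction of traffic.

    `targets` maps target name -> dict of settings; a missing "weight"
    key is treated as DEFAULT_WEIGHT.
    """
    weights = {
        name: settings.get("weight", DEFAULT_WEIGHT)
        for name, settings in targets.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Neither target sets a weight, so each gets half of the traffic.
    print(expected_shares({
        "key-a": {"model": "gpt-4o"},
        "key-b": {"model": "gpt-4o"},
    }))  # {'key-a': 0.5, 'key-b': 0.5}

    # Mixing an explicit weight with a default: 3 / (3 + 1) and 1 / (3 + 1).
    print(expected_shares({
        "key-a": {"model": "gpt-4o", "weight": 3},
        "key-b": {"model": "gpt-4o"},
    }))  # {'key-a': 0.75, 'key-b': 0.25}
```

The second case shows why mixing explicit and omitted weights deserves care: a target left at the default of 1 can end up with a much smaller share than intended.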
Testing it
Send a few requests and check the gateway logs to see which target is selected for each one.
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Over a large number of requests, the distribution converges on the configured weights.
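How large is "large" depends on how much deviation you are willing to accept. Assuming each request is an independent random draw, as described above, the observed share of a target follows a binomial distribution, and its spread can be estimated with the usual normal approximation. The helpers below are hypothetical names for illustration and use nothing gateway-specific.

```python
import math


def share_standard_error(weight, total_weight, requests):
    """Standard error of a target's observed share after `requests` draws."""
    p = weight / total_weight
    return math.sqrt(p * (1 - p) / requests)


def requests_for_margin(weight, total_weight, margin, z=1.96):
    """Smallest request count for which the approximate 95% interval
    (z = 1.96) around the expected share is no wider than +/- margin."""
    p = weight / total_weight
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


if __name__ == "__main__":
    # With weight 70 out of 100, the spread shrinks as the sample grows.
    for n in (100, 1_000, 10_000):
        print(n, round(share_standard_error(70, 100, n), 4))

    # Requests needed to see the 70% share to within 2 percentage points.
    print(requests_for_margin(70, 100, 0.02))
```

For the 70/30 example, 100 requests give a standard error of roughly 4.6 percentage points, so an observed 65/35 or 75/25 split in a short test is not by itself a sign of misconfiguration.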
Limitations
Weighted routing does not consult target health. If a target is returning errors, it continues to receive its proportional share of traffic. Use the fallback strategy for automatic failover, or combine weighted routing with fallback using multi-step routes.
Next steps
Fallback Routing: Automatic failover when a provider is down.
Routing Overview: Managed vs passthrough routing and the resolution flow.
Configuration: Full TOML reference for all gateway settings.