Weighted Routing

To distribute traffic across providers or API keys, configure a route with strategy = "weighted". For each request, the gateway selects one of the route’s targets at random, with probability proportional to that target’s weight.

Use cases

Cost optimisation — send most traffic to a cheaper endpoint, overflow to a premium one.
Gradual migration — shift 10 % of traffic to a new provider and increase the share over time.
Load distribution — spread requests across multiple API keys for the same model to avoid rate limits.

Prerequisites

At least two providers or two API keys for the same model.
Each provider declared in a [providers.*] block with the model in its models list.

Full configuration example

The following config splits gpt-4o traffic 70/30 between OpenAI and Azure OpenAI.

[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o"]

[providers.azure-openai]
base_url = "https://myresource.openai.azure.com/openai"
credential = "env::AZURE_OPENAI_API_KEY"
auth_type = "api_key_header"
models = ["gpt-4o"]

[targets.openai-primary]
model = "gpt-4o"
weight = 70

[targets.azure-secondary]
model = "gpt-4o"
weight = 30

[routes.gpt4o-weighted]
models = ["gpt-4o"]
strategy = "weighted"
targets = ["openai-primary", "azure-secondary"]

How weights work

The gateway uses weighted random selection. Each request picks a target with probability proportional to its weight relative to the total.

Request arrives for gpt-4o
  |
  v
Total weight = 70 + 30 = 100
  |
  v
r = random(0..99)   (one draw per request)
  |
  +-- r < 70   --> openai-primary    (~70 % of requests)
  +-- r >= 70  --> azure-secondary   (~30 % of requests)

Weights are relative, not percentages. A 7/3 split behaves identically to 70/30 or 700/300.
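As an illustration only, weighted random selection can be sketched in Python as follows. This is not the gateway's actual implementation: the function names pick_target and simulate are hypothetical, and the sketch assumes positive integer weights and a single random draw per request.

```python
import random
from collections import Counter


def pick_target(weights, rng=random):
    """Return one target name, chosen with probability weight / total."""
    total = sum(weights.values())
    r = rng.randrange(total)  # one draw in 0..total-1
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if r < cumulative:
            return name
    raise ValueError("weights must be positive integers")


def simulate(weights, n, seed=0):
    """Count how often each target is picked over n simulated requests."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_target(weights, rng) for _ in range(n))


if __name__ == "__main__":
    requests = 10_000
    counts = simulate({"openai-primary": 70, "azure-secondary": 30}, requests)
    for name, count in counts.items():
        print(f"{name}: {count / requests:.1%}")
```

Passing {"openai-primary": 7, "azure-secondary": 3} instead should give roughly the same shares, because only the ratio between the weights matters.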

Default weight

When weight is omitted from a target, it defaults to 1. Two targets without explicit weights receive a 50/50 distribution.

[targets.key-a]
model = "gpt-4o"

[targets.key-b]
model = "gpt-4o"

# Equivalent to weight = 1 each --> 50/50 split
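The default can be checked with a small Python sketch that computes the expected share of each target. The helper expected_shares is hypothetical and simply mirrors the rule described above (a missing weight counts as 1); it does not read the gateway's configuration file.

```python
DEFAULT_WEIGHT = 1  # documented default when a target omits weight


def expected_shares(targets):
    """Map each target name to its expected fraction of traffic."""
    weights = {
        name: settings.get("weight", DEFAULT_WEIGHT)
        for name, settings in targets.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    targets = {
        "key-a": {"model": "gpt-4o"},  # no weight, so it counts as 1
        "key-b": {"model": "gpt-4o"},  # no weight, so it counts as 1
    }
    print(expected_shares(targets))
```

Mixing explicit and omitted weights works the same way: a target with weight = 3 next to a target without a weight gets three quarters of the traffic.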

Testing it

Send a few requests and observe the gateway logs to see which target is selected on each attempt.

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Over a large number of requests, the distribution converges on the configured weights.
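To judge whether an observed split is consistent with the configured weights, one option is to tally the selected targets by hand from the logs and compare the counts with what the weights predict. The sketch below assumes you have already produced that tally. The function share_report is hypothetical, and the z-score uses a simple binomial approximation rather than anything the gateway reports itself.

```python
import math


def share_report(observed, weights):
    """Compare observed per-target counts with the shares the weights predict."""
    n = sum(observed.values())
    total = sum(weights.values())
    report = {}
    for name, weight in weights.items():
        p = weight / total                   # expected share of traffic
        expected = n * p                     # expected number of requests
        stderr = math.sqrt(n * p * (1 - p))  # binomial standard deviation
        count = observed.get(name, 0)
        report[name] = {
            "observed_share": count / n if n else 0.0,
            "expected_share": p,
            # Values within roughly +/-2 are ordinary random variation.
            "z_score": (count - expected) / stderr if stderr > 0 else 0.0,
        }
    return report


if __name__ == "__main__":
    tally = {"openai-primary": 712, "azure-secondary": 288}  # example counts
    weights = {"openai-primary": 70, "azure-secondary": 30}
    for name, row in share_report(tally, weights).items():
        print(name, row)
```

With only a handful of requests the observed shares can sit far from 70/30 without anything being wrong; the z-score accounts for the sample size.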

Limitations

Weighted routing does not consult target health. If a target is returning errors, it continues to receive its proportional share of traffic. Use the fallback strategy for automatic failover, or combine weighted routing with fallback using multi-step routes.
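The practical effect can be estimated with a little arithmetic: the overall failure rate is the weighted average of each target's error rate. The Python sketch below is illustrative only; expected_failure_rate is a hypothetical helper and the error rates are made-up inputs, not values reported by the gateway.

```python
def expected_failure_rate(weights, error_rates):
    """Overall fraction of failing requests when health is not consulted."""
    total = sum(weights.values())
    return sum(
        (weight / total) * error_rates.get(name, 0.0)
        for name, weight in weights.items()
    )


if __name__ == "__main__":
    weights = {"openai-primary": 70, "azure-secondary": 30}
    # Assume azure-secondary is fully down and openai-primary is healthy.
    rate = expected_failure_rate(weights, {"azure-secondary": 1.0})
    print(f"{rate:.0%} of requests would fail")
```

In the 70/30 example, a complete outage of azure-secondary means about 30 % of requests fail until the weights are changed or a fallback is configured.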

Next steps

Fallback Routing: automatic failover when a provider is down.
Routing Overview: managed vs passthrough routing and the resolution flow.
Configuration: full TOML reference for all gateway settings.
