To distribute traffic across several targets for the same model, create a routing rule with the Weighted split strategy. For each request, to11 picks a target at random, with the probability of each target proportional to its weight.

Use cases

  • Cost control — send most traffic to a cheaper target and the rest to a premium one.
  • Gradual migration — shift a small share of traffic to a new provider and raise it over time.
  • Load distribution — spread requests across multiple credentials for the same model to ease rate limits.

Before you start

  • Two or more targets for the same model — different providers, or different stored credentials for the same provider.
  • Each provider connected under Project → AI Gateway → Providers, with the model in its model list. See Connecting a provider.

Create a Weighted split rule

  1. Open Project → AI Gateway → Routing.
  2. Start a new rule with the create-rule flow.
  3. Choose the Weighted split strategy.
  4. Set the model the rule applies to (for example gpt-4o).
  5. Add each target — its provider and stored credential — and give it a weight as a percentage. For a 70/30 split, set the primary to 70 and the secondary to 30.
  6. Save the rule.

How weights work

Selection is random, with each target's probability proportional to its weight. A target weighted 70 receives roughly 70% of requests and one weighted 30 receives roughly 30%. Only the ratio matters, so a 70/30 split behaves the same as 7/3. Because each request is drawn independently, the split is statistical: it converges on the configured proportions over a large number of requests rather than alternating on every call, and a short run can deviate noticeably. If every target has the same weight, traffic is divided evenly.
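The selection rule above can be modeled in a few lines of Bash. This is a minimal sketch under stated assumptions, not to11's implementation: the target names, the `pick_index` and `simulate` helpers, and the use of Bash's `$RANDOM` are all illustrative. It tallies many independent picks to show that 70/30 and 7/3 give the same split and that equal weights divide traffic evenly.

```shell
#!/usr/bin/env bash
# Minimal sketch of proportional (weighted) random selection, modeled on the
# behavior this page describes. Not to11's implementation; the target names and
# helper functions are hypothetical.
set -euo pipefail

targets=(primary secondary)   # hypothetical names for the two targets
idx=0                         # index chosen by the most recent pick
shares=()                     # percentage of picks per target from the last run

# Choose index i with probability weight_i / sum(weights); stores it in $idx.
pick_index() {
  local -a ws=("$@")
  local total=0 w r
  for w in "${ws[@]}"; do total=$((total + w)); done
  r=$((RANDOM % total))   # RANDOM is 0..32767; the modulo bias is negligible here
  for idx in "${!ws[@]}"; do
    if (( r < ws[idx] )); then return; fi
    r=$((r - ws[idx]))
  done
}

# Make n independent picks with the given weights and report each target's share.
simulate() {
  local n=$1; shift
  local -a counts=(0 0)
  local k
  for ((k = 0; k < n; k++)); do
    pick_index "$@"
    counts[idx]=$((counts[idx] + 1))
  done
  shares=($((100 * counts[0] / n)) $((100 * counts[1] / n)))
  echo "weights $*: ${targets[0]} ${shares[0]}%, ${targets[1]} ${shares[1]}%"
}

simulate 10000 70 30   # roughly 70% / 30%
simulate 10000 7 3     # only the ratio matters, so also roughly 70% / 30%
simulate 10000 1 1     # equal weights: roughly 50% / 50%
```

Rerunning with a small request count (for example `simulate 20 70 30`) shows the short-run drift described above: the shares only settle near the configured ratio as the number of requests grows.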

Send a request

Call the model by name, exactly as you would for a single target. to11 selects a target per request:
curl https://gw.to11.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-to11-authorization: Bearer $TO11_API_KEY" \
  -H "x-to11-project-id: $TO11_PROJECT_ID" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
To see which target served each request, open Projects → Traces in Observe. Each trace records the resolved target and the strategy that selected it, so you can confirm the distribution over time.

Limitations

Weighted split is not health-aware. A target that is returning errors keeps receiving its proportional share of traffic. When you need automatic failover, use a Fallback chain instead, or combine the two.
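To make the limitation concrete, here is a simplified simulation. It is a hypothetical model rather than to11's code: the `healthy` array is an assumed input marking the secondary target as failing on every call. Under weighted split alone, roughly 30% of requests still land on the failing target. The failover branch, which moves a failed request to another healthy target in the spirit of a Fallback chain, absorbs those failures.

```shell
#!/usr/bin/env bash
# Simplified model, not to11's implementation: a 70/30 weighted split in which the
# secondary target fails every call, measured with and without failing over.
set -euo pipefail

targets=(primary secondary)   # hypothetical target names
weights=(70 30)
healthy=(1 0)                 # assumption: secondary returns an error on every call
idx=0

# Weighted random choice that never looks at health; stores the index in $idx.
pick_index() {
  local total=0 w r
  for w in "${weights[@]}"; do total=$((total + w)); done
  r=$((RANDOM % total))
  for idx in "${!weights[@]}"; do
    if (( r < weights[idx] )); then return; fi
    r=$((r - weights[idx]))
  done
}

requests=10000
split_errors=0      # weighted split alone: the picked target's error is returned
failover_errors=0   # weighted split, then fail over to any other healthy target
for ((n = 0; n < requests; n++)); do
  pick_index
  if (( healthy[idx] )); then continue; fi
  split_errors=$((split_errors + 1))
  served=0
  for i in "${!targets[@]}"; do
    if (( i != idx && healthy[i] )); then served=1; break; fi
  done
  if (( ! served )); then failover_errors=$((failover_errors + 1)); fi
done

echo "weighted split only: $split_errors of $requests requests failed"
echo "with failover:       $failover_errors of $requests requests failed"
```

The split itself never changes here: the failing target keeps its 30% share, and only the extra failover step keeps those requests from reaching the caller as errors.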

Next steps

Fallback chain

Fail over to the next target on error.

Direct routing

Route a model to a single managed target.

Routing overview

The three-layer model and top-down resolution.