Use cases
- Cost control — send most traffic to a cheaper target and the rest to a premium one.
- Gradual migration — shift a small share of traffic to a new provider and raise it over time.
- Load distribution — spread requests across multiple credentials for the same model to ease rate limits.
Before you start
- Two or more targets for the same model — different providers, or different stored credentials for the same provider.
- Each provider connected under Project → AI Gateway → Providers, with the model in its model list. See Connecting a provider.
Create a Weighted split rule
- Open Project → AI Gateway → Routing.
- Start a new rule with the create-rule flow.
- Choose the Weighted split strategy.
- Set the model the rule applies to (for example gpt-4o).
- Add each target (its provider and stored credential) and give it a weight as a percentage. For a 70/30 split, set the primary to 70 and the secondary to 30.
- Save the rule.
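To get a feel for what a rule like the 70/30 example does to traffic, it can help to simulate weighted random selection locally. The sketch below is not to11's implementation; it only assumes that each request independently picks a target with probability proportional to its weight. The target names `primary` and `secondary` and the helper functions are hypothetical placeholders.

```python
import random
from collections import Counter

# Hypothetical targets and weights; placeholder names, not real to11 identifiers.
TARGETS = {"primary": 70, "secondary": 30}


def pick_target(weights, rng):
    """Pick one target name at random, with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, requests, seed=0):
    """Return the observed share of requests per target after `requests` picks."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    counts = Counter(pick_target(weights, rng) for _ in range(requests))
    return {name: counts[name] / requests for name in weights}


if __name__ == "__main__":
    # Small samples can drift from the configured split; large ones settle near it.
    for n in (10, 1_000, 100_000):
        print(n, simulate(TARGETS, n))
```

Running it with a few different request counts shows the observed shares wandering for small samples and settling near the configured proportions as the count grows. Passing `{"primary": 7, "secondary": 3}` gives the same long-run shares, since only the ratio between the weights matters in this model.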
How weights work
Selection is random, proportional to the weights. A target weighted 70 receives roughly 70% of requests and one weighted 30 receives roughly 30%; a 70/30 split behaves the same as 7/3. The split is statistical, so it converges on the configured proportions over a large number of requests rather than alternating on every call. If you leave weights equal, traffic is divided evenly.
Send a request
Call the model by name, exactly as you would for a single target. to11 selects a target per request.
Limitations
Next steps
- Fallback chain: fail over to the next target on error.
- Direct routing: route a model to a single managed target.
- Routing overview: the three-layer model and top-down resolution.