Configuration Reference

The gateway loads its configuration from a TOML file. By default it reads config/gateway.toml relative to the working directory. All sections except [server] are optional. For an explanation of routing concepts, see Routing. For provider-specific details, see Providers.

Full config reference

```toml
[server]
host = "127.0.0.1"
port = 4000

[defaults.provider]
timeout_ms = 30000

[providers.openai]
base_url = "https://api.openai.com/v1"
models = ["gpt-4o", "gpt-4o-mini", "o3"]
timeout_ms = 30000
# credential defaults to "env::OPENAI_API_KEY" for the openai provider

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6", "claude-haiku-4-5"]
timeout_ms = 60000

[models.o3]
timeout_ms = 120000

[targets.openai-primary]
model = "gpt-4o"
weight = 70
credential = "env::MANAGED_OPENAI_KEY"

[targets.openai-fallback]
model = "gpt-4o"
weight = 30
credential = "env::MANAGED_OPENAI_KEY_2"

[routes.balanced-gpt4o]
endpoint = "chat"
models = ["gpt-4o"]
strategy = "weighted"
targets = ["openai-primary", "openai-fallback"]

[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models = ["gpt-4o", "claude-sonnet-4-6"]

[functions.summarize.retry]
max_retries = 4
backoff_base_ms = 1000

[routing.retry]
max_retries = 2
backoff_base_ms = 500

[security]
enabled = true
blocklist = ["ignore previous instructions", "dan mode"]

# [tenant_routing]
# enabled = true
# redis_url = "redis://localhost:6379"
# lru_capacity = 1000
# anti_entropy_interval_secs = 300
# api_fallback_url = "http://localhost:4500/internal/routing/configs"
# api_fallback_timeout_ms = 5000
# api_fallback_auth_token = ""
# lru_idle_timeout_secs = 3600
# private_key_path = "/etc/gateway/x25519.key"

[telemetry]
enabled = true
otlp_endpoint = "http://localhost:4317"
service_name = "gateway"
environment = "production"
sample_ratio = 0.1
log_level = "info"

[genai_telemetry]
enabled = false
otlp_endpoint = "http://localhost:4317"
service_name = "gateway-genai-telemetry"
environment = "production"
sample_ratio = 1.0
metrics_enabled = true
metrics_export_interval_secs = 30
metrics_export_timeout_secs = 10
capture_content = true
otlp_bearer_token = ""
```

Section reference

[server]

Network binding for the gateway process.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| host | string | "127.0.0.1" | Bind address |
| port | integer | 4000 | Listen port |

[defaults.provider]

Global defaults applied to every provider unless the provider overrides them explicitly.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| timeout_ms | integer |  | Default request timeout in milliseconds for all providers |
| auth_type | string |  | Default authentication method: bearer, api_key_header, or query_param |

[providers.*]

Each named table registers a provider and the models it serves. The table key is the provider identifier (e.g., openai, anthropic, groq).

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| base_url | string | Yes | Provider API base URL |
| models | array | Yes | Model names this provider handles |
| credential | string | No | Credential location (e.g., "env::OPENAI_API_KEY", "none"). When omitted, uses the per-provider convention default. |
| timeout_ms | integer | No | Request timeout in milliseconds. Overrides [defaults.provider].timeout_ms. |
| auth_type | string | No | Authentication method: bearer (default), api_key_header, or query_param. Overrides [defaults.provider].auth_type. |
| auth_header_name | string | No | Custom header name when auth_type is not standard (e.g., "api-key" for Azure OpenAI) |

When credential is omitted, the gateway uses a per-provider convention based on the table key. For example, the openai provider defaults to env::OPENAI_API_KEY, and the anthropic provider defaults to env::ANTHROPIC_API_KEY. Set credential = "none" to disable credential resolution entirely for a provider.
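
As a sketch of how the authentication fields above might combine, the fragment below registers two extra providers. The table keys, URLs, model names, and the AZURE_OPENAI_API_KEY variable are illustrative assumptions and do not come from this reference.

```toml
# Hypothetical Azure OpenAI-style provider: the key travels in a custom header.
[providers.azure-openai]
base_url = "https://example-resource.openai.azure.com/openai"  # placeholder URL
models = ["gpt-4o-azure"]                  # hypothetical model name
credential = "env::AZURE_OPENAI_API_KEY"   # hypothetical variable name
auth_type = "api_key_header"
auth_header_name = "api-key"

# Hypothetical local provider that needs no credential.
[providers.local]
base_url = "http://localhost:8000/v1"      # placeholder URL
models = ["local-llama"]                   # hypothetical model name
credential = "none"                        # disables credential resolution
```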

[models.*]

Per-model overrides. The table key is the model name as it appears in a provider’s models array.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| timeout_ms | integer |  | Request timeout in milliseconds. Overrides the provider timeout. |

The timeout resolution order is: [models.*].timeout_ms > [providers.*].timeout_ms > [defaults.provider].timeout_ms > hardcoded default (30 000 ms).
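
The resolution order can be traced through a small example. The values below are chosen only for illustration; the comments show which timeout each model would end up with under the order stated above.

```toml
[defaults.provider]
timeout_ms = 30000        # applies when neither the provider nor the model sets a timeout

[providers.openai]
base_url = "https://api.openai.com/v1"
models = ["gpt-4o", "o3"]
timeout_ms = 45000        # overrides the default, so gpt-4o resolves to 45000 ms

[models.o3]
timeout_ms = 120000       # overrides the provider, so o3 resolves to 120000 ms
```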

[targets.*]

Each named table declares a managed routing target: a specific model paired with gateway-owned credentials. The table key is the target identifier referenced by routes and functions.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Upstream model name (must exist in a provider’s models array) |
| credential | string | No | Credential location for this target. Overrides the provider-level credential. |
| weight | integer | No | Weight for the weighted strategy (default: 1) |
| timeout_ms | integer | No | Request timeout in milliseconds. Overrides the provider timeout for this target. |

When credential is set to an "env::VAR_NAME" location, the environment variable must be present at startup. The gateway fails fast on missing managed credentials rather than discovering the problem at request time.

[routes.*]

Each named table defines a managed route that maps one or more model names to a set of targets with a selection strategy. The table key is the route identifier.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | Endpoint type: chat, embeddings, audio_speech, audio_transcription, or image_generation |
| models | array | Yes | Model names this route handles |
| strategy | string | Yes | Selection strategy: single, weighted, or fallback |
| targets | array | Yes* | Target names for simple routes |
| steps | array | No | Multi-step route definition (overrides targets). Each step has its own strategy and targets. |

\* The top-level targets field is ignored when steps is provided.

When steps is present, the steps execute as an ordered fallback chain, and each step applies its own strategy to its own targets.
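
A multi-step route might look like the sketch below. This reference does not show the exact syntax for steps, so the array-of-tables form is an assumption, as are the route name, the openai-backup target, and its BACKUP_OPENAI_KEY variable. The top-level strategy is kept only because the table lists it as required.

```toml
# Hypothetical third target used by the last step.
[targets.openai-backup]
model = "gpt-4o"
credential = "env::BACKUP_OPENAI_KEY"   # hypothetical variable name

[routes.gpt4o-with-backup]
endpoint = "chat"
models = ["gpt-4o"]
strategy = "fallback"   # assumption: still set because strategy is listed as required

# Step 1: split traffic across the two existing OpenAI targets.
[[routes.gpt4o-with-backup.steps]]
strategy = "weighted"
targets = ["openai-primary", "openai-fallback"]

# Step 2: next in the ordered fallback chain, tried if step 1 fails.
[[routes.gpt4o-with-backup.steps]]
strategy = "single"
targets = ["openai-backup"]
```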

[routes.*.retry]

Per-route retry configuration. When present, overrides the global [routing.retry] settings for this route.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | integer | 2 | Maximum retry attempts |
| backoff_base_ms | integer | 500 | Base backoff in milliseconds |

[functions.*]

Each named table defines a function: a named routing alias that maps a task name to a strategy and set of models or targets. The table key is the function name (e.g., summarize, extract-entities). Callers invoke a function by sending model: "function::summarize" or, when no provider or route shares the name, simply model: "summarize" (resolved via top-down lookup).

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | Endpoint type: chat, embeddings, audio_speech, audio_transcription, or image_generation |
| strategy | string | Yes | Selection strategy: single, weighted, fallback, or experiment |
| models | array | No | Inline model shorthand: model names resolved through configured providers |
| targets | array | No | Named target references (for weights, custom credentials, or timeouts) |
| steps | array | No | Multi-step definition. Each step has its own strategy and targets. |

models and targets are mutually exclusive on a function. When models is used, the gateway creates ephemeral targets internally by resolving each model name through the provider configuration. Each inline model must exist in exactly one provider’s models array.
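
For contrast with the inline models shorthand, here is a minimal sketch of a function that references named targets instead. The function name classify is hypothetical; the two targets are the ones declared in the full config reference.

```toml
[functions.classify]
endpoint = "chat"
strategy = "weighted"
# Named targets carry their own weights and credentials.
# No models key here, because models and targets are mutually exclusive.
targets = ["openai-primary", "openai-fallback"]
```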

[functions.*.variants.*]

When strategy = "experiment", variants define the A/B test arms. Each named table under variants is a variant configuration.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes* | Model name for this variant. Mutually exclusive with target. |
| target | string | No | Named target reference. Mutually exclusive with model. |
| weight | integer | Yes | Relative weight for random selection. Must be > 0. |
| (params) | varies | No | Additional key-value pairs override request parameters (e.g., temperature, max_tokens). |

\* model and target are mutually exclusive, so a variant sets one or the other.

Params are validated against an allowlist per endpoint type. Unknown params pass through with a startup warning. Protected fields (model, messages, input, file, prompt, stream) cannot be set as params.
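
An experiment with two arms could be sketched as follows. The function name, variant names, weights, and parameter values are illustrative assumptions; temperature and max_tokens are the example params named in the table above.

```toml
[functions.summarize-ab]
endpoint = "chat"
strategy = "experiment"

# Control arm: picked about 80% of the time (80 of 100 total weight).
[functions.summarize-ab.variants.control]
model = "gpt-4o"
weight = 80
temperature = 0.2     # request-parameter override

# Candidate arm: picked about 20% of the time.
[functions.summarize-ab.variants.candidate]
model = "claude-sonnet-4-6"
weight = 20
max_tokens = 512      # request-parameter override
```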

EndpointType enum

The endpoint field on functions and routes accepts one of:

| Value | HTTP Endpoint |
| --- | --- |
| chat | /v1/chat/completions, /v1/responses, /v1/messages |
| embeddings | /v1/embeddings |
| audio_speech | /v1/audio/speech |
| audio_transcription | /v1/audio/transcriptions |
| image_generation | /v1/images/generations |

[functions.*.retry]

Per-function retry configuration. When present, overrides the global [routing.retry] settings for this function.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | integer | 2 | Maximum retry attempts |
| backoff_base_ms | integer | 500 | Base backoff in milliseconds |

[routing]

Global routing behaviour settings. This section is optional; when omitted, all sub-sections use their defaults.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| [routing.retry] | table | absent | Global retry configuration for managed routes and functions |
| [routing.circuit_breaker] | table |  | Deprecated. Accepted for backward compatibility but ignored at runtime. |

[routing.retry]

Global retry settings applied to all managed routes and functions that do not declare their own [routes.*.retry] or [functions.*.retry] block.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | integer | 2 | Maximum retry attempts |
| backoff_base_ms | integer | 500 | Base backoff in milliseconds |

The retry layer wraps every managed-route attempt with exponential backoff (powered by the backon crate).
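
To make the two fields concrete, the sketch below annotates one possible retry schedule. This page does not state the backoff multiplier or whether jitter is applied, so the doubling shown in the comments is an assumption rather than documented behaviour.

```toml
[routing.retry]
max_retries = 3
backoff_base_ms = 250
# Assuming the delay doubles on each retry and no jitter is added,
# the waits before each retry would be roughly:
#   retry 1: 250 ms
#   retry 2: 500 ms
#   retry 3: 1000 ms
```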

[routing.circuit_breaker] (deprecated)

The [routing.circuit_breaker] section is accepted for backward compatibility but ignored at runtime. The circuit breaker was removed in PR #232 and replaced by stateless fast sequential failover via FallbackStrategy combined with retry and exponential backoff. Use [routing.retry] with the fallback strategy instead. If enabled = true is set, a deprecation warning is logged at startup.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Deprecated. Accepted but ignored. |
| failure_threshold | integer | 5 | Deprecated. Accepted but ignored. |
| recovery_timeout_ms | integer | 30000 | Deprecated. Accepted but ignored. |

[security]

Inline security pipeline for input guardrails.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable the inline security pipeline |
| blocklist | array | [] | Blocked terms for case-insensitive substring matching |

When enabled, the security pipeline runs PII detection (always on) and blocklist matching on every request before it reaches the upstream provider. See Security Overview for details.

[telemetry]

Application-level telemetry (HTTP spans, latency). These traces go to your application observability stack (e.g., Tempo via Alloy).

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable application OTLP export |
| otlp_endpoint | string | "http://localhost:4317" | gRPC OTLP endpoint |
| service_name | string | "gateway" | OpenTelemetry service name |
| environment | string | "production" | Deployment environment tag |
| sample_ratio | float | 0.1 | Trace sampling ratio (0.0 to 1.0) |
| log_level | string | "info" | Log level filter |

[genai_telemetry]

Dedicated GenAI provider telemetry sink. These traces and metrics go to the OTel Collector and on to ClickHouse.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable GenAI OTLP export |
| otlp_endpoint | string | "http://localhost:4317" | gRPC OTLP endpoint |
| service_name | string | "gateway-genai-telemetry" | Service name for GenAI spans |
| environment | string | "production" | Deployment environment tag |
| sample_ratio | float | 1.0 | Trace sampling ratio (0.0 to 1.0) |
| metrics_enabled | bool | true | Enable histogram metrics export |
| metrics_export_interval_secs | integer | 30 | Metrics push interval in seconds |
| metrics_export_timeout_secs | integer | 10 | Metrics push timeout in seconds |
| capture_content | bool | true | Record prompt/completion text in spans |
| otlp_bearer_token | string | "" | Bearer token for gRPC Authorization metadata |

[auth]

Optional per-request platform authentication for customer-hosted gateways. When enabled, the gateway exchanges a platform credential for a short-lived collector JWT and exports GenAI telemetry with that request-scoped token.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | false | Enable platform auth exchange |
| exchange_url | string | Yes* |  | Token exchange endpoint URL. Required when enabled = true. |
| api_token | string | No |  | Fallback API token when x-to11-authorization header is absent |
| project_id | string | No |  | Fallback project ID when x-to11-project-id header is absent |
| cache_skew_seconds | integer | No | 60 | Token cache skew tolerance in seconds |
| exchange_timeout_ms | integer | No | 2000 | HTTP timeout for the token exchange request in milliseconds |
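
The full config reference above has no [auth] block, so here is a minimal sketch of one. The exchange URL, token, and project ID are placeholders, not real endpoints or credentials.

```toml
[auth]
enabled = true
exchange_url = "https://platform.example.com/auth/exchange"  # placeholder; required once enabled = true
api_token = "replace-me"          # placeholder; fallback when x-to11-authorization is absent
project_id = "proj_example"       # placeholder; fallback when x-to11-project-id is absent
cache_skew_seconds = 60           # documented default
exchange_timeout_ms = 2000        # documented default
```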

[tenant_routing]

Per-tenant routing configuration for platform mode (self-hosted or managed SaaS). When enabled, each project gets its own routing config loaded from Redis with API fallback. This section is only relevant when running in multi-tenant platform mode.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | false | Enable per-tenant routing from DB/Redis. When false, the gateway uses static TOML config. |
| redis_url | string | No | "redis://localhost:6379" | Redis/Valkey connection URL for loading config snapshots |
| lru_capacity | integer | No | 1000 | Maximum number of tenant configs cached in memory |
| anti_entropy_interval_secs | integer | No | 300 | Interval between anti-entropy sweeps (seconds). Must be > 0; invalid values fall back to the default. |
| api_fallback_url | string | No |  | Base URL for the API fallback endpoint (e.g., http://api:4500/internal/routing/configs) |
| api_fallback_timeout_ms | integer | No | 5000 | Timeout for API fallback requests in milliseconds |
| api_fallback_auth_token | string | No |  | Bearer token for authenticating API fallback requests. Sent as Authorization: Bearer <token>. |
| lru_idle_timeout_secs | integer | No | 3600 | Evict cached configs idle longer than this (seconds) |
| private_key_path | string | No |  | Path to 32-byte X25519 private key for decrypting Redis envelope-encrypted snapshots. When absent, the gateway expects plaintext JSON in Redis (dev mode only). |

The api_fallback_auth_token should be a service-scoped API key with routing:read permission. The API fallback endpoint (GET /internal/routing/configs/:projectId) requires this permission. In production, additionally restrict the endpoint at the network level (VPC, security groups) to gateway instances only.

When private_key_path is not set, the gateway operates in plaintext mode: config snapshots in Redis are not encrypted. This is acceptable for local development but must not be used in production where Redis may contain provider API keys.

[cache]

Response caching backed by Valkey (Redis-compatible). When enabled, identical LLM requests can return instantly from cache.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | false | Enable response caching |
| url | string | No | "redis://localhost:6379" | Valkey/Redis connection URL |
| default_mode | string | No | "auto" | Cache mode: auto, always, or never |
| ttl_seconds | integer | No | 604800 | Default TTL in seconds (1 week) |
| max_ttl_seconds | integer | No | 604800 | Maximum allowed TTL; caps per-request overrides |
| max_entry_size_bytes | integer | No | 1048576 | Maximum cached response size in bytes (1 MiB) |
| pool_size | integer | No | 8 | Connection pool size |
| encrypt | bool | No | true | AES-256-GCM encryption at rest |
| encrypt_salt | string | No | "" | Server salt for HKDF key derivation. Required when encrypt = true. |

When encrypt is true, encrypt_salt must be a non-empty string. The gateway refuses to start if encryption is enabled without a salt.
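
Because encrypt defaults to true and the default salt is empty, enabling the cache also means supplying a salt. The sketch below shows one way to do that. The salt and the shorter TTL are placeholder values chosen for illustration.

```toml
[cache]
enabled = true
url = "redis://localhost:6379"
default_mode = "auto"
ttl_seconds = 86400          # 1 day, shorter than the 1-week default
max_ttl_seconds = 604800     # per-request overrides are capped at 1 week
encrypt = true
encrypt_salt = "replace-with-a-long-random-string"   # placeholder; must be non-empty
```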

Config file location

Override the default path with the GATEWAY_CONFIG environment variable:

```shell
GATEWAY_CONFIG=/path/to/config.toml cargo run --release -p gateway
```

Environment variable overrides

Environment variables take precedence over TOML values.

Application telemetry

| Env var | Overrides |
| --- | --- |
| GATEWAY_TELEMETRY_ENABLED | [telemetry] enabled |
| OTEL_EXPORTER_OTLP_ENDPOINT | [telemetry] otlp_endpoint |
| SERVICE_NAME | [telemetry] service_name |
| ENVIRONMENT | [telemetry] environment |
| OTEL_TRACES_SAMPLER_RATIO | [telemetry] sample_ratio |

Tenant routing

| Env var | Overrides |
| --- | --- |
| GATEWAY_TENANT_ROUTING_ENABLED | [tenant_routing] enabled |
| GATEWAY_TENANT_ROUTING_REDIS_URL | [tenant_routing] redis_url |
| GATEWAY_TENANT_ROUTING_LRU_CAPACITY | [tenant_routing] lru_capacity |
| GATEWAY_TENANT_ROUTING_ANTI_ENTROPY_INTERVAL_SECS | [tenant_routing] anti_entropy_interval_secs |
| GATEWAY_TENANT_ROUTING_API_FALLBACK_URL | [tenant_routing] api_fallback_url |
| GATEWAY_TENANT_ROUTING_API_FALLBACK_TIMEOUT_MS | [tenant_routing] api_fallback_timeout_ms |
| GATEWAY_TENANT_ROUTING_API_FALLBACK_AUTH_TOKEN | [tenant_routing] api_fallback_auth_token |
| GATEWAY_TENANT_ROUTING_LRU_IDLE_TIMEOUT_SECS | [tenant_routing] lru_idle_timeout_secs |
| GATEWAY_TENANT_ROUTING_PRIVATE_KEY_PATH | [tenant_routing] private_key_path |

GenAI telemetry

| Env var | Overrides |
| --- | --- |
| GATEWAY_GENAI_TELEMETRY_ENABLED | [genai_telemetry] enabled |
| GATEWAY_GENAI_OTLP_ENDPOINT | [genai_telemetry] otlp_endpoint |
| GATEWAY_GENAI_SERVICE_NAME | [genai_telemetry] service_name |
| GATEWAY_GENAI_ENVIRONMENT | [genai_telemetry] environment |
| GATEWAY_GENAI_SAMPLE_RATIO | [genai_telemetry] sample_ratio |
| GATEWAY_GENAI_CAPTURE_CONTENT | [genai_telemetry] capture_content |
| GATEWAY_GENAI_OTLP_BEARER_TOKEN | [genai_telemetry] otlp_bearer_token |

Logging

Control log verbosity with the RUST_LOG environment variable:

```shell
RUST_LOG=debug cargo run -p gateway
RUST_LOG=gateway_core=trace,info cargo run -p gateway
```