Configuration Reference
The gateway loads its configuration from a TOML file. By default it reads config/gateway.toml relative to the working directory. All sections except [server] are optional.
For an explanation of routing concepts, see Routing. For provider-specific details, see Providers.
Full config reference
```toml
[server]
host = "127.0.0.1"
port = 4000

[defaults.provider]
timeout_ms = 30000

[providers.openai]
base_url = "https://api.openai.com/v1"
models = ["gpt-4o", "gpt-4o-mini", "o3"]
timeout_ms = 30000
# credential defaults to "env::OPENAI_API_KEY" for the openai provider

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6", "claude-haiku-4-5"]
timeout_ms = 60000

[models.o3]
timeout_ms = 120000

[targets.openai-primary]
model = "gpt-4o"
weight = 70
credential = "env::MANAGED_OPENAI_KEY"

[targets.openai-fallback]
model = "gpt-4o"
weight = 30
credential = "env::MANAGED_OPENAI_KEY_2"

[routes.balanced-gpt4o]
endpoint = "chat"
models = ["gpt-4o"]
strategy = "weighted"
targets = ["openai-primary", "openai-fallback"]

[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models = ["gpt-4o", "claude-sonnet-4-6"]

[functions.summarize.retry]
max_retries = 4
backoff_base_ms = 1000

[routing.retry]
max_retries = 2
backoff_base_ms = 500

[security]
enabled = true
blocklist = ["ignore previous instructions", "dan mode"]

# [tenant_routing]
# enabled = true
# redis_url = "redis://localhost:6379"
# lru_capacity = 1000
# anti_entropy_interval_secs = 300
# api_fallback_url = "http://localhost:4500/internal/routing/configs"
# api_fallback_timeout_ms = 5000
# api_fallback_auth_token = ""
# lru_idle_timeout_secs = 3600
# private_key_path = "/etc/gateway/x25519.key"

[telemetry]
enabled = true
otlp_endpoint = "http://localhost:4317"
service_name = "gateway"
environment = "production"
sample_ratio = 0.1
log_level = "info"

[genai_telemetry]
enabled = false
otlp_endpoint = "http://localhost:4317"
service_name = "gateway-genai-telemetry"
environment = "production"
sample_ratio = 1.0
metrics_enabled = true
metrics_export_interval_secs = 30
metrics_export_timeout_secs = 10
capture_content = true
otlp_bearer_token = ""
```
Section reference
[server]
Network binding for the gateway process.
| Field | Type | Default | Description |
|---|---|---|---|
| host | string | "127.0.0.1" | Bind address |
| port | integer | 4000 | Listen port |
[defaults.provider]
Global defaults applied to every provider unless the provider overrides them explicitly.
| Field | Type | Default | Description |
|---|---|---|---|
| timeout_ms | integer | — | Default request timeout in milliseconds for all providers |
| auth_type | string | — | Default authentication method: bearer, api_key_header, or query_param |
[providers.*]
Each named table registers a provider and the models it serves. The table key is the provider identifier (e.g., openai, anthropic, groq).
| Field | Type | Required | Description |
|---|---|---|---|
| base_url | string | Yes | Provider API base URL |
| models | array | Yes | Model names this provider handles |
| credential | string | No | Credential location (e.g., "env::OPENAI_API_KEY", "none"). When omitted, uses the per-provider convention default. |
| timeout_ms | integer | No | Request timeout in milliseconds. Overrides [defaults.provider].timeout_ms. |
| auth_type | string | No | Authentication method: bearer (default), api_key_header, or query_param. Overrides [defaults.provider].auth_type. |
| auth_header_name | string | No | Custom header name when auth_type is not standard (e.g., "api-key" for Azure OpenAI) |
When credential is omitted, the gateway uses a per-provider convention based on the table key. For example, the openai provider defaults to env::OPENAI_API_KEY, and the anthropic provider defaults to env::ANTHROPIC_API_KEY. Set credential = "none" to disable credential resolution entirely for a provider.
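As an illustration of the credential and auth fields, the fragment below sketches two providers that do not follow the default bearer convention. The provider keys, URLs, model names, and the `AZURE_OPENAI_API_KEY` variable are placeholders chosen for this example, not values the gateway defines.

```toml
# Hypothetical Azure OpenAI provider: the key travels in a custom header.
[providers.azure-openai]
base_url = "https://example-resource.openai.azure.com/openai"  # placeholder URL
models = ["example-azure-deployment"]                           # placeholder model name
credential = "env::AZURE_OPENAI_API_KEY"                        # assumed variable name
auth_type = "api_key_header"
auth_header_name = "api-key"

# Hypothetical local inference server that needs no credential.
[providers.local]
base_url = "http://localhost:8080/v1"  # placeholder URL
models = ["example-local-model"]       # placeholder model name
credential = "none"                    # skip credential resolution for this provider
```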
[models.*]
Per-model overrides. The table key is the model name as it appears in a provider’s models array.
| Field | Type | Default | Description |
|---|---|---|---|
| timeout_ms | integer | — | Request timeout in milliseconds. Overrides the provider timeout. |
The timeout resolution order is: [models.*].timeout_ms > [providers.*].timeout_ms > [defaults.provider].timeout_ms > hardcoded default (30 000 ms).
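To make the resolution order concrete, here is a small sketch with a value set at three of the four levels. The 45000 ms provider value is chosen only for illustration; the comments state which timeout each model would be expected to receive under the order described above.

```toml
[defaults.provider]
timeout_ms = 30000   # used by any provider that sets no timeout of its own

[providers.openai]
base_url = "https://api.openai.com/v1"
models = ["gpt-4o", "gpt-4o-mini", "o3"]
timeout_ms = 45000   # gpt-4o and gpt-4o-mini: provider value wins over the default

[models.o3]
timeout_ms = 120000  # o3: model value wins over the provider value
```

If all three `timeout_ms` lines were removed, every request would fall back to the hardcoded 30 000 ms default.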
[targets.*]
Each named table declares a managed routing target — a specific model paired with gateway-owned credentials. The table key is the target identifier referenced by routes and functions.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Upstream model name (must exist in a provider’s models array) |
| credential | string | No | Credential location for this target. Overrides the provider-level credential. |
| weight | integer | No | Weight for the weighted strategy (default: 1) |
| timeout_ms | integer | No | Request timeout in milliseconds. Overrides the provider timeout for this target. |
When credential is set to an "env::VAR_NAME" location, the environment variable must be present at startup. The gateway fails fast on missing managed credentials rather than discovering the problem at request time.
[routes.*]
Each named table defines a managed route that maps one or more model names to a set of targets with a selection strategy. The table key is the route identifier.
| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Endpoint type: chat, embeddings, audio_speech, audio_transcription, or image_generation |
| models | array | Yes | Model names this route handles |
| strategy | string | Yes | Selection strategy: single, weighted, or fallback |
| targets | array | Yes* | Target names for simple routes |
| steps | array | No | Multi-step route definition (overrides targets). Each step has its own strategy and targets. |
When steps is present, each step defines its own strategy and targets. Steps execute as an ordered fallback chain. The top-level targets field is ignored when steps is provided.
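A multi-step route might look like the sketch below. Two things here are assumptions rather than documented behaviour: that steps are written as a TOML array of tables (`[[routes.<name>.steps]]`), and that the top-level `strategy` is set to `"fallback"` when steps are used. The route name and the `anthropic-backup` target are hypothetical, and that target is assumed to be declared under `[targets.*]`.

```toml
[routes.tiered-gpt4o]            # hypothetical route name
endpoint = "chat"
models = ["gpt-4o"]
strategy = "fallback"            # assumed value; steps form the fallback chain

# Step 1: split traffic across the two managed OpenAI targets.
[[routes.tiered-gpt4o.steps]]    # array-of-tables syntax is an assumption
strategy = "weighted"
targets = ["openai-primary", "openai-fallback"]

# Step 2: attempted only after step 1 fails.
[[routes.tiered-gpt4o.steps]]
strategy = "single"
targets = ["anthropic-backup"]   # hypothetical target
```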
[routes.*.retry]
Per-route retry configuration. When present, overrides the global [routing.retry] settings for this route.
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | integer | 2 | Maximum retry attempts |
| backoff_base_ms | integer | 500 | Base backoff in milliseconds |
[functions.*]
Each named table defines a function — a named routing alias that maps a task name to a strategy and set of models or targets. The table key is the function name (e.g., summarize, extract-entities).
Callers invoke a function by sending model: "function::summarize" or, when no provider or route shares the name, simply model: "summarize" (resolved via top-down lookup).
| Field | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | Endpoint type: chat, embeddings, audio_speech, audio_transcription, or image_generation |
| strategy | string | Yes | Selection strategy: single, weighted, fallback, or experiment |
| models | array | No | Inline model shorthand — model names resolved through configured providers |
| targets | array | No | Named target references (for weights, custom credentials, or timeouts) |
| steps | array | No | Multi-step definition. Each step has its own strategy and targets. |
models and targets are mutually exclusive on a function. When models is used, the gateway creates ephemeral targets internally by resolving each model name through the provider configuration. Each inline model must exist in exactly one provider’s models array.
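The two forms can be compared side by side. In this sketch the `classify` function name is hypothetical, and the targets it references are the ones declared in the full config example near the top of this page.

```toml
# Inline shorthand: each model name is resolved through its provider.
[functions.summarize]
endpoint = "chat"
strategy = "fallback"
models = ["gpt-4o", "claude-sonnet-4-6"]

# Named targets: use these when weights, custom credentials, or timeouts matter.
[functions.classify]   # hypothetical function name
endpoint = "chat"
strategy = "weighted"
targets = ["openai-primary", "openai-fallback"]
```

A single function should set either `models` or `targets`, never both.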
[functions.*.variants.*]
When strategy = "experiment", variants define the A/B test arms. Each named table under variants is a variant configuration.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes* | Model name for this variant. Mutually exclusive with target. |
| target | string | No | Named target reference. Mutually exclusive with model. |
| weight | integer | Yes | Relative weight for random selection. Must be > 0. |
| (params) | varies | No | Additional key-value pairs override request parameters (e.g., temperature, max_tokens). |
Params are validated against an allowlist per endpoint type. Unknown params pass through with a startup warning. Protected fields (model, messages, input, file, prompt, stream) cannot be set as params.
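As a rough sketch of an experiment, the fragment below defines two arms weighted 80 to 20. The function name and variant names are hypothetical, and the parameter values are illustrative only. The extra keys on the second variant are the request-parameter overrides described above.

```toml
[functions.summarize-ab]                      # hypothetical function name
endpoint = "chat"
strategy = "experiment"

[functions.summarize-ab.variants.control]     # hypothetical variant name
model = "gpt-4o"
weight = 80

[functions.summarize-ab.variants.candidate]   # hypothetical variant name
model = "claude-sonnet-4-6"
weight = 20
temperature = 0.2    # overrides the caller's temperature for this arm
max_tokens = 512     # overrides the caller's max_tokens for this arm
```

Because weights are relative, `weight = 4` and `weight = 1` would be expected to produce the same split.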
EndpointType enum
The endpoint field on functions and routes accepts one of:
| Value | HTTP Endpoint |
|---|---|
| chat | /v1/chat/completions, /v1/responses, /v1/messages |
| embeddings | /v1/embeddings |
| audio_speech | /v1/audio/speech |
| audio_transcription | /v1/audio/transcriptions |
| image_generation | /v1/images/generations |
[functions.*.retry]
Per-function retry configuration. When present, overrides the global [routing.retry] settings for this function.
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | integer | 2 | Maximum retry attempts |
| backoff_base_ms | integer | 500 | Base backoff in milliseconds |
[routing]
Global routing behaviour settings. This section is optional — when omitted, all sub-sections use their defaults.
| Field | Type | Default | Description |
|---|---|---|---|
| [routing.retry] | table | absent | Global retry configuration for managed routes and functions |
| [routing.circuit_breaker] | table | — | Deprecated. Accepted for backward compatibility but ignored at runtime. |
[routing.retry]
Global retry settings applied to all managed routes and functions that do not declare their own [routes.*.retry] or [functions.*.retry] block.
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | integer | 2 | Maximum retry attempts |
| backoff_base_ms | integer | 500 | Base backoff in milliseconds |
The retry layer wraps every managed-route attempt with exponential backoff (powered by the backon crate).
[routing.circuit_breaker] (deprecated)
The [routing.circuit_breaker] section is accepted for backward compatibility but ignored at runtime. The circuit breaker was removed in PR #232 and replaced by stateless fast sequential failover via FallbackStrategy combined with retry and exponential backoff. Use [routing.retry] with the fallback strategy instead. If enabled = true is set, a deprecation warning is logged at startup.
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Deprecated. Accepted but ignored. |
| failure_threshold | integer | 5 | Deprecated. Accepted but ignored. |
| recovery_timeout_ms | integer | 30000 | Deprecated. Accepted but ignored. |
[security]
Inline security pipeline for input guardrails.
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable the inline security pipeline |
| blocklist | array | [] | Blocked terms for case-insensitive substring matching |
When enabled, the security pipeline runs PII detection (always on) and blocklist matching on every request before it reaches the upstream provider. See Security Overview for details.
[telemetry]
Application-level telemetry (HTTP spans, latency). These traces go to your application observability stack (e.g., Tempo via Alloy).
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable application OTLP export |
| otlp_endpoint | string | "http://localhost:4317" | gRPC OTLP endpoint |
| service_name | string | "gateway" | OpenTelemetry service name |
| environment | string | "production" | Deployment environment tag |
| sample_ratio | float | 0.1 | Trace sampling ratio (0.0 to 1.0) |
| log_level | string | "info" | Log level filter |
[genai_telemetry]
Dedicated GenAI provider telemetry sink. These traces and metrics go to the OTel Collector and on to ClickHouse.
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable GenAI OTLP export |
| otlp_endpoint | string | "http://localhost:4317" | gRPC OTLP endpoint |
| service_name | string | "gateway-genai-telemetry" | Service name for GenAI spans |
| environment | string | "production" | Deployment environment tag |
| sample_ratio | float | 1.0 | Trace sampling ratio (0.0 to 1.0) |
| metrics_enabled | bool | true | Enable histogram metrics export |
| metrics_export_interval_secs | integer | 30 | Metrics push interval in seconds |
| metrics_export_timeout_secs | integer | 10 | Metrics push timeout in seconds |
| capture_content | bool | true | Record prompt/completion text in spans |
| otlp_bearer_token | string | "" | Bearer token for gRPC Authorization metadata |
[auth]
Optional per-request platform authentication for customer-hosted gateways. When enabled, the gateway exchanges a platform credential for a short-lived collector JWT and exports GenAI telemetry with that request-scoped token.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | false | Enable platform auth exchange |
| exchange_url | string | Yes* | — | Token exchange endpoint URL. Required when enabled = true. |
| api_token | string | No | — | Fallback API token when x-to11-authorization header is absent |
| project_id | string | No | — | Fallback project ID when x-to11-project-id header is absent |
| cache_skew_seconds | integer | No | 60 | Token cache skew tolerance in seconds |
| exchange_timeout_ms | integer | No | 2000 | HTTP timeout for the token exchange request in milliseconds |
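
A minimal `[auth]` block could look like the sketch below. The exchange URL, token, and project ID are placeholders invented for this example. The last two fields simply restate the documented defaults.

```toml
[auth]
enabled = true
exchange_url = "https://platform.example.com/auth/exchange"  # placeholder URL
api_token = "replace-me"          # placeholder; used when x-to11-authorization is absent
project_id = "example-project"    # placeholder; used when x-to11-project-id is absent
cache_skew_seconds = 60           # documented default
exchange_timeout_ms = 2000        # documented default
```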
[tenant_routing]
Per-tenant routing configuration for platform mode (self-hosted or managed SaaS). When enabled, each project gets its own routing config loaded from Redis with API fallback. This section is only relevant when running in multi-tenant platform mode.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | false | Enable per-tenant routing from DB/Redis. When false, the gateway uses static TOML config. |
| redis_url | string | No | "redis://localhost:6379" | Redis/Valkey connection URL for loading config snapshots |
| lru_capacity | integer | No | 1000 | Maximum number of tenant configs cached in memory |
| anti_entropy_interval_secs | integer | No | 300 | Interval between anti-entropy sweeps (seconds). Must be > 0; invalid values fall back to the default. |
| api_fallback_url | string | No | — | Base URL for the API fallback endpoint (e.g., http://api:4500/internal/routing/configs) |
| api_fallback_timeout_ms | integer | No | 5000 | Timeout for API fallback requests in milliseconds |
| api_fallback_auth_token | string | No | — | Bearer token for authenticating API fallback requests. Sent as Authorization: Bearer <token>. |
| lru_idle_timeout_secs | integer | No | 3600 | Evict cached configs idle longer than this (seconds) |
| private_key_path | string | No | — | Path to 32-byte X25519 private key for decrypting Redis envelope-encrypted snapshots. When absent, the gateway expects plaintext JSON in Redis (dev mode only). |
The api_fallback_auth_token should be a service-scoped API key with routing:read permission. The API fallback endpoint (GET /internal/routing/configs/:projectId) requires this permission. In production, additionally restrict the endpoint at the network level (VPC, security groups) to gateway instances only.
When private_key_path is not set, the gateway operates in plaintext mode — config snapshots in Redis are not encrypted. This is acceptable for local development but must not be used in production where Redis may contain provider API keys.
[cache]
Response caching backed by Valkey (Redis-compatible). When enabled, identical LLM requests can be served from the cache instead of calling the upstream provider again.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | false | Enable response caching |
| url | string | No | "redis://localhost:6379" | Valkey/Redis connection URL |
| default_mode | string | No | "auto" | Cache mode: auto, always, or never |
| ttl_seconds | integer | No | 604800 | Default TTL in seconds (1 week) |
| max_ttl_seconds | integer | No | 604800 | Maximum allowed TTL — caps per-request overrides |
| max_entry_size_bytes | integer | No | 1048576 | Maximum cached response size in bytes (1 MiB) |
| pool_size | integer | No | 8 | Connection pool size |
| encrypt | bool | No | true | AES-256-GCM encryption at rest |
| encrypt_salt | string | No | "" | Server salt for HKDF key derivation. Required when encrypt = true. |
When encrypt is true, encrypt_salt must be a non-empty string. The gateway refuses to start if encryption is enabled without a salt.
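Because `encrypt` defaults to true and `encrypt_salt` defaults to an empty string, enabling the cache also means supplying a salt. The fragment below is a sketch of a working configuration. The salt value is a placeholder and the one-day TTL is chosen only for illustration.

```toml
[cache]
enabled = true
url = "redis://localhost:6379"
default_mode = "auto"
ttl_seconds = 86400          # 1 day; must not exceed max_ttl_seconds
max_ttl_seconds = 604800     # documented default (1 week)
encrypt = true
encrypt_salt = "replace-with-a-long-random-string"  # placeholder; must be non-empty
```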
Config file location
Override the default path with the GATEWAY_CONFIG environment variable:
```bash
GATEWAY_CONFIG=/path/to/config.toml cargo run --release -p gateway
```
Environment variable overrides
Environment variables take precedence over TOML values.
Application telemetry
| Env var | Overrides |
|---|---|
| GATEWAY_TELEMETRY_ENABLED | [telemetry] enabled |
| OTEL_EXPORTER_OTLP_ENDPOINT | [telemetry] otlp_endpoint |
| SERVICE_NAME | [telemetry] service_name |
| ENVIRONMENT | [telemetry] environment |
| OTEL_TRACES_SAMPLER_RATIO | [telemetry] sample_ratio |
Tenant routing
| Env var | Overrides |
|---|---|
| GATEWAY_TENANT_ROUTING_ENABLED | [tenant_routing] enabled |
| GATEWAY_TENANT_ROUTING_REDIS_URL | [tenant_routing] redis_url |
| GATEWAY_TENANT_ROUTING_LRU_CAPACITY | [tenant_routing] lru_capacity |
| GATEWAY_TENANT_ROUTING_ANTI_ENTROPY_INTERVAL_SECS | [tenant_routing] anti_entropy_interval_secs |
| GATEWAY_TENANT_ROUTING_API_FALLBACK_URL | [tenant_routing] api_fallback_url |
| GATEWAY_TENANT_ROUTING_API_FALLBACK_TIMEOUT_MS | [tenant_routing] api_fallback_timeout_ms |
| GATEWAY_TENANT_ROUTING_API_FALLBACK_AUTH_TOKEN | [tenant_routing] api_fallback_auth_token |
| GATEWAY_TENANT_ROUTING_LRU_IDLE_TIMEOUT_SECS | [tenant_routing] lru_idle_timeout_secs |
| GATEWAY_TENANT_ROUTING_PRIVATE_KEY_PATH | [tenant_routing] private_key_path |
GenAI telemetry
| Env var | Overrides |
|---|---|
| GATEWAY_GENAI_TELEMETRY_ENABLED | [genai_telemetry] enabled |
| GATEWAY_GENAI_OTLP_ENDPOINT | [genai_telemetry] otlp_endpoint |
| GATEWAY_GENAI_SERVICE_NAME | [genai_telemetry] service_name |
| GATEWAY_GENAI_ENVIRONMENT | [genai_telemetry] environment |
| GATEWAY_GENAI_SAMPLE_RATIO | [genai_telemetry] sample_ratio |
| GATEWAY_GENAI_CAPTURE_CONTENT | [genai_telemetry] capture_content |
| GATEWAY_GENAI_OTLP_BEARER_TOKEN | [genai_telemetry] otlp_bearer_token |
Logging
Control log verbosity with the RUST_LOG environment variable:
```bash
RUST_LOG=debug cargo run -p gateway
RUST_LOG=gateway_core=trace,info cargo run -p gateway
```