Documentation Index
Fetch the complete documentation index at: https://to11.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
AI Gateway Overview
The to11 AI Gateway is a Rust-based LLM reverse proxy built with Axum, Tokio, and Hyper. It sits between your application and upstream LLM providers, adding less than 1 ms of overhead while providing three-layer routing, inline security, cross-format streaming, and GenAI telemetry.

Architecture
Three-Layer Routing
The gateway implements a layered routing model. Each layer is independently useful and backward compatible — you opt in to higher layers as you need them.

| Layer | Routing key | What the gateway owns | Config |
|---|---|---|---|
| L1: Passthrough | Model name | Nothing — proxies to provider | `[providers.*]` |
| L2: Managed routing | Model name | Credentials + routing strategy | `[targets.*]` + `[routes.*]` |
| L3: Function routing | Function name | Credentials + routing + task abstraction | `[functions.*]` |
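As an illustration of how the three layers can compose, here is a minimal Rust sketch of the resolution order: a function name resolves to a model, and a model name either matches a managed route or passes straight through to a provider. All type and field names are invented for this example and are not taken from the gateway's source.

```rust
use std::collections::HashMap;

// Hypothetical illustration of the three routing layers.
#[derive(Debug, PartialEq)]
enum Resolution {
    // L1: unrouted model names pass straight through to a provider.
    Passthrough { provider: &'static str },
    // L2/L3: the gateway owns credentials and picks a concrete target.
    Managed { target: &'static str },
}

struct Router {
    // L2: model name -> managed target (in the spirit of [routes.*] -> [targets.*])
    routes: HashMap<&'static str, &'static str>,
    // L3: function name -> model name (in the spirit of [functions.*])
    functions: HashMap<&'static str, &'static str>,
    default_provider: &'static str,
}

impl Router {
    // L3 resolves a function to a model first, then falls through
    // the same model-based resolution used by L2 and L1.
    fn resolve_function(&self, function: &str) -> Option<Resolution> {
        self.functions.get(function).map(|m| self.resolve_model(m))
    }

    fn resolve_model(&self, model: &str) -> Resolution {
        match self.routes.get(model) {
            Some(target) => Resolution::Managed { target },
            None => Resolution::Passthrough {
                provider: self.default_provider,
            },
        }
    }
}
```

Note how `resolve_function` falls through to the same model resolution used by the lower layers, mirroring the idea that each layer builds on the one below it.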
Key Capabilities
Cross-Format Routing
The gateway accepts requests in OpenAI, Anthropic, or xAI format and can route them to any upstream provider. The response is always returned in the caller’s native format. For example, you can send an OpenAI-format request that routes to Anthropic’s Claude and receive the response in OpenAI format — your application code doesn’t need to change.

Streaming
Two code paths optimised for different scenarios:

- Fast path — When the caller’s SDK format matches the upstream provider’s, SSE chunks are forwarded as-is with zero-copy passthrough. Sub-millisecond overhead.
- Normalised path — When formats differ, SSE is parsed into canonical StreamEvents and re-serialised to the caller’s format.
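A toy sketch of the normalised path, assuming a much-simplified canonical event and only two wire formats. The real StreamEvents and chunk shapes are richer; the JSON fragments below are illustrative only, not the providers' exact schemas.

```rust
// Simplified canonical event for the normalised streaming path.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    Delta(String), // a text fragment
    Done,          // end of stream
}

// Caller-side wire formats the sketch can emit.
#[derive(Clone, Copy)]
enum Format {
    OpenAi,
    Anthropic,
}

// Parse one upstream SSE `data:` payload into a canonical event (toy parser).
fn parse_sse(data: &str) -> StreamEvent {
    if data.trim() == "[DONE]" {
        StreamEvent::Done
    } else {
        StreamEvent::Delta(data.trim().to_string())
    }
}

// Re-serialise the canonical event in the caller's native framing.
fn serialise(event: &StreamEvent, format: Format) -> String {
    match (event, format) {
        (StreamEvent::Delta(text), Format::OpenAi) => {
            format!(r#"data: {{"choices":[{{"delta":{{"content":"{text}"}}}}]}}"#)
        }
        (StreamEvent::Delta(text), Format::Anthropic) => {
            format!("event: content_block_delta\ndata: {{\"delta\":{{\"text\":\"{text}\"}}}}")
        }
        (StreamEvent::Done, Format::OpenAi) => "data: [DONE]".to_string(),
        (StreamEvent::Done, Format::Anthropic) => "event: message_stop".to_string(),
    }
}
```

The key design point this illustrates: once chunks are in a canonical form, adding a new caller or upstream format means writing one parser and one serialiser rather than a converter per format pair.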
Inline Security
The security pipeline runs synchronously in the request path with fail-fast ordering:

- Blocklist — Case-insensitive substring matching (~nanoseconds)
- PII Detection — Regex patterns for SSNs, credit cards, phone numbers, IPs, and emails (~500 ns)
- ONNX ML Detectors — Prompt injection and content moderation (Phase 2)
Requests flagged by any detector are rejected with 400 Bad Request and are never forwarded upstream.
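The fail-fast ordering can be sketched with a simplified detector trait. The names, checks, and error shape below are stand-ins for illustration, not the gateway's actual Detector API; in particular, the PII check is a naive digit-run scan standing in for the real regex patterns.

```rust
// Simplified stand-in for a detector in a fail-fast pipeline.
trait Detector {
    fn name(&self) -> &'static str;
    // Returns true if the input should be blocked.
    fn flags(&self, input: &str) -> bool;
}

// Case-insensitive substring matching against a term list.
struct Blocklist(Vec<&'static str>);

impl Detector for Blocklist {
    fn name(&self) -> &'static str { "blocklist" }
    fn flags(&self, input: &str) -> bool {
        let lower = input.to_lowercase();
        self.0.iter().any(|term| lower.contains(&term.to_lowercase()))
    }
}

// Toy PII stand-in: flags any run of 9 consecutive digits as SSN-like.
struct NaivePii;

impl Detector for NaivePii {
    fn name(&self) -> &'static str { "pii" }
    fn flags(&self, input: &str) -> bool {
        let mut digits = 0;
        for c in input.chars() {
            if c.is_ascii_digit() {
                digits += 1;
                if digits == 9 { return true; }
            } else {
                digits = 0;
            }
        }
        false
    }
}

// Run detectors in order, stopping at the first hit (fail-fast).
// A hit maps to a 400 Bad Request; the request never goes upstream.
fn screen(detectors: &[Box<dyn Detector>], input: &str) -> Result<(), &'static str> {
    for d in detectors {
        if d.flags(input) {
            return Err(d.name());
        }
    }
    Ok(())
}
```

Ordering cheapest-first means the common blocked cases pay only the ~nanosecond substring check and never reach the more expensive detectors.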
Telemetry
Every LLM call produces OpenTelemetry GenAI semantic convention spans and metrics:

- Model, tokens, latency, finish reason
- TTFT (time to first token) for streaming
- Tool call child spans with timing
- Optional prompt/completion content capture
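As a small illustration of one of these metrics, here is how a TTFT measurement can be derived for a streaming response. This is a hedged sketch; the struct and method names are invented, not the gateway's telemetry types.

```rust
use std::time::{Duration, Instant};

// Illustrative timing state carried alongside a streaming response.
struct StreamTiming {
    started: Instant,
    first_token: Option<Instant>,
}

impl StreamTiming {
    fn new() -> Self {
        Self { started: Instant::now(), first_token: None }
    }

    // Record the first content chunk only once; later chunks are ignored.
    fn on_token(&mut self) {
        if self.first_token.is_none() {
            self.first_token = Some(Instant::now());
        }
    }

    // TTFT is first-token time minus request start; None until a token arrives.
    fn ttft(&self) -> Option<Duration> {
        self.first_token.map(|t| t.duration_since(self.started))
    }
}
```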
Request Flow
Crate Layout
The gateway is organised as a Cargo workspace with isolated responsibilities:

| Crate | Responsibility |
|---|---|
| `gateway` | Binary entry point, CLI, benchmarks |
| `gateway-core` | Axum router, provider adapters, SSE streaming, routing, auth, caching |
| `gateway-config` | TOML config loading, credential resolution, routing config |
| `gateway-security` | Detector trait, PII + blocklist + ONNX pipeline |
| `gateway-telemetry` | OTel init, middleware layers, context propagation |
`gateway-security` owns the `ort` (ONNX Runtime) dependency, so changes to routing, config, or core never trigger ONNX recompilation.
Next Steps
Quickstart
Build and run the gateway from source.
Routing
Three-layer routing: passthrough, managed, and function routing.
Concepts
Providers, models, targets, routes, and functions.
Configuration
Full TOML configuration reference.