Documentation Index

Fetch the complete documentation index at: https://to11.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

AI Gateway Overview

The to11 AI Gateway is a Rust-based LLM reverse proxy built with Axum, Tokio, and Hyper. It sits between your application and upstream LLM providers, adding less than 1 ms of overhead while providing three-layer routing, inline security, cross-format streaming, and GenAI telemetry.

Architecture

Your app / SDK
    |  OpenAI, Anthropic, or xAI format
    v
+------------------------------------------+
|  AI Gateway  (Rust, :4000)               |
|                                          |
|  L3: Function routing  (task aliases)    |
|  L2: Managed routing   (strategies)      |
|  L1: Passthrough       (transparent)     |
|                                          |
|  - Cross-format detection & translation  |
|  - Inline security pipeline              |
|  - Streaming (fast-path + normalised)    |
|  - W3C Trace Context propagation         |
+--------+-----------+---------------------+
         |           |
    HTTPS|           | gRPC OTLP
         |           |
         v           v
   LLM Provider   OTel Collector (:4317)
   (OpenAI, etc)    |  Exports GenAI spans
                    |  via ClickHouse exporter
                    v
                ClickHouse
                  (analytics)

Three-Layer Routing

The gateway implements a layered routing model. Each layer is independently useful and backward compatible — you opt in to higher layers as you need them.
| Layer | Routing key | What the gateway owns | Config |
|---|---|---|---|
| L1: Passthrough | Model name | Nothing (proxies to provider) | [providers.*] |
| L2: Managed routing | Model name | Credentials + routing strategy | [targets.*] + [routes.*] |
| L3: Function routing | Function name | Credentials + routing + task abstraction | [functions.*] |
Requests resolve top-down: function match → route match → provider match. See Routing Overview for the full resolution flow and namespace prefix system.
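As a rough illustration, the resolution order can be pictured as three lookups tried in sequence. The types below are hypothetical stand-ins, not the gateway-core internals, which also handle the namespace prefix system and fallbacks:

```rust
use std::collections::HashMap;

// Hypothetical sketch of top-down resolution.
enum Resolution<'a> {
    Function(&'a str),    // L3: dispatch through a task alias
    Route(&'a str),       // L2: dispatch through a managed route
    Passthrough(&'a str), // L1: transparent proxy to the provider
}

struct RoutingTable {
    functions: HashMap<String, String>,
    routes: HashMap<String, String>,
    providers: HashMap<String, String>,
}

impl RoutingTable {
    /// Try each layer in order: function -> route -> provider.
    fn resolve(&self, key: &str) -> Option<Resolution<'_>> {
        if let Some(target) = self.functions.get(key) {
            return Some(Resolution::Function(target.as_str()));
        }
        if let Some(target) = self.routes.get(key) {
            return Some(Resolution::Route(target.as_str()));
        }
        self.providers.get(key).map(|p| Resolution::Passthrough(p.as_str()))
    }
}
```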

Key Capabilities

Cross-Format Routing

The gateway accepts requests in OpenAI, Anthropic, or xAI format and can route them to any upstream provider. The response is always returned in the caller’s native format. For example, you can send an OpenAI-format request that routes to Anthropic’s Claude, and receive the response in OpenAI format — your application code doesn’t need to change.
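As a toy illustration of what translation involves, consider mapping an OpenAI-shaped request onto Anthropic's shape. The structs here are simplified assumptions; the real adapters cover tools, system prompts, streaming, and many more fields:

```rust
// Simplified request shapes for illustration only.
struct OpenAiRequest { model: String, content: String }
struct AnthropicRequest { model: String, content: String, max_tokens: u32 }

// Forward translation for the upstream call; the response is translated
// back into the caller's OpenAI format the same way.
fn openai_to_anthropic(req: OpenAiRequest) -> AnthropicRequest {
    AnthropicRequest {
        model: req.model,
        content: req.content,
        // Anthropic's Messages API requires max_tokens; OpenAI's does not,
        // so the adapter has to supply a value.
        max_tokens: 1024,
    }
}
```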

Streaming

The gateway implements two streaming code paths, each optimised for a different scenario (a sketch of the normalised path follows the list):
  • Fast path — When the caller’s SDK format matches the upstream provider, SSE chunks are forwarded as-is with zero-copy passthrough. Sub-millisecond overhead.
  • Normalised path — When formats differ, SSE is parsed into canonical StreamEvents and re-serialised to the caller’s format.
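StreamEvent is named in the docs, but its variants and the OpenAI-style SSE shape below are illustrative assumptions:

```rust
// Two illustrative canonical events; the real enum would also cover
// tool calls, usage chunks, and other event kinds.
enum StreamEvent {
    Delta(String), // incremental text
    Done(String),  // finish reason
}

// Re-serialise a canonical event as an OpenAI-style SSE line (assumed shape).
fn to_openai_sse(ev: &StreamEvent) -> String {
    match ev {
        StreamEvent::Delta(text) => format!(
            "data: {{\"choices\":[{{\"delta\":{{\"content\":{text:?}}}}}]}}\n\n"
        ),
        StreamEvent::Done(reason) => format!(
            "data: {{\"choices\":[{{\"finish_reason\":{reason:?}}}]}}\n\n"
        ),
    }
}

fn main() {
    for ev in [StreamEvent::Delta("Hel".into()), StreamEvent::Done("stop".into())] {
        print!("{}", to_openai_sse(&ev));
    }
}
```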

Inline Security

The security pipeline runs synchronously in the request path with fail-fast ordering:
  1. Blocklist — Case-insensitive substring matching (~nanoseconds)
  2. PII Detection — Regex patterns for SSN, credit cards, phone numbers, IPs, emails (~500ns)
  3. ONNX ML Detectors — Prompt injection and content moderation (Phase 2)
Blocked requests return 400 Bad Request and are never forwarded upstream.
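The Detector trait named in the crate layout below suggests a pipeline shaped roughly like this; the signature and orchestration are assumptions for illustration:

```rust
// Assumed Detector signature; Err carries the block reason.
trait Detector {
    fn name(&self) -> &'static str;
    fn check(&self, prompt: &str) -> Result<(), String>;
}

struct Blocklist { terms: Vec<String> }

impl Detector for Blocklist {
    fn name(&self) -> &'static str { "blocklist" }
    fn check(&self, prompt: &str) -> Result<(), String> {
        let lower = prompt.to_lowercase(); // case-insensitive substring match
        match self.terms.iter().find(|t| lower.contains(&t.to_lowercase())) {
            Some(term) => Err(format!("blocked term: {term}")),
            None => Ok(()),
        }
    }
}

/// Run detectors in order; the first failure short-circuits and maps to 400.
fn run_pipeline(detectors: &[Box<dyn Detector>], prompt: &str) -> Result<(), String> {
    for d in detectors {
        d.check(prompt).map_err(|e| format!("{}: {e}", d.name()))?;
    }
    Ok(())
}
```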

Telemetry

Every LLM call produces OpenTelemetry GenAI semantic convention spans and metrics:
  • Model, tokens, latency, finish reason
  • TTFT (time to first token) for streaming
  • Tool call child spans with timing
  • Optional prompt/completion content capture
Both app-level traces (to Tempo) and GenAI telemetry (to ClickHouse) are emitted via separate OTLP exporters.
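As a rough illustration, a single non-streaming call's GenAI span might carry attributes like the following. The names track the OTel GenAI semantic conventions; the exact set the gateway emits, and the values here, are assumptions:

```rust
// Example attribute set for one call (illustrative values).
fn genai_span_attributes() -> Vec<(&'static str, String)> {
    vec![
        ("gen_ai.system", "anthropic".to_string()),
        ("gen_ai.request.model", "claude-sonnet".to_string()),
        ("gen_ai.usage.input_tokens", "812".to_string()),
        ("gen_ai.usage.output_tokens", "143".to_string()),
        ("gen_ai.response.finish_reasons", "stop".to_string()),
    ]
}
```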

Request Flow

HTTP Request (traceparent)
  -> Set/Propagate Request ID Layer
  -> TraceLayer Span (extract traceparent)
  -> Handler Span (gateway-core)
  -> Security Pipeline
  -> Three-Layer Route Resolution
       L3: Function match?  -> function dispatch
       L2: Route match?     -> managed route dispatch
       L1: Provider match?  -> passthrough
  -> GenAI Span (child of HTTP trace)
  -> Traced HTTP Client (inject traceparent)
  -> Downstream API (OpenAI/Anthropic)
  -> Adapter Response Normalisation
  -> HTTP Response (traceparent injected)
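Collapsed into code, the flow reads as one function. Everything below is a synchronous stub standing in for the real tower layers, async handlers, and adapters:

```rust
struct Request { model: String, body: String }
struct Response { status: u16, body: String }

fn run_security(req: &Request) -> Result<(), String> {
    // Blocklist -> PII -> ONNX, fail-fast (see Inline Security above).
    if req.body.contains("forbidden") { Err("blocklist hit".into()) } else { Ok(()) }
}

fn resolve(req: &Request) -> String {
    // Stands in for the L3 -> L2 -> L1 resolution described earlier.
    format!("upstream for {}", req.model)
}

fn handle(req: Request) -> Response {
    // Request-ID and trace layers wrap this handler in the real stack.
    if let Err(reason) = run_security(&req) {
        return Response { status: 400, body: reason }; // never forwarded
    }
    let target = resolve(&req);
    // A traced client injects `traceparent` before calling the provider;
    // the adapter then normalises the provider response for the caller.
    Response { status: 200, body: format!("proxied to {target}") }
}

fn main() {
    let resp = handle(Request { model: "gpt-4o".into(), body: "hi".into() });
    println!("{} {}", resp.status, resp.body);
}
```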

Crate Layout

The gateway is organised as a Cargo workspace with isolated responsibilities:
| Crate | Responsibility |
|---|---|
| gateway | Binary entry point, CLI, benchmarks |
| gateway-core | Axum router, provider adapters, SSE streaming, routing, auth, caching |
| gateway-config | TOML config loading, credential resolution, routing config |
| gateway-security | Detector trait, PII + blocklist + ONNX pipeline |
| gateway-telemetry | OTel init, middleware layers, context propagation |
The key design decision: gateway-security owns the ort (ONNX Runtime) dependency so changes to routing, config, or core never trigger ONNX recompilation.

Next Steps

Quickstart

Build and run the gateway from source.

Routing

Three-layer routing: passthrough, managed, and function routing.

Concepts

Providers, models, targets, routes, and functions.

Configuration

Full TOML configuration reference.