Documentation Index

Fetch the complete documentation index at: https://to11.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

AI Gateway Overview

The to11 AI Gateway is a Rust-based LLM reverse proxy built with Axum, Tokio, and Hyper. It sits between your application and upstream LLM providers, adding less than 1 ms of overhead while providing three-layer routing, inline security, cross-format streaming, and GenAI telemetry.

Architecture

Your app / SDK
    |  OpenAI, Anthropic, or xAI format
    v
+------------------------------------------+
|  AI Gateway  (Rust, :4000)               |
|                                          |
|  L3: Function routing  (task aliases)    |
|  L2: Managed routing   (strategies)      |
|  L1: Passthrough       (transparent)     |
|                                          |
|  - Cross-format detection & translation  |
|  - Inline security pipeline              |
|  - Streaming (fast-path + normalised)    |
|  - W3C Trace Context propagation         |
+--------+-----------+---------------------+
         |           |
    HTTPS|           | gRPC OTLP
         |           |
         v           v
   LLM Provider   OTel Collector (:4317)
   (OpenAI, etc)    |  Exports GenAI spans
                    |  via ClickHouse exporter
                    v
                ClickHouse
                  (analytics)

Three-Layer Routing

The gateway implements a layered routing model. Each layer is independently useful and backward compatible — you opt in to higher layers as you need them.
| Layer | Routing key | What the gateway owns | Config |
|---|---|---|---|
| L1: Passthrough | Model name | Nothing (proxies to provider) | [providers.*] |
| L2: Managed routing | Model name | Credentials + routing strategy | [targets.*] + [routes.*] |
| L3: Function routing | Function name | Credentials + routing + task abstraction | [functions.*] |
Requests resolve top-down: function match → route match → provider match. See Routing Overview for the full resolution flow and namespace prefix system.
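As a rough illustration, the resolution order can be pictured as three lookups tried in sequence. The types below are hypothetical stand-ins, not the gateway-core internals, which also handle the namespace prefix system and fallbacks:

```rust
use std::collections::HashMap;

// Hypothetical sketch of top-down resolution.
enum Resolution<'a> {
    Function(&'a str),    // L3: dispatch through a task alias
    Route(&'a str),       // L2: dispatch through a managed route
    Passthrough(&'a str), // L1: transparent proxy to the provider
}

struct RoutingTable {
    functions: HashMap<String, String>,
    routes: HashMap<String, String>,
    providers: HashMap<String, String>,
}

impl RoutingTable {
    /// Try each layer in order: function -> route -> provider.
    fn resolve(&self, key: &str) -> Option<Resolution<'_>> {
        if let Some(target) = self.functions.get(key) {
            return Some(Resolution::Function(target.as_str()));
        }
        if let Some(target) = self.routes.get(key) {
            return Some(Resolution::Route(target.as_str()));
        }
        self.providers.get(key).map(|p| Resolution::Passthrough(p.as_str()))
    }
}
```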

Key Capabilities

Cross-Format Routing

The gateway accepts requests in OpenAI, Anthropic, or xAI format and can route them to any upstream provider. The response is always returned in the caller’s native format. For example, you can send an OpenAI-format request that routes to Anthropic’s Claude, and receive the response in OpenAI format — your application code doesn’t need to change.
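As a toy illustration of what translation involves, consider mapping an OpenAI-shaped request onto Anthropic's shape. The structs here are simplified assumptions; the real adapters cover tools, system prompts, streaming, and many more fields:

```rust
// Simplified request shapes for illustration only.
struct OpenAiRequest { model: String, content: String }
struct AnthropicRequest { model: String, content: String, max_tokens: u32 }

// Forward translation for the upstream call; the response is translated
// back into the caller's OpenAI format the same way.
fn openai_to_anthropic(req: OpenAiRequest) -> AnthropicRequest {
    AnthropicRequest {
        model: req.model,
        content: req.content,
        // Anthropic's Messages API requires max_tokens; OpenAI's does not,
        // so the adapter has to supply a value.
        max_tokens: 1024,
    }
}
```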

Streaming

The gateway implements two streaming code paths, each optimised for a different scenario (a sketch of the normalised path follows the list):
  • Fast path — When the caller’s SDK format matches the upstream provider, SSE chunks are forwarded as-is with zero-copy passthrough. Sub-millisecond overhead.
  • Normalised path — When formats differ, SSE is parsed into canonical StreamEvents and re-serialised to the caller’s format.
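StreamEvent is named in the docs, but its variants and the OpenAI-style SSE shape below are illustrative assumptions:

```rust
// Two illustrative canonical events; the real enum would also cover
// tool calls, usage chunks, and other event kinds.
enum StreamEvent {
    Delta(String), // incremental text
    Done(String),  // finish reason
}

// Re-serialise a canonical event as an OpenAI-style SSE line (assumed shape).
fn to_openai_sse(ev: &StreamEvent) -> String {
    match ev {
        StreamEvent::Delta(text) => format!(
            "data: {{\"choices\":[{{\"delta\":{{\"content\":{text:?}}}}}]}}\n\n"
        ),
        StreamEvent::Done(reason) => format!(
            "data: {{\"choices\":[{{\"finish_reason\":{reason:?}}}]}}\n\n"
        ),
    }
}

fn main() {
    for ev in [StreamEvent::Delta("Hel".into()), StreamEvent::Done("stop".into())] {
        print!("{}", to_openai_sse(&ev));
    }
}
```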

Inline Security

The security pipeline runs synchronously in the request path with fail-fast ordering:
  1. Blocklist — Case-insensitive substring matching (~nanoseconds)
  2. PII Detection — Regex patterns for SSN, credit cards, phone numbers, IPs, emails (~500ns)
  3. ONNX ML Detectors — Prompt injection and content moderation (Phase 2)
Blocked requests return 400 Bad Request and are never forwarded upstream.
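The Detector trait named in the crate layout below suggests a pipeline shaped roughly like this; the signature and orchestration are assumptions for illustration:

```rust
// Assumed Detector signature; Err carries the block reason.
trait Detector {
    fn name(&self) -> &'static str;
    fn check(&self, prompt: &str) -> Result<(), String>;
}

struct Blocklist { terms: Vec<String> }

impl Detector for Blocklist {
    fn name(&self) -> &'static str { "blocklist" }
    fn check(&self, prompt: &str) -> Result<(), String> {
        let lower = prompt.to_lowercase(); // case-insensitive substring match
        match self.terms.iter().find(|t| lower.contains(&t.to_lowercase())) {
            Some(term) => Err(format!("blocked term: {term}")),
            None => Ok(()),
        }
    }
}

/// Run detectors in order; the first failure short-circuits and maps to 400.
fn run_pipeline(detectors: &[Box<dyn Detector>], prompt: &str) -> Result<(), String> {
    for d in detectors {
        d.check(prompt).map_err(|e| format!("{}: {e}", d.name()))?;
    }
    Ok(())
}
```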

Telemetry

Every LLM call produces OpenTelemetry GenAI semantic convention spans and metrics:
  • Model, tokens, latency, finish reason
  • TTFT (time to first token) for streaming
  • Tool call child spans with timing
  • Optional prompt/completion content capture
Both app-level traces (to Tempo) and GenAI telemetry (to ClickHouse) are emitted via separate OTLP exporters.
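As a rough illustration, a single non-streaming call's GenAI span might carry attributes like the following. The names track the OTel GenAI semantic conventions; the exact set the gateway emits, and the values here, are assumptions:

```rust
// Example attribute set for one call (illustrative values).
fn genai_span_attributes() -> Vec<(&'static str, String)> {
    vec![
        ("gen_ai.system", "anthropic".to_string()),
        ("gen_ai.request.model", "claude-sonnet".to_string()),
        ("gen_ai.usage.input_tokens", "812".to_string()),
        ("gen_ai.usage.output_tokens", "143".to_string()),
        ("gen_ai.response.finish_reasons", "stop".to_string()),
    ]
}
```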

Request Flow

HTTP Request (traceparent)
  -> Set/Propagate Request ID Layer
  -> TraceLayer Span (extract traceparent)
  -> Handler Span (gateway-core)
  -> Security Pipeline
  -> Three-Layer Route Resolution
       L3: Function match?  -> function dispatch
       L2: Route match?     -> managed route dispatch
       L1: Provider match?  -> passthrough
  -> GenAI Span (child of HTTP trace)
  -> Traced HTTP Client (inject traceparent)
  -> Downstream API (OpenAI/Anthropic)
  -> Adapter Response Normalisation
  -> HTTP Response (traceparent injected)
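Collapsed into code, the flow reads as one function. Everything below is a synchronous stub standing in for the real tower layers, async handlers, and adapters:

```rust
struct Request { model: String, body: String }
struct Response { status: u16, body: String }

fn run_security(req: &Request) -> Result<(), String> {
    // Blocklist -> PII -> ONNX, fail-fast (see Inline Security above).
    if req.body.contains("forbidden") { Err("blocklist hit".into()) } else { Ok(()) }
}

fn resolve(req: &Request) -> String {
    // Stands in for the L3 -> L2 -> L1 resolution described earlier.
    format!("upstream for {}", req.model)
}

fn handle(req: Request) -> Response {
    // Request-ID and trace layers wrap this handler in the real stack.
    if let Err(reason) = run_security(&req) {
        return Response { status: 400, body: reason }; // never forwarded
    }
    let target = resolve(&req);
    // A traced client injects `traceparent` before calling the provider;
    // the adapter then normalises the provider response for the caller.
    Response { status: 200, body: format!("proxied to {target}") }
}

fn main() {
    let resp = handle(Request { model: "gpt-4o".into(), body: "hi".into() });
    println!("{} {}", resp.status, resp.body);
}
```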

Crate Layout

The gateway is organised as a Cargo workspace with isolated responsibilities:
| Crate | Responsibility |
|---|---|
| gateway | Binary entry point, CLI, benchmarks |
| gateway-core | Axum router, provider adapters, SSE streaming, routing, auth, caching |
| gateway-config | TOML config loading, credential resolution, routing config |
| gateway-security | Detector trait, PII + blocklist + ONNX pipeline |
| gateway-telemetry | OTel init, middleware layers, context propagation |
The key design decision: gateway-security owns the ort (ONNX Runtime) dependency so changes to routing, config, or core never trigger ONNX recompilation.

Next Steps

Quickstart

Build and run the gateway from source.

Routing

Three-layer routing: passthrough, managed, and function routing.

Concepts

Providers, models, targets, routes, and functions.

Configuration

Full TOML configuration reference.