Streaming
The gateway has two streaming code paths, selected per-request based on whether the client’s SDK format matches the upstream provider’s SSE wire format. Both paths produce identical telemetry and apply the same output guardrail checks.
Two paths
     ┌─────────────────────────────────┐
     │  needs_stream_normalization()?  │
     └──────┬──────────────┬───────────┘
            │ false        │ true
    ┌───────▼───────┐ ┌────▼──────────────────────────────┐
    │   Fast path   │ │          Normalised path          │
    │ (passthrough) │ │ (parse → StreamEvent → serialise) │
    └───────────────┘ └───────────────────────────────────┘
Fast path — zero-copy SSE passthrough
When the caller’s SDK format matches the upstream provider, raw SSE bytes are forwarded directly to the client. There is no JSON parsing and no re-serialisation in the response path. This is what gives the gateway its sub-millisecond overhead.
The adapter’s adapt_sse_chunk is a single Bytes::copy_from_slice. A parallel line buffer feeds process_raw_sse_line to extract telemetry (usage, model, response ID, content deltas) from a single JSON parse per SSE line — without constructing intermediate StreamEvent objects.
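The parallel line buffer can be pictured with a small standard-library sketch. It is not the gateway's implementation: `SseLineBuffer` and `forward_chunk` are hypothetical names, a `Vec<u8>` stands in for `Bytes`, and the per-line JSON parse is left out. It only shows how raw chunks can be forwarded untouched while complete SSE lines are reassembled on the side for telemetry.

```rust
/// Hypothetical line buffer: accumulates raw chunk bytes and yields complete
/// SSE lines, keeping any partial trailing line for the next chunk.
#[derive(Default)]
pub struct SseLineBuffer {
    pending: Vec<u8>,
}

impl SseLineBuffer {
    /// Feed one upstream chunk; returns every line completed by this chunk.
    pub fn push_chunk(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Take the line including '\n', then strip the line terminator.
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop();
            if line.last() == Some(&b'\r') {
                line.pop();
            }
            lines.push(String::from_utf8_lossy(&line).into_owned());
        }
        lines
    }
}

/// Forward the chunk unchanged and return the `data:` payloads completed by it.
/// In the gateway, each payload would be handed to process_raw_sse_line.
pub fn forward_chunk(buf: &mut SseLineBuffer, chunk: &[u8]) -> (Vec<u8>, Vec<String>) {
    // Stand-in for the single copy into the response body.
    let forwarded = chunk.to_vec();
    let payloads = buf
        .push_chunk(chunk)
        .into_iter()
        .filter_map(|line| {
            line.strip_prefix("data:")
                // SSE strips one optional space after the colon.
                .map(|p| p.strip_prefix(' ').unwrap_or(p).to_string())
        })
        .collect();
    (forwarded, payloads)
}
```

Because the buffer only ever sees a copy of the bytes, a payload split across two chunks delays telemetry for that line but never delays delivery to the client.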
When the fast path is used:
- OpenAI client → OpenAI upstream (wire formats match)
- OpenAI client → xAI upstream (xAI uses OpenAI-compatible SSE)
Normalised path (parse → StreamEvent → serialise)
When the SSE wire formats are incompatible, each upstream SSE line goes through three stages:
- Parse: parse_sse_to_events converts provider-native SSE into provider-agnostic StreamEvents
- Record: record_event extracts telemetry from each event (tokens, TTFT, content deltas)
- Serialise: ResponseSerializer::serialize_stream_event converts events into the client's wire format
This path handles the structural differences between OpenAI’s choices[].delta and Anthropic’s content_block_delta event types.
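To make the record and serialise stages concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the `StreamEvent` variants, the `Telemetry` fields, and the function signatures are invented here, the parse stage is skipped (events are constructed directly), and the emitted JSON is a reduced form of each provider's chunk shape.

```rust
/// Hypothetical provider-agnostic event; the real StreamEvent likely has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    StreamStart { model: String },
    ContentDelta { text: String },
    StreamEnd { output_tokens: u64 },
}

/// Client wire formats this sketch can target.
pub enum ClientFormat {
    OpenAi,
    Anthropic,
}

/// Telemetry accumulated by the record stage (illustrative fields only).
#[derive(Default, Debug)]
pub struct Telemetry {
    pub model: Option<String>,
    pub content: String,
    pub output_tokens: u64,
}

/// Record stage: fold one event into the telemetry accumulator.
pub fn record_event(t: &mut Telemetry, ev: &StreamEvent) {
    match ev {
        StreamEvent::StreamStart { model } => t.model = Some(model.clone()),
        StreamEvent::ContentDelta { text } => t.content.push_str(text),
        StreamEvent::StreamEnd { output_tokens } => t.output_tokens = *output_tokens,
    }
}

/// Minimal JSON string escaping, enough for this sketch.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialise stage: render a content delta in the client's SSE wire format.
/// Other event types are omitted here and return None.
pub fn serialize_stream_event(fmt: &ClientFormat, ev: &StreamEvent) -> Option<String> {
    match ev {
        StreamEvent::ContentDelta { text } => {
            let text = json_escape(text);
            Some(match fmt {
                ClientFormat::OpenAi => format!(
                    "data: {{\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{text}\"}}}}]}}\n\n"
                ),
                ClientFormat::Anthropic => format!(
                    "event: content_block_delta\ndata: {{\"type\":\"content_block_delta\",\"index\":0,\"delta\":{{\"type\":\"text_delta\",\"text\":\"{text}\"}}}}\n\n"
                ),
            })
        }
        _ => None,
    }
}
```

The same `ContentDelta` event yields a `choices[].delta` chunk for an OpenAI client and a `content_block_delta` event for an Anthropic client, which is the structural difference the normalised path exists to bridge.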
When the normalised path is used:
- Anthropic client → OpenAI upstream (incompatible SSE wire formats)
- Anthropic client → Anthropic upstream (all Anthropic SDK ingress uses the normalised path)
Path selection matrix
| Client SDK | Upstream Provider | Path | Reason |
|---|---|---|---|
| OpenAI / xAI | OpenAI | Fast | Wire formats match |
| OpenAI / xAI | xAI | Fast | xAI uses OpenAI-compatible SSE |
| OpenAI / xAI | Anthropic | Normalised | Anthropic provider always requires parse → transform → re-serialise |
| OpenAI / xAI | Cohere | Normalised | Cohere provider always requires parse → transform → re-serialise |
| Anthropic | Any | Normalised | Anthropic ingress always uses the normalised path |
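The matrix above can be read as a single match expression. The sketch below is one way to encode it, not the gateway's actual code: the `ClientSdk` and `Provider` enums and the function signature are assumptions, and only the name `needs_stream_normalization` comes from this page.

```rust
/// Hypothetical client SDK formats accepted at ingress.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientSdk {
    OpenAi,
    Xai,
    Anthropic,
}

/// Hypothetical upstream providers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    OpenAi,
    Xai,
    Anthropic,
    Cohere,
}

/// Returns true when the normalised path is required, false for the fast path.
pub fn needs_stream_normalization(client: ClientSdk, upstream: Provider) -> bool {
    match (client, upstream) {
        // Anthropic ingress always uses the normalised path.
        (ClientSdk::Anthropic, _) => true,
        // OpenAI-compatible client and upstream: wire formats match, pass through.
        (ClientSdk::OpenAi | ClientSdk::Xai, Provider::OpenAi | Provider::Xai) => false,
        // Anthropic and Cohere upstreams always need parse, transform, re-serialise.
        (ClientSdk::OpenAi | ClientSdk::Xai, Provider::Anthropic | Provider::Cohere) => true,
    }
}
```

Writing the match without a wildcard arm for the upstream means that adding a new provider variant fails to compile until its row in the matrix has been decided.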
Output guardrails during streaming
Both paths accumulate response text in a bounded buffer (64 KB cap) for post-stream security checks.
| Mode | Behaviour |
|---|---|
| Non-streaming | Full response inspected before delivery. Violations return 400 Bad Request. |
| Streaming (either path) | Chunks are sent as they arrive. After the stream completes, accumulated text is checked. Since chunks have already been delivered, violations are logged for monitoring/alerting rather than blocking. |
The 64 KB cap (MAX_ACCUMULATED_CONTENT_BYTES) prevents memory exhaustion on long-running streams. Content beyond this limit is not checked.
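A bounded accumulator of this kind might look like the sketch below. `BoundedTextBuffer` and its methods are hypothetical names, and the constant is assumed to be 64 * 1024 bytes, since this page only states "64 KB". The one subtle point it illustrates is truncating on a UTF-8 character boundary so the buffer always holds valid text.

```rust
/// Assumed value: the page states "64 KB", taken here as 64 * 1024 bytes.
pub const MAX_ACCUMULATED_CONTENT_BYTES: usize = 64 * 1024;

/// Hypothetical accumulator for post-stream guardrail checks.
#[derive(Default)]
pub struct BoundedTextBuffer {
    text: String,
    truncated: bool,
}

impl BoundedTextBuffer {
    /// Append a content delta, dropping whatever does not fit under the cap.
    pub fn push_delta(&mut self, delta: &str) {
        if self.truncated {
            // Once the cap is hit, later content is ignored (and never checked).
            return;
        }
        let remaining = MAX_ACCUMULATED_CONTENT_BYTES - self.text.len();
        if delta.len() <= remaining {
            self.text.push_str(delta);
            return;
        }
        // Cut at the last char boundary that still fits, so the text stays valid UTF-8.
        let mut cut = remaining;
        while !delta.is_char_boundary(cut) {
            cut -= 1;
        }
        self.text.push_str(&delta[..cut]);
        self.truncated = true;
    }

    /// Text available to the post-stream check.
    pub fn as_str(&self) -> &str {
        &self.text
    }

    /// True when some streamed content was dropped and therefore not checked.
    pub fn is_truncated(&self) -> bool {
        self.truncated
    }
}
```

Exposing a truncation flag lets monitoring distinguish "checked and clean" from "clean within the first 64 KB", which matters because content past the cap is not inspected.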
Telemetry equivalence
Both paths produce identical telemetry signals:
- GenAI span with model, tokens, finish reason, and timing attributes
- TTFT metric (gen_ai.server.time_to_first_token): measured from StreamStart to the first ContentDelta
- Content capture (when enabled): accumulated from ContentDelta events, capped at 64 KB
- Tool call child spans: opened on ToolCallStart, closed on ToolCallEnd (streaming) or synthetically distributed (non-streaming)
See the Telemetry reference for the full attribute and metric catalog.
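As an illustration of the TTFT measurement, the sketch below records the interval from the stream start to the first content delta and ignores every later delta. `TtftTracker` is a hypothetical type; timestamps are passed in explicitly so the logic can be exercised without real waiting.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tracker for time to first token.
#[derive(Default)]
pub struct TtftTracker {
    stream_start: Option<Instant>,
    ttft: Option<Duration>,
}

impl TtftTracker {
    /// Call when StreamStart is observed; only the first call is kept.
    pub fn on_stream_start(&mut self, at: Instant) {
        if self.stream_start.is_none() {
            self.stream_start = Some(at);
        }
    }

    /// Call on every ContentDelta. Returns Some(ttft) only for the first delta
    /// seen after the stream start, so the metric is emitted exactly once.
    pub fn on_content_delta(&mut self, at: Instant) -> Option<Duration> {
        if self.ttft.is_some() {
            return None;
        }
        let start = self.stream_start?;
        let ttft = at.saturating_duration_since(start);
        self.ttft = Some(ttft);
        Some(ttft)
    }

    /// The recorded value, if the first delta has arrived.
    pub fn ttft(&self) -> Option<Duration> {
        self.ttft
    }
}
```

Since both paths would feed the same two signals into such a tracker, the metric does not depend on which path served the request.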
Key types
| Type | Role |
|---|---|
| SseStreamState | Per-stream state: line buffer, telemetry accumulator, output guardrail text buffer |
| StreamInstrumentation | GenAI span lifecycle and metrics recording, shared between stream body and flush closure via Arc<Mutex<>> |
| ProviderAdapter::process_raw_sse_line | Fast-path telemetry extraction (overridden by the OpenAI adapter) |
| ProviderAdapter::parse_sse_to_events | Normalised-path SSE parsing into StreamEvents |
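The sharing pattern noted for StreamInstrumentation can be sketched with the standard library alone. The struct fields and the `make_hooks` helper below are hypothetical; the point is only that the per-chunk callback and the flush closure each own a clone of the same `Arc<Mutex<_>>`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real StreamInstrumentation.
#[derive(Default, Debug)]
pub struct StreamInstrumentation {
    pub output_tokens: u64,
    pub finished: bool,
}

/// Builds a per-chunk recorder and a flush closure over one shared handle.
pub fn make_hooks(
    shared: Arc<Mutex<StreamInstrumentation>>,
) -> (impl FnMut(u64), impl FnOnce() -> u64) {
    let for_body = Arc::clone(&shared);
    let for_flush = shared;

    // Called by the stream body as chunks arrive.
    let on_chunk = move |tokens: u64| {
        for_body.lock().unwrap().output_tokens += tokens;
    };

    // Called once when the stream completes; closes out the instrumentation.
    let flush = move || {
        let mut inst = for_flush.lock().unwrap();
        inst.finished = true;
        inst.output_tokens
    };

    (on_chunk, flush)
}
```

Each closure holds the lock only for the duration of one update, so the stream body and the flush step never contend for longer than a single field write.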