Streaming

The gateway has two streaming code paths, selected per-request based on whether the client’s SDK format matches the upstream provider’s SSE wire format. Both paths produce identical telemetry and apply the same output guardrail checks.

Two paths

                     ┌─────────────────────────────────┐
                     │  needs_stream_normalization()?   │
                     └──────┬──────────────┬───────────┘
                            │ false         │ true
                    ┌───────▼───────┐ ┌────▼────────────────────────────┐
                    │  Fast path    │ │  Normalised path                │
                    │  (passthrough)│ │  (parse → StreamEvent → serialise)│
                    └───────────────┘ └─────────────────────────────────┘

Fast path — zero-copy SSE passthrough

When the caller’s SDK format matches the upstream provider’s wire format, raw SSE bytes are forwarded directly to the client. There is no JSON parsing and no re-serialisation in the response path, which is what keeps the gateway’s added latency below a millisecond. The adapter’s adapt_sse_chunk is a single Bytes::copy_from_slice.

A parallel line buffer feeds process_raw_sse_line, which extracts telemetry (usage, model, response ID, content deltas) with a single JSON parse per SSE line and without constructing intermediate StreamEvent objects.

When the fast path is used:
  • OpenAI client → OpenAI upstream (wire formats match)
  • OpenAI client → xAI upstream (xAI uses OpenAI-compatible SSE)
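
The passthrough-plus-side-channel idea can be sketched as follows. This is a minimal illustration using only the standard library, not the gateway's actual code: `SseLineBuffer` and `forward_chunk` are hypothetical names, `Vec<u8>` stands in for `Bytes`, and the callback stands in for process_raw_sse_line, whose real signature is not shown in this page.

```rust
/// Hypothetical line buffer: accumulates raw SSE bytes and yields complete lines.
#[derive(Default)]
pub struct SseLineBuffer {
    pending: Vec<u8>,
}

impl SseLineBuffer {
    /// Append a chunk and return every line it completes, without the
    /// trailing "\n" or "\r\n". A partial line stays buffered for the next call.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop(); // drop the '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // drop an optional '\r'
            }
            lines.push(String::from_utf8_lossy(&line).into_owned());
        }
        lines
    }
}

/// Forward the chunk unchanged and, on the side, hand each complete `data:`
/// payload to a telemetry callback (a stand-in for process_raw_sse_line).
pub fn forward_chunk<F: FnMut(&str)>(
    chunk: &[u8],
    buffer: &mut SseLineBuffer,
    mut on_data: F,
) -> Vec<u8> {
    for line in buffer.push(chunk) {
        if let Some(payload) = line.strip_prefix("data:") {
            on_data(payload.trim_start());
        }
    }
    // The bytes sent to the client are exactly the bytes received upstream.
    chunk.to_vec()
}
```

The point of the design is that the client-facing bytes never depend on the telemetry side: an SSE line split across two network chunks is reassembled only inside the buffer, while each chunk is forwarded as soon as it arrives.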

Normalised path — parse, transform, re-serialise

When the SSE wire formats are incompatible, each upstream SSE line goes through three stages:
  1. Parse: parse_sse_to_events converts provider-native SSE into provider-agnostic StreamEvents
  2. Record: record_event extracts telemetry from each event (tokens, TTFT, content deltas)
  3. Serialise: ResponseSerializer::serialize_stream_event converts events into the client’s wire format

This path handles the structural differences between OpenAI’s choices[].delta and Anthropic’s content_block_delta event types.

When the normalised path is used:
  • Anthropic client → OpenAI upstream (incompatible SSE wire formats)
  • Anthropic client → Anthropic upstream (all Anthropic SDK ingress uses the normalised path)
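
To make the serialise stage concrete, here is a rough sketch of one provider-agnostic event rendered into the two client wire formats. It is an illustration under assumptions, not the gateway's implementation: the `StreamEvent` enum is reduced to a single variant, `ClientFormat` and `escape_json` are hypothetical, the free function only borrows the name of ResponseSerializer::serialize_stream_event, and the OpenAI payload is trimmed to the `choices[].delta` part (real chunks carry more fields such as `id` and `model`).

```rust
/// Simplified stand-in for the gateway's provider-agnostic event type.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    ContentDelta(String),
}

/// Hypothetical enum naming the client's wire format.
#[derive(Debug, Clone, Copy)]
pub enum ClientFormat {
    OpenAi,
    Anthropic,
}

/// Minimal JSON string escaping so the sketch needs no external crates.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one event as a complete SSE frame in the client's wire format.
pub fn serialize_stream_event(event: &StreamEvent, format: ClientFormat) -> String {
    match (event, format) {
        // OpenAI style: the text lives under choices[].delta.content.
        (StreamEvent::ContentDelta(text), ClientFormat::OpenAi) => format!(
            "data: {{\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{}\"}}}}]}}\n\n",
            escape_json(text)
        ),
        // Anthropic style: a named content_block_delta event with a text_delta.
        (StreamEvent::ContentDelta(text), ClientFormat::Anthropic) => format!(
            "event: content_block_delta\ndata: {{\"type\":\"content_block_delta\",\"index\":0,\"delta\":{{\"type\":\"text_delta\",\"text\":\"{}\"}}}}\n\n",
            escape_json(text)
        ),
    }
}
```

The same `ContentDelta` produces a bare `data:` frame for an OpenAI-style client and an `event:` plus `data:` pair for an Anthropic-style client, which is why raw bytes cannot simply be passed through when the formats differ.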

Path selection matrix

| Client SDK | Upstream Provider | Path | Reason |
| --- | --- | --- | --- |
| OpenAI / xAI | OpenAI | Fast | Wire formats match |
| OpenAI / xAI | xAI | Fast | xAI uses OpenAI-compatible SSE |
| OpenAI / xAI | Anthropic | Normalised | Anthropic provider always requires parse → transform → re-serialise |
| OpenAI / xAI | Cohere | Normalised | Cohere provider always requires parse → transform → re-serialise |
| Anthropic | Any | Normalised | Anthropic ingress always uses the normalised path |
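
The matrix can be read as a single match expression. The sketch below is one possible encoding, assuming hypothetical `ClientSdk` and `Provider` enums; the real needs_stream_normalization() may take different arguments and consult the adapters rather than a hard-coded table.

```rust
/// Hypothetical enum for the SDK format the client speaks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientSdk {
    OpenAi,
    Xai,
    Anthropic,
}

/// Hypothetical enum for the upstream provider.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    OpenAi,
    Xai,
    Anthropic,
    Cohere,
}

/// Returns true when the normalised path is required, false for the fast path.
pub fn needs_stream_normalization(client: ClientSdk, upstream: Provider) -> bool {
    match (client, upstream) {
        // Anthropic ingress always uses the normalised path.
        (ClientSdk::Anthropic, _) => true,
        // Anthropic and Cohere upstreams always need parse → transform → re-serialise.
        (_, Provider::Anthropic | Provider::Cohere) => true,
        // OpenAI-compatible client and upstream: wire formats match.
        (ClientSdk::OpenAi | ClientSdk::Xai, Provider::OpenAi | Provider::Xai) => false,
    }
}
```

Writing the last arm out explicitly, rather than using a wildcard, means the compiler flags the function as non-exhaustive if a new SDK or provider variant is added later.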

Output guardrails during streaming

Both paths accumulate response text in a bounded buffer (64 KB cap) for post-stream security checks.

| Mode | Behaviour |
| --- | --- |
| Non-streaming | Full response inspected before delivery. Violations return 400 Bad Request. |
| Streaming (either path) | Chunks are sent as they arrive. After the stream completes, accumulated text is checked. Since chunks have already been delivered, violations are logged for monitoring/alerting rather than blocking. |

The 64 KB cap (MAX_ACCUMULATED_CONTENT_BYTES) prevents memory exhaustion on long-running streams. Content beyond this limit is not checked.
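
A bounded accumulator of this kind might look like the sketch below. Only the constant name MAX_ACCUMULATED_CONTENT_BYTES comes from this page; its value is assumed here to be 64 * 1024 bytes, and the `BoundedText` type and its methods are hypothetical.

```rust
/// Assumed value: the page says "64 KB", taken here as 64 * 1024 bytes.
pub const MAX_ACCUMULATED_CONTENT_BYTES: usize = 64 * 1024;

/// Hypothetical accumulator for response text, capped at the limit above.
#[derive(Default)]
pub struct BoundedText {
    buf: String,
    truncated: bool,
}

impl BoundedText {
    /// Append a content delta, keeping at most MAX_ACCUMULATED_CONTENT_BYTES.
    pub fn push(&mut self, delta: &str) {
        if self.truncated {
            return; // once the cap is hit, later text is dropped
        }
        let remaining = MAX_ACCUMULATED_CONTENT_BYTES - self.buf.len();
        if delta.len() <= remaining {
            self.buf.push_str(delta);
            return;
        }
        // Cut on a UTF-8 character boundary so the buffer stays valid text.
        let mut cut = remaining;
        while !delta.is_char_boundary(cut) {
            cut -= 1;
        }
        self.buf.push_str(&delta[..cut]);
        self.truncated = true;
    }

    /// Text available for the post-stream guardrail check.
    pub fn as_str(&self) -> &str {
        &self.buf
    }

    /// True when some content was dropped and therefore not checked.
    pub fn is_truncated(&self) -> bool {
        self.truncated
    }
}
```

Tracking a `truncated` flag is a design choice of this sketch: it lets the post-stream check report that only a prefix of the response was inspected.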

Telemetry equivalence

Both paths produce identical telemetry signals:
  • GenAI span with model, tokens, finish reason, and timing attributes
  • TTFT metric (gen_ai.server.time_to_first_token) — measured from StreamStart to first ContentDelta
  • Content capture (when enabled) — accumulated from ContentDelta events, capped at 64 KB
  • Tool call child spans — opened on ToolCallStart, closed on ToolCallEnd (streaming) or synthetically distributed (non-streaming)
See the Telemetry reference for the full attribute and metric catalog.
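
As an illustration of the TTFT definition above (StreamStart to first ContentDelta), the following sketch tracks the two timestamps. The `TtftTracker` type is hypothetical, the `StreamEvent` enum is cut down to the two variants that matter here, and passing `now` in explicitly is an assumption made so the logic can be exercised without real delays.

```rust
use std::time::{Duration, Instant};

/// Cut-down stand-in for the gateway's event type.
pub enum StreamEvent {
    StreamStart,
    ContentDelta(String),
}

/// Hypothetical tracker for time to first token.
#[derive(Default)]
pub struct TtftTracker {
    started_at: Option<Instant>,
    ttft: Option<Duration>,
}

impl TtftTracker {
    /// Feed every event with the time it was observed.
    pub fn record(&mut self, event: &StreamEvent, now: Instant) {
        match event {
            // Remember only the first StreamStart.
            StreamEvent::StreamStart => {
                self.started_at.get_or_insert(now);
            }
            // Only the first ContentDelta after the start sets the metric.
            StreamEvent::ContentDelta(_) => {
                if self.ttft.is_none() {
                    if let Some(start) = self.started_at {
                        self.ttft = Some(now.duration_since(start));
                    }
                }
            }
        }
    }

    /// The measured TTFT, if a content delta has been seen.
    pub fn ttft(&self) -> Option<Duration> {
        self.ttft
    }
}
```

Because both streaming paths ultimately report the same two moments, a tracker like this can sit behind either path and yield the same metric.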

Key types

| Type | Role |
| --- | --- |
| SseStreamState | Per-stream state: line buffer, telemetry accumulator, output guardrail text buffer |
| StreamInstrumentation | GenAI span lifecycle and metrics recording, shared between stream body and flush closure via Arc<Mutex<>> |
| ProviderAdapter::process_raw_sse_line | Fast-path telemetry extraction (overridden by the OpenAI adapter) |
| ProviderAdapter::parse_sse_to_events | Normalised-path SSE parsing into StreamEvents |
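
The Arc<Mutex<>> sharing mentioned for StreamInstrumentation can be sketched roughly as below. The struct fields, the `instrument` function, and the closure shapes are all assumptions made for illustration; the real type manages a GenAI span and metrics rather than a token counter.

```rust
use std::sync::{Arc, Mutex};

/// Heavily simplified stand-in for the real instrumentation state.
#[derive(Default, Debug)]
pub struct StreamInstrumentation {
    pub output_tokens: u64,
    pub finished: bool,
}

/// Hypothetical constructor returning the two halves that share one state:
/// a per-chunk recorder for the stream body and a one-shot flush closure.
pub fn instrument() -> (impl FnMut(u64), impl FnOnce() -> u64) {
    let shared = Arc::new(Mutex::new(StreamInstrumentation::default()));

    // Handle used by the stream body while chunks are flowing.
    let for_body = Arc::clone(&shared);
    let on_chunk = move |tokens: u64| {
        for_body.lock().unwrap().output_tokens += tokens;
    };

    // Handle used once, when the stream ends, to finalise and report.
    let flush = move || {
        let mut guard = shared.lock().unwrap();
        guard.finished = true;
        guard.output_tokens
    };

    (on_chunk, flush)
}
```

Both closures own a clone of the same `Arc`, so whichever one runs last still sees everything the other recorded.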