Streaming
The gateway has two streaming code paths, selected per request according to whether the client’s SDK format matches the upstream provider’s SSE wire format. Both paths produce identical telemetry and apply the same output guardrail checks.
Two paths
Fast path — zero-copy SSE passthrough
When the caller’s SDK format matches the upstream provider, raw SSE bytes are forwarded directly to the client. There is no JSON parsing and no re-serialisation in the response path, which is what keeps the gateway’s added overhead below a millisecond. The adapter’s adapt_sse_chunk is a single Bytes::copy_from_slice. A parallel line buffer feeds process_raw_sse_line, which extracts telemetry (usage, model, response ID, content deltas) with a single JSON parse per SSE line and without constructing intermediate StreamEvent objects.
When the fast path is used:
- OpenAI client → OpenAI upstream (wire formats match)
- OpenAI client → xAI upstream (xAI uses OpenAI-compatible SSE)
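The two halves of the fast path, forwarding bytes untouched while a parallel line buffer collects complete SSE lines for telemetry, can be sketched roughly as follows. This is a std-only illustration, not the gateway's code: `LineBuffer` and `forward_chunk` are hypothetical names, `Vec<u8>` stands in for `Bytes`, and the `on_line` callback stands in for process_raw_sse_line (which would do the single JSON parse per line).

```rust
/// Hypothetical line buffer: collects raw bytes and yields complete SSE lines.
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Appends a chunk and returns every complete line, without its trailing
    /// "\n" or "\r\n". A partial line stays buffered until a later chunk.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop(); // drop the '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // drop an optional '\r'
            }
            lines.push(String::from_utf8_lossy(&line).into_owned());
        }
        lines
    }
}

/// Forwards a chunk unchanged and hands each complete line to a telemetry hook.
/// The returned bytes are always identical to the input chunk.
fn forward_chunk(
    chunk: &[u8],
    lines: &mut LineBuffer,
    mut on_line: impl FnMut(&str),
) -> Vec<u8> {
    for line in lines.push(chunk) {
        on_line(&line);
    }
    chunk.to_vec() // stand-in for the single copy on the response path
}
```

Note that the telemetry hook only sees whole lines even when a line is split across network chunks, while the bytes sent to the client never wait on the line buffer.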
Normalised path — parse, transform, re-serialise
When the SSE wire formats are incompatible, each upstream SSE line goes through three stages:
- Parse: parse_sse_to_events converts provider-native SSE into provider-agnostic StreamEvents
- Record: record_event extracts telemetry from each event (tokens, TTFT, content deltas)
- Serialise: ResponseSerializer::serialize_stream_event converts events into the client’s wire format
This path bridges structural differences between wire formats, such as OpenAI’s choices[].delta and Anthropic’s content_block_delta event types.
When the normalised path is used:
- Anthropic client → OpenAI upstream (incompatible SSE wire formats)
- Anthropic client → Anthropic upstream (all Anthropic SDK ingress uses the normalised path)
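The parse, record, serialise sequence can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the `StreamEvent` variants, the `Telemetry` fields, and the toy wire formats (plain-text `data:` payloads instead of provider JSON) are hypothetical and far smaller than what the real adapters handle.

```rust
/// Simplified, hypothetical provider-agnostic event.
enum StreamEvent {
    ContentDelta(String),
    Done,
}

/// Hypothetical telemetry accumulator.
#[derive(Default)]
struct Telemetry {
    deltas: usize,
    content_bytes: usize,
    finished: bool,
}

/// Stage 1, parse: toy upstream format "data: <text>" or "data: [DONE]".
/// Blank or unrecognised lines produce no events.
fn parse_line(line: &str) -> Vec<StreamEvent> {
    match line.strip_prefix("data: ") {
        Some("[DONE]") => vec![StreamEvent::Done],
        Some(text) if !text.is_empty() => vec![StreamEvent::ContentDelta(text.to_string())],
        _ => Vec::new(),
    }
}

/// Stage 2, record: update telemetry from the provider-agnostic event.
fn record(telemetry: &mut Telemetry, event: &StreamEvent) {
    match event {
        StreamEvent::ContentDelta(text) => {
            telemetry.deltas += 1;
            telemetry.content_bytes += text.len();
        }
        StreamEvent::Done => telemetry.finished = true,
    }
}

/// Stage 3, serialise: toy client format with named SSE events.
fn serialise(event: &StreamEvent) -> String {
    match event {
        StreamEvent::ContentDelta(text) => format!("event: delta\ndata: {}\n\n", text),
        StreamEvent::Done => "event: stop\ndata: \n\n".to_string(),
    }
}

/// Runs one upstream line through all three stages and returns the bytes
/// (here, text) to send to the client.
fn normalise_line(line: &str, telemetry: &mut Telemetry) -> String {
    let mut out = String::new();
    for event in parse_line(line) {
        record(telemetry, &event);
        out.push_str(&serialise(&event));
    }
    out
}
```

Because telemetry is recorded from the intermediate events rather than from either wire format, the record stage does not need to know which provider or client is involved.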
Path selection matrix
| Client SDK | Upstream Provider | Path | Reason |
|---|---|---|---|
| OpenAI / xAI | OpenAI | Fast | Wire formats match |
| OpenAI / xAI | xAI | Fast | xAI uses OpenAI-compatible SSE |
| OpenAI / xAI | Anthropic | Normalised | Anthropic provider always requires parse → transform → re-serialise |
| OpenAI / xAI | Cohere | Normalised | Cohere provider always requires parse → transform → re-serialise |
| Anthropic | Any | Normalised | Anthropic ingress always uses the normalised path |
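The matrix above reduces to a small decision function. The sketch below encodes exactly the rows of the table; the enum and function names (`ClientSdk`, `Upstream`, `StreamPath`, `select_path`) are hypothetical and are not taken from the gateway's source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ClientSdk {
    OpenAi,
    Xai,
    Anthropic,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Upstream {
    OpenAi,
    Xai,
    Anthropic,
    Cohere,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StreamPath {
    Fast,
    Normalised,
}

/// Picks the streaming path for one request, following the matrix:
/// only OpenAI-format clients talking to OpenAI-compatible upstreams
/// can use the passthrough path.
fn select_path(client: ClientSdk, upstream: Upstream) -> StreamPath {
    match (client, upstream) {
        (ClientSdk::OpenAi | ClientSdk::Xai, Upstream::OpenAi | Upstream::Xai) => StreamPath::Fast,
        // Anthropic ingress, and Anthropic or Cohere upstreams, always normalise.
        _ => StreamPath::Normalised,
    }
}
```

For example, `select_path(ClientSdk::OpenAi, Upstream::Xai)` yields `StreamPath::Fast`, while any call with `ClientSdk::Anthropic` yields `StreamPath::Normalised`.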
Output guardrails during streaming
Both paths accumulate response text in a bounded buffer (64 KB cap) for post-stream security checks.
| Mode | Behaviour |
|---|---|
| Non-streaming | Full response inspected before delivery. Violations return 400 Bad Request. |
| Streaming (either path) | Chunks are sent as they arrive. After the stream completes, accumulated text is checked. Since chunks have already been delivered, violations are logged for monitoring/alerting rather than blocking. |
The 64 KB cap (MAX_ACCUMULATED_CONTENT_BYTES) prevents memory exhaustion on long-running streams. Content beyond this limit is not checked.
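A bounded accumulator of this kind might look like the following sketch. The constant name comes from the text, but its exact value (64 * 1024 here) is an assumption, and `BoundedText` with its methods is a hypothetical type. The one subtle point is truncating on a UTF-8 character boundary so the buffer always holds valid text.

```rust
/// Assumed value: the text says "64 KB", taken here to mean 64 * 1024 bytes.
const MAX_ACCUMULATED_CONTENT_BYTES: usize = 64 * 1024;

/// Hypothetical bounded text buffer for post-stream guardrail checks.
#[derive(Default)]
struct BoundedText {
    buf: String,
    truncated: bool,
}

impl BoundedText {
    /// Appends as much of `delta` as fits under the cap. Anything beyond
    /// the cap is dropped and therefore never inspected.
    fn push(&mut self, delta: &str) {
        let remaining = MAX_ACCUMULATED_CONTENT_BYTES - self.buf.len();
        if delta.len() <= remaining {
            self.buf.push_str(delta);
            return;
        }
        // Back off to the nearest char boundary so we never split a code point.
        let mut cut = remaining;
        while !delta.is_char_boundary(cut) {
            cut -= 1;
        }
        self.buf.push_str(&delta[..cut]);
        self.truncated = true;
    }

    fn as_str(&self) -> &str {
        &self.buf
    }

    /// True once any content has been dropped because of the cap.
    fn is_truncated(&self) -> bool {
        self.truncated
    }
}
```

Tracking a `truncated` flag is a design choice of this sketch: it lets a monitoring pipeline distinguish "checked and clean" from "only the first 64 KB was checked".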
Telemetry equivalence
Both paths produce identical telemetry signals:
- GenAI span with model, tokens, finish reason, and timing attributes
- TTFT metric (gen_ai.server.time_to_first_token): measured from StreamStart to first ContentDelta
- Content capture (when enabled): accumulated from ContentDelta events, capped at 64 KB
- Tool call child spans: opened on ToolCallStart, closed on ToolCallEnd (streaming) or synthetically distributed (non-streaming)
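As an illustration of the TTFT measurement, the sketch below records the interval from StreamStart to the first ContentDelta and ignores later deltas. The `StreamEvent` enum is reduced to the variants needed here, and `TtftTracker` is a hypothetical helper; timestamps are passed in explicitly so the logic can be exercised without real delays.

```rust
use std::time::{Duration, Instant};

/// Reduced, hypothetical event type with only the variants this sketch needs.
enum StreamEvent {
    StreamStart,
    ContentDelta(String),
    Done,
}

/// Hypothetical time-to-first-token tracker.
#[derive(Default)]
struct TtftTracker {
    started_at: Option<Instant>,
    ttft: Option<Duration>,
}

impl TtftTracker {
    /// Feeds one event with the time it was observed.
    fn observe(&mut self, event: &StreamEvent, now: Instant) {
        match event {
            StreamEvent::StreamStart => {
                // Only the first StreamStart sets the reference point.
                if self.started_at.is_none() {
                    self.started_at = Some(now);
                }
            }
            StreamEvent::ContentDelta(_) => {
                // Record once: requires a start time and no TTFT yet.
                if let (Some(start), None) = (self.started_at, self.ttft) {
                    self.ttft = Some(now.saturating_duration_since(start));
                }
            }
            StreamEvent::Done => {}
        }
    }
}
```

A delta that arrives before any StreamStart is ignored in this sketch; how the gateway treats that case is not stated in the text.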
Key types
| Type | Role |
|---|---|
| SseStreamState | Per-stream state: line buffer, telemetry accumulator, output guardrail text buffer |
| StreamInstrumentation | GenAI span lifecycle and metrics recording, shared between stream body and flush closure via Arc<Mutex<>> |
| ProviderAdapter::process_raw_sse_line | Fast-path telemetry extraction (overridden by the OpenAI adapter) |
| ProviderAdapter::parse_sse_to_events | Normalised-path SSE parsing into StreamEvents |
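The Arc<Mutex<>> sharing mentioned for StreamInstrumentation follows a common Rust pattern: the per-chunk closure and the end-of-stream flush closure each hold a handle to the same state. The sketch below shows only that pattern; the struct fields and the `make_handlers` function are hypothetical, and the real type manages a span and metrics rather than a counter.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the shared instrumentation state.
#[derive(Default)]
struct StreamInstrumentation {
    output_tokens: u64,
    finished: bool,
}

/// Builds a per-chunk handler and a one-shot flush handler that both
/// operate on the same instrumentation instance.
fn make_handlers(
    shared: Arc<Mutex<StreamInstrumentation>>,
) -> (impl FnMut(u64), impl FnOnce() -> u64) {
    let for_body = Arc::clone(&shared);

    // Called from the stream body for each chunk.
    let on_chunk = move |tokens: u64| {
        for_body.lock().unwrap().output_tokens += tokens;
    };

    // Called once when the stream ends; closes out the recording
    // and returns the final token count.
    let on_flush = move || {
        let mut guard = shared.lock().unwrap();
        guard.finished = true;
        guard.output_tokens
    };

    (on_chunk, on_flush)
}
```

Making the flush handler `FnOnce` means the compiler rejects a second flush, which is one way to guarantee a span is closed exactly once.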