"stream": true on any chat request to receive the response as Server-Sent Events (Content-Type: text/event-stream). The gateway forwards tokens to your client as they arrive from the upstream provider.
Internally the gateway picks one of two streaming paths per request, based on whether your SDK’s wire format already matches the upstream provider’s. Both paths produce the same telemetry and apply the same guardrail checks — the difference is only how much work happens on the response.
Two paths
Fast path — SSE passthrough
When your SDK’s format matches the upstream provider, the gateway forwards the raw SSE bytes straight through. There is no re-serialization on the response path, which is what keeps streaming overhead low. Telemetry (token usage, model, finish reason, content) is still extracted from the stream as it passes.Normalized path — parse, translate, re-emit
When the formats differ, each upstream event is parsed, translated into your SDK’s wire format, and re-emitted. This is what lets you stream an Anthropic model through an OpenAI client (and vice versa) without your SDK seeing an unfamiliar event shape.Path selection
| Client SDK | Upstream provider | Path | Reason |
|---|---|---|---|
| OpenAI / xAI | OpenAI | Fast | Wire formats match |
| OpenAI / xAI | xAI | Fast | xAI uses OpenAI-compatible SSE |
| OpenAI / xAI | Anthropic | Normalized | Anthropic events are translated to the OpenAI shape |
| OpenAI / xAI | Cohere | Normalized | Cohere events are translated to the OpenAI shape |
| Anthropic | Any | Normalized | Anthropic ingress always uses the normalized path |
Requesting a stream
Guardrails during streaming
When output guardrails are enabled, the gateway accumulates response text (up to 64 KB) to inspect it. The enforcement point differs between streaming and non-streaming because streamed chunks have already left the gateway by the time the full response is known.| Mode | Behavior |
|---|---|
| Non-streaming | The full response is inspected before delivery. A violation returns 422 Unprocessable Entity and the response is withheld. |
| Streaming | Chunks are delivered as they arrive. After the stream completes, accumulated text is checked; because the content has already been sent, a violation is recorded for monitoring rather than blocking the response. |
Mid-stream errors
If an upstream fails after the stream has already started (HTTP200 with text/event-stream headers sent), the gateway delivers the error as an SSE error frame in your SDK’s shape and stops the stream — it does not emit a false success terminator. See Error handling for the frame format per SDK.
Telemetry
Streaming and non-streaming requests produce the same GenAI span — model, token counts, finish reason, and timing — and the same metrics. Streaming additionally records time-to-first-token asgen_ai.server.time_to_first_token, measured from the start of the response to the first content token.
See the Telemetry reference for the full attribute and metric catalog.