Content Capture

Content capture records the full text of prompts and completions as span attributes. This is useful for debugging, evaluation, and quality monitoring: you can search for specific prompt patterns, compare model outputs across variants, and build evaluation datasets from production traffic.

Content capture is enabled by default. Prompt and completion text is recorded verbatim on GenAI spans and stored in your configured telemetry backend.

Disable content capture

If you need to prevent prompt and completion text from being stored (e.g. for PII compliance), disable content capture in the gateway configuration or via an environment variable.
[genai_telemetry]
capture_content = false  # default: true
Or via environment variable:
GATEWAY_GENAI_CAPTURE_CONTENT=false
When enabled, content capture records the full prompt and completion text, including any sensitive data such as PII, API keys, or confidential instructions. Because capture is on by default, leave it enabled only in development or trusted environments where you have appropriate data handling controls in place.
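Because capture is on unless explicitly disabled, a deployment script may want to refuse to start a production rollout when the variable is missing. The following is a minimal sketch of such a guard, not part of the gateway: the function names are hypothetical, and it accepts only the literal value `false` shown above, since this page does not say which other spellings the gateway recognises.

```python
import os

ENV_VAR = "GATEWAY_GENAI_CAPTURE_CONTENT"


def capture_disabled(environ=None):
    """Return True only when the variable is explicitly set to "false".

    An unset variable counts as enabled, matching the documented default.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() == "false"


def check_deploy(environ=None):
    """Hypothetical pre-deploy guard: raise if capture would stay enabled."""
    if not capture_disabled(environ):
        raise RuntimeError(
            f"{ENV_VAR} is not set to 'false'; prompts and completions "
            "would be stored verbatim in the telemetry backend"
        )
```

Calling `check_deploy()` with no arguments inspects the current process environment; passing a dictionary makes the guard easy to exercise in tests.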

What’s captured

When content capture is active, the following attributes are added to each GenAI span:
| Attribute | Description |
| --- | --- |
| gen_ai.input.messages | Serialised prompt messages from the request |
| gen_ai.output.messages | Serialised completion choices or accumulated stream content |
| gen_ai.system_instructions | System/developer prompt text, captured separately for easy filtering |
The gen_ai.system_instructions attribute is recorded as a standalone string rather than embedded within gen_ai.input.messages. This separation makes it straightforward to query system prompts independently — for example, to audit which system prompts are in use across your fleet without parsing the full message array.
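The separation described above can be illustrated with a short sketch. This is not the gateway's implementation: the function name, the set of roles treated as system instructions, the newline used to join multiple system messages, and the JSON serialisation of the remaining messages are all assumptions made for illustration.

```python
import json

# Assumption: these roles carry system/developer prompt text.
SYSTEM_ROLES = {"system", "developer"}


def split_content_attributes(messages):
    """Split chat messages into the two input-side capture attributes."""
    system_parts = []
    other_messages = []
    for message in messages:
        if message.get("role") in SYSTEM_ROLES:
            system_parts.append(str(message.get("content", "")))
        else:
            other_messages.append(message)

    attributes = {"gen_ai.input.messages": json.dumps(other_messages)}
    if system_parts:
        # Stored as a plain string so it can be filtered without JSON parsing.
        attributes["gen_ai.system_instructions"] = "\n".join(system_parts)
    return attributes
```

With this layout, an audit of system prompts only needs a string comparison on one attribute, which is the property the paragraph above relies on.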

Media endpoint content capture

Content capture behaviour varies by endpoint because some endpoints handle binary data that cannot be meaningfully recorded as span attributes.
| Endpoint | Input captured | Output captured |
| --- | --- | --- |
| Chat completions | Full messages array | Full choices array (or streamed content) |
| Embeddings | Input text | Not captured (vector data) |
| Image generation | prompt field | revised_prompt from DALL-E 3 |
| Audio speech (TTS) | input text | Not captured (binary audio) |
| Audio transcription | Not captured (binary audio) | Transcription text field |
For chat completions, both streaming and non-streaming responses are captured. Streaming responses are accumulated from individual deltas into the final content before recording. Embeddings capture only the input text; the output vector data is omitted as it is not human-readable and would be prohibitively large.

Size limits

Streaming responses are accumulated in memory before being recorded as span attributes. To prevent unbounded memory growth, the gateway enforces size limits on accumulated content.
| Path | Limit |
| --- | --- |
| Streaming accumulated content | 64 KB (MAX_ACCUMULATED_CONTENT_BYTES) |
| Streaming tool call arguments | 64 KB per tool (MAX_TOOL_CALL_ARGUMENT_BYTES) |
| Non-streaming response | Full response body captured (no limit) |
When the streaming accumulation limit is reached, content is truncated. The span still records token counts, finish reasons, and all other attributes normally. Only the content capture attributes are affected by this limit.
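A bounded accumulator of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the gateway's code: the class name is hypothetical, 64 KB is taken to mean 64 × 1024 bytes, and the truncation strategy (cut at the byte limit, then drop any partial trailing UTF-8 character) is a guess, since this page only says that content is truncated.

```python
# Assumption: "64 KB" means 65,536 bytes.
MAX_ACCUMULATED_CONTENT_BYTES = 64 * 1024


class BoundedAccumulator:
    """Collects streamed text deltas up to a fixed byte budget."""

    def __init__(self, limit: int = MAX_ACCUMULATED_CONTENT_BYTES) -> None:
        self.limit = limit
        self.truncated = False
        self._buffer = bytearray()

    def push(self, delta: str) -> None:
        """Append one delta; ignore everything after the limit is hit."""
        if self.truncated:
            return
        data = delta.encode("utf-8")
        room = self.limit - len(self._buffer)
        if len(data) > room:
            data = data[:room]
            self.truncated = True
        self._buffer.extend(data)

    def text(self) -> str:
        """Return the accumulated text, dropping a partial final character."""
        return self._buffer.decode("utf-8", errors="ignore")
```

Memory use stays bounded by `limit` regardless of how long the stream runs, while token counts and finish reasons can still be recorded from the stream's metadata.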

Tool call content

When content capture is enabled, tool call child spans include additional attributes that record the function invocation details:
  • gen_ai.tool.call.arguments — the function arguments as JSON
  • gen_ai.input.messages — an object with the tool name and call ID
On the parent span, gen_ai.tool.definitions contains the JSON schema of all declared tools sent in the request. This attribute appears on the parent rather than on individual tool call child spans, since the tool definitions apply to the entire request. For requests that produce multiple tool calls, each child span carries its own gen_ai.tool.call.arguments value. The parent span’s gen_ai.tool.definitions is recorded once regardless of how many tool calls appear in the response.
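The parent/child layout can be sketched as plain dictionaries. The attribute names come from this page; the function name, the input shapes, and the `name` and `id` keys inside the child's `gen_ai.input.messages` object are assumptions for illustration only.

```python
import json


def tool_span_attributes(request_tools, tool_calls):
    """Build attribute dicts for one parent span and its tool call children.

    request_tools: list of tool definitions declared in the request.
    tool_calls: list of dicts with "id", "name" and a JSON "arguments" string.
    """
    # Recorded once on the parent, however many tool calls the response has.
    parent = {"gen_ai.tool.definitions": json.dumps(request_tools)}

    children = []
    for call in tool_calls:
        children.append({
            # Each child span carries its own arguments.
            "gen_ai.tool.call.arguments": call["arguments"],
            # Assumed key names for the tool name and call ID.
            "gen_ai.input.messages": json.dumps(
                {"name": call["name"], "id": call["id"]}
            ),
        })
    return parent, children
```

A response with two tool calls therefore yields two child attribute sets and a single copy of the tool definitions on the parent.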

Querying captured content

With content stored in ClickHouse, you can run SQL queries to search for specific patterns, debug unexpected outputs, or build evaluation datasets.
-- Find requests with specific content
SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.system_instructions'] AS system_prompt,
    SpanAttributes['gen_ai.input.messages'] AS input,
    SpanAttributes['gen_ai.output.messages'] AS output
FROM otel_traces
WHERE SpanAttributes['gen_ai.input.messages'] LIKE '%search%'
  AND Timestamp > now() - INTERVAL 1 HOUR
ORDER BY Timestamp DESC
LIMIT 20;
Because gen_ai.system_instructions is a separate attribute, you can efficiently filter by system prompt without parsing the full message JSON:
-- Audit system prompts across all requests
SELECT
    SpanAttributes['gen_ai.system_instructions'] AS system_prompt,
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS request_count
FROM otel_traces
WHERE SpanAttributes['gen_ai.system_instructions'] != ''
  AND Timestamp > now() - INTERVAL 24 HOUR
GROUP BY system_prompt, model
ORDER BY request_count DESC;
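Rows returned by the first query can be turned into an evaluation dataset with a little post-processing. The sketch below assumes each row is a dictionary keyed by the column aliases used above (`system_prompt`, `input`, `output`) and that the message attributes are JSON strings; neither assumption is confirmed by this page, and the function name is hypothetical. Rows whose content does not parse, for example because streamed content was truncated at the size limit, are skipped.

```python
import json


def rows_to_jsonl(rows):
    """Convert query result rows into JSON Lines evaluation records."""
    lines = []
    for row in rows:
        try:
            record = {
                "system": row.get("system_prompt", ""),
                "input": json.loads(row["input"]),
                "output": json.loads(row["output"]),
            }
        except (KeyError, TypeError, json.JSONDecodeError):
            # Missing, null, or unparseable content: leave the row out.
            continue
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)
```

Fetching the rows is left to whichever ClickHouse client you already use; the function only depends on the row shape described above.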
See Span Attributes for the complete attribute reference, including all content capture attributes and their types. See the configuration reference for all [genai_telemetry] fields.