Span Attributes

Every GenAI span emitted by the gateway follows the OpenTelemetry GenAI semantic conventions. The span name follows the pattern {operation} {model} (e.g. chat gpt-4o, embeddings text-embedding-3-small). This page is the complete catalog of every span attribute that may appear on your GenAI spans. Attributes are grouped by category: core GenAI semantics, routing, experimentation, client context, and caching. For how to configure and export your telemetry, see Telemetry Overview.

Conventions used in this reference

Type column uses OTel attribute types: string, int, float, bool.
string (JSON) denotes a string attribute whose value is a JSON-encoded structure.
string (JSON array) denotes a string attribute whose value is a JSON array.
“When set” means the attribute is omitted entirely when the source value is absent; it is never recorded as empty or zero.
Attributes in the gen_ai.* namespace follow the OTel GenAI semantic conventions. Attributes in the gateway.* namespace are gateway-specific extensions.
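As an illustration of the string (JSON) and "when set" conventions above, the following minimal sketch decodes a JSON-encoded attribute and treats a missing attribute as absent. The `attributes` dictionary and the `decode_json_attribute` helper are hypothetical; how you read attributes depends on your trace backend.

```python
import json


def decode_json_attribute(value):
    """Decode a `string (JSON)` or `string (JSON array)` attribute value.

    Returns None when the attribute is absent, mirroring the "when set" rule.
    """
    if value is None:
        return None
    return json.loads(value)


# Hypothetical span attributes, shaped as a plain dict for illustration.
attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.response.finish_reasons": '["stop"]',
}

finish_reasons = decode_json_attribute(attributes.get("gen_ai.response.finish_reasons"))
stop_sequences = decode_json_attribute(attributes.get("gen_ai.request.stop_sequences"))

print(finish_reasons)  # ['stop']
print(stop_sequences)  # None (the attribute was never recorded)
```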

Endpoint instrumentation matrix

Not every endpoint produces GenAI telemetry. The table below shows which endpoints emit GenAI spans and metrics versus plain HTTP spans only. Endpoints marked “No” under GenAI Span still produce standard HTTP spans.

Endpoint	GenAI Span	Operation	Streaming	Content Capture	Metrics
`POST /v1/chat/completions`	Yes	`chat` or `generate_content`	Both	Yes	All
`POST /v1/messages`	Yes	`chat` or `generate_content`	Both	Yes	All
`POST /v1/responses`	Yes	`chat` or `generate_content`	Both	Yes	All
`POST /v1/embeddings`	Yes	`embeddings`	No	Input text	Duration + tokens
`POST /v1/images/generations`	Yes	`image_generation`	No	Prompt in, revised_prompt out	Duration
`POST /v1/audio/speech`	Yes	`audio_speech`	No	Input text, no output (binary)	Duration
`POST /v1/audio/transcriptions`	Yes	`audio_transcription`	No	No input (binary), output text	Duration
`POST /v1/messages/count_tokens`	No	—	—	—	—
`GET /v1/models`	No	—	—	—	—
`GET /health`	No	—	—	—	—

All three chat endpoints share the same handler and produce identical telemetry. The gen_ai.operation.name is chat for standard requests or generate_content where the provider uses that convention.

Media endpoints (images/generations, audio/speech, audio/transcriptions) do not report token usage metrics. Only gen_ai.client.operation.duration is recorded for those endpoints.

Always present

These four attributes appear on every GenAI span regardless of endpoint, provider, or request shape. They form the minimum set required for basic filtering and grouping in any trace backend.

Attribute	Type	Description
`gen_ai.operation.name`	string	`chat`, `generate_content`, `embeddings`, `image_generation`, `audio_speech`, or `audio_transcription`
`gen_ai.provider.name`	string	Provider identity (`openai`, `anthropic`, `xai`)
`gen_ai.request.model`	string	Model name from the request
`gen_ai.request.stream`	bool	Whether the request uses streaming

Request parameters (when set)

These attributes are recorded only when the corresponding parameter is present in the request body. Absent parameters produce no attribute rather than a zero or null value.

Attribute	Type	Description
`gen_ai.request.temperature`	float	Sampling temperature
`gen_ai.request.max_tokens`	int	Maximum tokens to generate
`gen_ai.request.top_p`	float	Nucleus sampling threshold
`gen_ai.request.frequency_penalty`	float	Frequency penalty coefficient
`gen_ai.request.presence_penalty`	float	Presence penalty coefficient
`gen_ai.request.seed`	int	Deterministic sampling seed
`gen_ai.request.top_k`	int	Top-k sampling cutoff
`gen_ai.request.stop_sequences`	string (JSON array)	Stop sequences, e.g. `["\n\n"]`
`gen_ai.request.n`	int	Number of completions requested

Not all parameters are supported by every provider. The gateway records whichever parameters appear in the request body, regardless of whether the upstream provider honors them. Provider-specific parameter validation is not performed at the telemetry layer.
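The "recorded only when present" rule can be sketched as follows. This is not the gateway's implementation: the mapping from body fields to attributes, the `stop` field name, and the choice to treat a JSON null as absent are all assumptions made for illustration.

```python
import json

# Hypothetical mapping from request-body fields to span attributes.
PARAM_TO_ATTRIBUTE = {
    "temperature": "gen_ai.request.temperature",
    "max_tokens": "gen_ai.request.max_tokens",
    "top_p": "gen_ai.request.top_p",
    "frequency_penalty": "gen_ai.request.frequency_penalty",
    "presence_penalty": "gen_ai.request.presence_penalty",
    "seed": "gen_ai.request.seed",
    "top_k": "gen_ai.request.top_k",
    "n": "gen_ai.request.n",
}


def request_attributes(body):
    """Build request-parameter attributes, omitting parameters that are absent."""
    attrs = {}
    for field, attribute in PARAM_TO_ATTRIBUTE.items():
        # Assumption: a missing key and an explicit null are both "absent".
        if body.get(field) is not None:
            attrs[attribute] = body[field]
    stop = body.get("stop")  # assumed body field name for stop sequences
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else list(stop)
        # string (JSON array): the list is serialized into one string value.
        attrs["gen_ai.request.stop_sequences"] = json.dumps(sequences)
    return attrs


attrs = request_attributes({"model": "gpt-4o", "temperature": 0.0, "stop": ["\n\n"]})
print(sorted(attrs))
```

Note that an explicit `temperature` of 0.0 is still recorded in this sketch, because the parameter is present in the body; only absent parameters are dropped.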

Response attributes

Populated after the upstream provider returns a response (or after the final streaming chunk). For streaming requests, token counts and finish reasons are extracted from the terminal usage chunk when the provider includes one, or accumulated from individual deltas otherwise.

Attribute	Type	Description
`gen_ai.response.model`	string	Model name from the provider response (may differ from the requested model)
`gen_ai.response.id`	string	Provider-assigned response identifier
`gen_ai.usage.input_tokens`	int	Prompt/input token count
`gen_ai.usage.output_tokens`	int	Completion/output token count
`gen_ai.response.finish_reasons`	string (JSON array)	Stop reasons, e.g. `["stop"]` or `["tool_calls"]`

The gen_ai.response.model attribute is particularly useful when the request specifies an alias (e.g. gpt-4o) and the provider resolves it to a specific snapshot (e.g. gpt-4o-2024-08-06).

Advanced token types

Provider-specific token breakdowns beyond standard input/output counts. These attributes supplement gen_ai.usage.input_tokens and gen_ai.usage.output_tokens with finer-grained detail where the upstream provider supports it.

Attribute	Type	Description
`gen_ai.usage.cache_creation.input_tokens`	int	Tokens written to the Anthropic prompt cache during this request (aggregate across all TTLs)
`gen_ai.usage.cache_creation_5m.input_tokens`	int	Tokens written to the 5-minute ephemeral prompt cache (Anthropic TTL-specific breakdown)
`gen_ai.usage.cache_creation_1h.input_tokens`	int	Tokens written to the 1-hour prompt cache (Anthropic TTL-specific breakdown)
`gen_ai.usage.cache_read.input_tokens`	int	Tokens read from the Anthropic prompt cache
`gen_ai.usage.reasoning_tokens`	int	Internal reasoning tokens consumed by OpenAI reasoning models

Cache token attributes are provider-specific. They appear only when the upstream provider reports them. Anthropic cache attributes require prompt caching to be enabled in the request. OpenAI reasoning tokens appear only for reasoning-capable models.

When Anthropic cache tokens are present, the relationship between token counts is:

gen_ai.usage.input_tokens = total input tokens (includes both cached and non-cached)
gen_ai.usage.cache_read.input_tokens = subset of input tokens served from cache
gen_ai.usage.cache_creation.input_tokens = tokens newly written to cache (aggregate)
gen_ai.usage.cache_creation_5m.input_tokens = 5-minute TTL portion of cache creation (when Anthropic sends TTL breakdown)
gen_ai.usage.cache_creation_1h.input_tokens = 1-hour TTL portion of cache creation (when Anthropic sends TTL breakdown)

When both the aggregate and TTL-specific breakdown are present, cache_creation.input_tokens equals the sum of cache_creation_5m.input_tokens + cache_creation_1h.input_tokens. The gateway passes all fields through as-is from the provider.

For OpenAI reasoning models, gen_ai.usage.reasoning_tokens is separate from gen_ai.usage.output_tokens. Reasoning tokens represent internal chain-of-thought computation and are billed separately by the provider.
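The cache token relationships above can be checked with a few lines of arithmetic. This is a sketch over a hypothetical attribute dictionary; the `summarize_input_tokens` helper and the sample numbers are invented for illustration.

```python
def summarize_input_tokens(attrs):
    """Derive cache-related figures from a span's token usage attributes."""
    total = attrs["gen_ai.usage.input_tokens"]
    # cache_read is a subset of the total, so the remainder was not served from cache.
    cache_read = attrs.get("gen_ai.usage.cache_read.input_tokens", 0)
    summary = {
        "total_input": total,
        "cache_read": cache_read,
        "not_read_from_cache": total - cache_read,
    }
    created = attrs.get("gen_ai.usage.cache_creation.input_tokens")
    five_min = attrs.get("gen_ai.usage.cache_creation_5m.input_tokens")
    one_hour = attrs.get("gen_ai.usage.cache_creation_1h.input_tokens")
    # Only compare when the aggregate and both TTL-specific values are present.
    if created is not None and five_min is not None and one_hour is not None:
        summary["ttl_breakdown_consistent"] = created == five_min + one_hour
    return summary


# Invented sample values.
summary = summarize_input_tokens({
    "gen_ai.usage.input_tokens": 1200,
    "gen_ai.usage.cache_read.input_tokens": 1000,
    "gen_ai.usage.cache_creation.input_tokens": 150,
    "gen_ai.usage.cache_creation_5m.input_tokens": 100,
    "gen_ai.usage.cache_creation_1h.input_tokens": 50,
})
print(summary)
```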

Embeddings attributes

Emitted on embeddings operation spans when the corresponding value is available.

Attribute	Type	Description
`gen_ai.embeddings.dimension.count`	int	Dimension count of the returned embedding vectors
`gen_ai.request.encoding_formats`	string (JSON array)	Requested encoding format(s), e.g. `["float"]`

Server performance attributes

Per-request performance attributes recorded on completed operation spans. These use the gen_ai.server.* namespace.

Attribute	Type	Description
`gen_ai.server.time_to_first_token`	float (seconds)	Time to first token. For streaming requests, the time from stream start to the first content delta; for non-streaming responses, equal to the total request duration since the body returns in one shot.
`gen_ai.server.request.duration`	float (seconds)	Total wall-clock duration of the request
`gen_ai.server.output_tokens_per_second`	float	Output token throughput. Recorded when the response reports output tokens and the duration is non-zero.

gen_ai.server.time_to_first_token and gen_ai.server.request.duration are recorded on every completed operation span, streaming or not. The baseline lifecycle path also stamps gen_ai.server.time_to_first_token on cache hits, embeddings, audio, and image spans. These are per-span attributes. A few have a matching aggregate metric under a different name — for instance, the span attribute gen_ai.server.output_tokens_per_second is published as the metric gen_ai.client.output_tokens_per_second.

The span-level TTFT attribute is a custom extension. The OTel spec defines TTFT as a metric only. to11 also records it as a span attribute so you can identify individual slow requests directly on the trace, without reading aggregate metrics.
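To make the recording condition for gen_ai.server.output_tokens_per_second concrete, here is a minimal sketch. It assumes throughput is output tokens divided by the total request duration; the source does not state the exact formula, so treat the division as an assumption rather than the gateway's definition.

```python
def output_tokens_per_second(output_tokens, duration_seconds):
    """Return throughput, or None when the attribute would not be recorded."""
    # Not recorded when the response reports no output tokens
    # or when the duration is zero.
    if output_tokens is None or duration_seconds <= 0:
        return None
    # Assumption: throughput is measured over the whole request duration.
    return output_tokens / duration_seconds


print(output_tokens_per_second(240, 4.0))   # 60.0
print(output_tokens_per_second(None, 4.0))  # None
print(output_tokens_per_second(240, 0.0))   # None
```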

Output type

Indicates the modality of the response produced by the operation. The value is determined by the endpoint and, for chat endpoints, by the requested response format.

Attribute	Type	Values
`gen_ai.output.type`	string	`"text"`, `"json"`, `"embeddings"`, `"image"`, `"speech"`

The mapping from endpoint to output type is:

Endpoint group	Output type
Chat completions, messages, responses	`text`, or `json` when a structured response format is requested
Embeddings	`embeddings`
Image generations	`image`
Audio speech	`speech`
Audio transcription	`text`

For chat endpoints, the value is json when the request specifies a JSON object or JSON schema response format, and text otherwise. Audio transcription produces text, since the response is transcribed text rather than audio.
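The mapping above can be written as a small lookup. This sketch keys on the operation name rather than the endpoint path, and it assumes the response format types are spelled `json_object` and `json_schema` as in OpenAI-style request bodies; both choices are illustrative.

```python
# Output types for non-chat operations, taken from the mapping table above.
FIXED_OUTPUT_TYPES = {
    "embeddings": "embeddings",
    "image_generation": "image",
    "audio_speech": "speech",
    "audio_transcription": "text",
}

# Assumed spellings of the structured response format types.
JSON_RESPONSE_FORMATS = ("json_object", "json_schema")


def output_type(operation, response_format_type=None):
    """Return the gen_ai.output.type value for an operation."""
    if operation in ("chat", "generate_content"):
        return "json" if response_format_type in JSON_RESPONSE_FORMATS else "text"
    return FIXED_OUTPUT_TYPES[operation]


print(output_type("chat"))                      # text
print(output_type("chat", "json_schema"))       # json
print(output_type("audio_transcription"))       # text
```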

Server address

Identifies the upstream provider endpoint the gateway connected to. These follow the OTel server.* namespace conventions.

Attribute	Type	Description
`server.address`	string	Upstream hostname (e.g. `api.openai.com`)
`server.port`	int	Upstream port (e.g. `443`)

Error attributes

Present when the upstream call fails or times out. A span with error attributes always has its OTel status set accordingly.

Attribute	Type	Description
`error.type`	string	Error category. The transport-level classifier emits `"timeout"`, `"connection_error"`, or `"upstream_error"`; other paths emit a more specific category (see below).

The span StatusCode is set to ERROR and StatusMessage contains the error detail string. On successful requests, StatusCode remains UNSET and no error.type attribute is recorded.

Transport-level error type values:

Value	Meaning
`timeout`	The upstream provider did not respond within the configured timeout
`connection_error`	TCP connection to the upstream provider failed or was reset
`upstream_error`	The upstream provider returned an HTTP error status, or another non-transport failure occurred

Other paths record their own error.type categories, including rate_limit (gateway rate limit), guardrail_violation (a guardrail blocked the request), and the retrieval-error value declared via x-to11-context-retrieval-error on a retrieval span. Domain-level routing failures (for example unknown_model, no_matching_route, all_targets_failed) carry their own category strings. Provider-returned error bodies are not captured as span attributes. The StatusMessage contains a summary suitable for filtering and alerting.
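When analyzing exported spans, it can help to separate transport-level errors from the other categories. The sketch below assumes spans have been loaded into plain dictionaries with `status_code` and `attributes` keys; that shape is hypothetical and will differ between trace backends.

```python
from collections import Counter

# The three values emitted by the transport-level classifier.
TRANSPORT_ERROR_TYPES = {"timeout", "connection_error", "upstream_error"}


def error_breakdown(spans):
    """Count failed spans by (scope, error.type)."""
    counts = Counter()
    for span in spans:
        error_type = span.get("attributes", {}).get("error.type")
        # Successful spans keep StatusCode UNSET and carry no error.type.
        if span.get("status_code") != "ERROR" or error_type is None:
            continue
        scope = "transport" if error_type in TRANSPORT_ERROR_TYPES else "other"
        counts[(scope, error_type)] += 1
    return counts


# Invented sample spans.
spans = [
    {"status_code": "ERROR", "attributes": {"error.type": "timeout"}},
    {"status_code": "ERROR", "attributes": {"error.type": "rate_limit"}},
    {"status_code": "ERROR", "attributes": {"error.type": "timeout"}},
    {"status_code": "UNSET", "attributes": {}},
]
breakdown = error_breakdown(spans)
print(dict(breakdown))
```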

Session and user metadata

Populated from request headers or body fields. All attributes in this group are optional and appear only when the corresponding header or field is present. These attributes enable grouping spans by user, session, or conversation for analytics and debugging.

Attribute	Type	Source
`gen_ai.user.id`	string	`x-to11-user-id` header or `user` field in the request body
`gen_ai.session.id`	string	`x-to11-session-id` header
`gen_ai.conversation.id`	string	`x-to11-conversation-id` header (preferred), or `x-conversation-id` (back-compat). The namespaced form wins on conflict.
`deployment.environment.name`	string	`x-to11-env` header (resolved through the project’s environment registry)

When both the x-to11-user-id header and the user body field are present, the header takes precedence. Together, gen_ai.user.id, gen_ai.session.id, and gen_ai.conversation.id let you group and filter spans by user, session, or conversation when you explore your traces.
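The precedence rules in this section can be sketched as follows. The example assumes header names have already been lower-cased and that an empty value counts as absent; neither detail is confirmed by the source. Environment resolution is left out because it depends on the project's environment registry.

```python
def identity_attributes(headers, body):
    """Resolve user, session, and conversation attributes from a request."""
    attrs = {}

    # The header wins over the `user` body field.
    user_id = headers.get("x-to11-user-id") or body.get("user")
    if user_id:
        attrs["gen_ai.user.id"] = user_id

    session_id = headers.get("x-to11-session-id")
    if session_id:
        attrs["gen_ai.session.id"] = session_id

    # The namespaced header wins over the back-compat header.
    conversation_id = headers.get("x-to11-conversation-id") or headers.get("x-conversation-id")
    if conversation_id:
        attrs["gen_ai.conversation.id"] = conversation_id

    return attrs


resolved = identity_attributes(
    {"x-to11-user-id": "u-123", "x-conversation-id": "legacy", "x-to11-conversation-id": "conv-9"},
    {"user": "body-user"},
)
print(resolved)
```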

Tool call child spans

When a model response includes tool calls, the gateway emits child spans nested under the GenAI operation span. Each tool call produces one child span. The parent-child relationship is visible in any trace viewer that supports nested spans.

Span name: tool_call.{function_name} (e.g. tool_call.get_weather)

Attribute	Type	Description
`gen_ai.tool.name`	string	Function name
`gen_ai.tool.type`	string	Always `"function"`
`gen_ai.tool.call.id`	string	Provider-assigned call identifier
`gen_ai.tool.call.arguments`	string (JSON)	Function arguments (only with `capture_content` enabled)

The gen_ai.tool.definitions attribute (string, JSON array) appears on the parent span, not on tool call child spans. It contains the tool/function definitions sent in the request when capture_content is enabled. Multiple tool calls within a single response each produce their own child span. The child spans share the same parent span ID as the GenAI operation span. If a response contains no tool calls, no child spans are created.

Timing behavior

Path	Behavior
Streaming	Spans opened on `ToolCallStart`, closed on `ToolCallEnd`. Durations reflect actual streaming time.
Non-streaming	Synthetic timestamps. Generation duration evenly distributed across tool call spans.

In the non-streaming case, the gateway cannot observe when individual tool calls were generated. The synthetic timestamp distribution ensures that tool call spans are evenly spaced within the parent span’s duration, providing a reasonable approximation for visualization in trace viewers.
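One way to picture the synthetic timestamps is to divide the parent span's duration into equal, contiguous slices, one per tool call. The sketch below does exactly that; whether the gateway uses contiguous slices or some other even spacing is an assumption, and `synthetic_tool_call_windows` is a hypothetical helper.

```python
def synthetic_tool_call_windows(parent_start, parent_end, tool_call_count):
    """Split the parent duration into equal (start, end) windows, one per tool call."""
    if tool_call_count <= 0:
        return []  # no tool calls, so no child spans
    slice_seconds = (parent_end - parent_start) / tool_call_count
    return [
        (parent_start + i * slice_seconds, parent_start + (i + 1) * slice_seconds)
        for i in range(tool_call_count)
    ]


# A parent span running from t=10s to t=13s with three tool calls.
windows = synthetic_tool_call_windows(10.0, 13.0, 3)
print(windows)  # [(10.0, 11.0), (11.0, 12.0), (12.0, 13.0)]
```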

Gateway routing attributes

Present on spans that traverse the gateway’s managed routing layer. These attributes use the gateway.* namespace, which is specific to the to11 gateway and not part of the OTel semantic conventions. Passthrough requests carry gateway.routing.path = "passthrough" and omit the remaining routing attributes.

Attribute	Type	Description
`gateway.routing.path`	string	`"managed"` or `"passthrough"`
`gateway.routing.target`	string	Resolved target name
`gateway.routing.layer`	string	Routing layer that resolved the request (`"function"` or the upstream layer)
`gateway.routing.strategy`	string	Strategy used (e.g. `"weighted"`, `"fallback"`)
`gateway.routing.attempt`	int	Attempt number (1-indexed; increments on fallback retries)
`gateway.routing.upstream_model`	string	Model sent to the upstream provider. Emitted only when it differs from the requested model.

For fallback strategies, the gateway.routing.attempt attribute increments with each retry. The first attempt is 1. Combined with gateway.routing.target, this allows reconstruction of the full fallback chain from span data.
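Reconstructing a fallback chain from span data might look like the sketch below. It assumes each attempt is recorded on its own span and that the spans for one request have already been collected into dictionaries with an `attributes` key; both are assumptions about your export format, not documented behavior.

```python
def fallback_chain(spans):
    """Return routing targets in attempt order for one request's spans."""
    attempts = []
    for span in spans:
        attrs = span["attributes"]
        # Passthrough spans omit the remaining routing attributes.
        if attrs.get("gateway.routing.path") != "managed":
            continue
        if "gateway.routing.attempt" in attrs:
            attempts.append(
                (attrs["gateway.routing.attempt"], attrs.get("gateway.routing.target"))
            )
    attempts.sort(key=lambda pair: pair[0])  # attempts are 1-indexed
    return [target for _, target in attempts]


# Invented sample spans; target names are hypothetical.
spans = [
    {"attributes": {"gateway.routing.path": "managed", "gateway.routing.strategy": "fallback",
                    "gateway.routing.attempt": 2, "gateway.routing.target": "anthropic_backup"}},
    {"attributes": {"gateway.routing.path": "managed", "gateway.routing.strategy": "fallback",
                    "gateway.routing.attempt": 1, "gateway.routing.target": "openai_primary"}},
    {"attributes": {"gateway.routing.path": "passthrough"}},
]
chain = fallback_chain(spans)
print(chain)  # ['openai_primary', 'anthropic_backup']
```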

Provider qualifier attributes

Emitted on the operation span when the resolved provider configuration carries the corresponding qualifier. These attributes use the gen_ai.provider.* namespace and distinguish between multiple provider configurations that map to the same gen_ai.provider.name.

Attribute	Type	Description
`gen_ai.provider.config_key`	string	The `[providers.*]` config key that served the request (e.g. `openai_eu`, `bedrock_eu`)
`gen_ai.provider.region`	string	Provider region
`gen_ai.provider.tier`	string	Provider service tier
`gen_ai.provider.mode`	string	Provider mode

Each attribute is omitted when the resolved configuration does not define it. gen_ai.provider.config_key disambiguates multiple config entries that resolve to the same canonical gen_ai.provider.name.

Experiment and variant attributes

Present when the request is routed through a function variant. These attributes record which variant was selected and the weight distribution at the time of the request. Each attribute is emitted only when the route resolution carries the corresponding value; gateway.experiment.total_variants reflects the number of variants defined on the function.

Attribute	Type	Description
`gateway.variant.name`	string	Name of the selected variant
`gateway.variant.weight`	int	Configured weight of the selected variant
`gateway.experiment.total_variants`	int	Total number of variants defined on the function
`gateway.function.name`	string	Function name from the route resolution
`gateway.function.endpoint`	string	Resolved endpoint for the function

Variant selection is determined at request time based on the configured weights. The gateway.variant.weight attribute records the weight of the selected variant, not the probability. To compute the selection probability, divide by the sum of all variant weights (derivable from the total variants and their individual weights across spans).
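The probability calculation described above can be sketched as follows. The helpers collect each variant's weight from observed spans and only compute a probability once every variant has been seen. The span dictionaries and variant names are invented, and the sketch assumes weights did not change during the observation window.

```python
def observed_weights(spans):
    """Collect variant name -> configured weight from a set of spans."""
    weights = {}
    for span in spans:
        attrs = span["attributes"]
        name = attrs.get("gateway.variant.name")
        weight = attrs.get("gateway.variant.weight")
        if name is not None and weight is not None:
            weights[name] = weight
    return weights


def selection_probability(variant, weights, total_variants):
    """Return weight / sum(weights), or None if some variants are still unseen."""
    if len(weights) != total_variants:
        return None  # cannot know the full weight sum yet
    total = sum(weights.values())
    if total <= 0:
        return None
    return weights[variant] / total


spans = [
    {"attributes": {"gateway.variant.name": "control", "gateway.variant.weight": 80,
                    "gateway.experiment.total_variants": 2}},
    {"attributes": {"gateway.variant.name": "candidate", "gateway.variant.weight": 20,
                    "gateway.experiment.total_variants": 2}},
]
weights = observed_weights(spans)
probability = selection_probability("candidate", weights, total_variants=2)
print(probability)  # 0.2
```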

Client-context operation attributes

These attributes appear on sibling spans created via x-to11-context-* headers. Each header value generates a separate span adjacent to the main GenAI operation span, sharing the same parent trace context. The span name matches the operation type (e.g. execute_tool, retrieval, invoke_agent, create_agent).

execute_tool spans

Attribute	Type	Description
`gen_ai.tool.name`	string	Tool name from the context header
`gen_ai.tool.call.id`	string	Tool call identifier from the context header

retrieval spans

Attribute	Type	Description
`gen_ai.retrieval.source`	string	Retrieval source identifier
`gen_ai.retrieval.document.count`	string	Number of documents retrieved

invoke_agent / create_agent spans

Attribute	Type	Description
`gen_ai.agent.id`	string	Agent identifier
`gen_ai.agent.name`	string	Agent name
`gen_ai.agent.step`	string	Current agent step identifier (invoke_agent only)
`gen_ai.agent.description`	string	Human-readable agent description (create_agent only)

The invoke_agent and create_agent span types differ in their attribute sets. gen_ai.agent.step tracks the execution step within an agent invocation, while gen_ai.agent.description records the agent definition at creation time. Both share gen_ai.agent.id and gen_ai.agent.name.

Content capture attributes

Recorded when content capture is enabled, which it is by default. These attributes contain the full prompt and completion text as serialized JSON. Streaming responses are accumulated up to a 64 KB limit; non-streaming responses are captured in full. See Content Capture for how it’s controlled.

Attribute	Type	Description
`gen_ai.input.messages`	string (JSON)	Serialized prompt messages from the request
`gen_ai.output.messages`	string (JSON)	Serialized completion choices or accumulated stream content
`gen_ai.system_instructions`	string	System prompt text

For media endpoints, content capture behavior varies:

Endpoint	Input capture	Output capture
Image generation	`prompt` field	`revised_prompt` from DALL-E 3 responses
Audio speech (TTS)	`input` text field	Not captured (binary audio)
Audio transcription	Not captured (binary audio)	Transcription `text` field

Because content capture is enabled by default, prompt and completion text, including any sensitive data, is recorded verbatim.

Cache status attributes

Present when the gateway evaluates its response cache for the request. The skip status indicates that the gateway did not look up the cache because the request's cache directives disabled caching, bypassed it, or limited it to write-only.

Attribute	Type	Description
`gen_ai.cache.status`	string	`"hit"`, `"miss"`, or `"skip"`
`gen_ai.cache.age_seconds`	int	Age of the cached response in seconds (present on cache hits only)

On a cache hit, the gateway skips the upstream provider call entirely. The span still records gen_ai.response.model and token usage attributes from the cached response. The gen_ai.cache.age_seconds attribute indicates staleness, allowing cache freshness analysis across requests.

Cache status values:

Value	Meaning
`hit`	Response served from cache. `gen_ai.cache.age_seconds` is present.
`miss`	Cache read performed but no matching entry found. Request forwarded to provider.
`skip`	The cache was not consulted for a hit: caching was disabled, bypassed, or write-only for this request via its cache directives.
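As an example of using gen_ai.cache.status in analysis, the sketch below computes a hit rate over the requests where the cache was actually consulted. Excluding skip from the denominator is an analytic choice made here, not something the gateway prescribes.

```python
from collections import Counter


def cache_hit_rate(statuses):
    """Return hits / (hits + misses), ignoring skipped requests."""
    counts = Counter(statuses)
    consulted = counts["hit"] + counts["miss"]
    if consulted == 0:
        return None  # the cache was never consulted
    return counts["hit"] / consulted


# gen_ai.cache.status values from four invented spans.
rate = cache_hit_rate(["hit", "miss", "skip", "hit"])
print(rate)
```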
