Skip to main content
Every LLM call you route through the to11 gateway is captured as a structured OpenTelemetry trace. to11 records the model, the operation, token counts, latency, and the timing of each call, then exposes that data for cost attribution, quality analysis, and debugging. Every call, not just the loud ones. This page explains what to11 captures, why the data is standard and portable, and how the ten operation names map onto your workloads.

OpenTelemetry-native by default

to11 captures LLM calls as OpenTelemetry GenAI traces, following the OpenTelemetry GenAI semantic conventions. This is a deliberate choice. By using standard attribute names and span structure, the telemetry to11 produces is portable: any OTel-compatible viewer — Jaeger, Honeycomb, Datadog — can parse and visualize it without custom configuration. You are not locked into a proprietary trace format. The data flows in one direction, and to11 sits in the middle:
Your application
    |
    v
to11 gateway  --- captures a GenAI span per LLM call --->  your analytics
    |
    v
Your LLM providers
Your request passes through the gateway to your chosen provider, and to11 records a GenAI span describing that call along the way. Because the gateway participates in the same W3C Trace Context propagation chain, a single traceparent header flows from your application, through the gateway, to the upstream provider. The GenAI span and your own application spans share one trace ID, so you can correlate them in any trace viewer. to11 keeps GenAI telemetry distinct from ordinary application traces. That separation is what lets cost, token, and model-comparison analytics work as a first-class view rather than something you reconstruct from generic HTTP spans. You get LLM-shaped data — model, operation, tokens, time-to-first-token — instead of a pile of undifferentiated requests.

Span naming

The span name follows the pattern {operation} {model}. For example:
  • chat gpt-4o
  • embeddings text-embedding-3-small
  • image_generation dall-e-3
This convention means you can filter spans by operation type using a prefix match, or by model using a suffix match, in any trace viewer that supports span name search.

Supported operations

to11 emits ten gen_ai.operation.name values across two families: gateway-native operations, which to11 derives from the endpoint and request content, and client-context operations, which the caller declares via headers.

Gateway-native operations

These six operations are determined automatically, based on which endpoint is called and what the request contains.
OperationEndpoint(s)Description
chat/v1/chat/completions, /v1/messages, /v1/responsesText-only LLM calls
generate_contentSame as chatMultimodal requests (images detected automatically)
embeddings/v1/embeddingsVector embedding generation
image_generation/v1/images/generationsImage creation (DALL-E, etc.)
audio_speech/v1/audio/speechText-to-speech synthesis
audio_transcription/v1/audio/transcriptionsSpeech-to-text transcription
All three chat endpoints (/v1/chat/completions, /v1/messages, /v1/responses) share the same handler and produce identical telemetry. The distinction between chat and generate_content is covered in the multimodal detection section below.

Client-context operations

These four operations are declared by the caller via x-to11-context-* headers. When to11 receives these headers, it emits a sibling span alongside the main GenAI operation span, representing work that happened on the client side between LLM calls.
OperationPurpose
execute_toolClient-side tool or function execution between LLM calls
retrievalVector search or document retrieval in RAG workflows
invoke_agentDelegating to a sub-agent in a multi-agent system
create_agentAgent lifecycle creation (recommended as client-emitted OTLP; gateway fallback available)
These exist as gateway-emitted spans rather than requiring a full client-side OTel SDK for a practical reason: many applications — especially those using lightweight HTTP clients or serverless functions — have no OTel SDK configured. The context headers let those applications participate in structured GenAI tracing with no additional dependencies.
See the Context Propagation how-to for the full header specification and examples.

Multimodal detection

When to11 detects image_url content blocks in a chat request, it automatically promotes the operation from chat to generate_content. This distinction exists purely for telemetry: multimodal requests tend to have different latency profiles, token costs, and failure modes, so separating them in the data makes analytics more meaningful. You can override this automatic detection with the x-to11-operation header, which accepts two values: chat and generate_content. This is useful when your application knows the semantic intent of a request better than content inspection can determine — for example, a request that carries base64 image data but is primarily a text classification task.
Multimodal detection is purely a telemetry concern. Routing, credentials, and adapters behave identically regardless of whether the operation is classified as chat or generate_content.

What’s next

Span Attributes

Every GenAI span attribute to11 emits, grouped by lifecycle stage.

Metrics Reference

Histogram metrics, their dimensions, and the analytics they power.

Distributed Tracing

Group multiple LLM calls and agent chains under a single trace.

Context Propagation

Emit client-context spans via headers without an OTel SDK.

Direct Ingestion

Send custom OTLP spans straight to to11.

Content Capture

Record prompt and completion text for debugging and evaluation.