OpenTelemetry-native by default
to11 captures LLM calls as OpenTelemetry GenAI traces, following the OpenTelemetry GenAI semantic conventions. This is a deliberate choice. By using standard attribute names and span structure, the telemetry to11 produces is portable: any OTel-compatible viewer — Jaeger, Honeycomb, Datadog — can parse and visualize it without custom configuration. You are not locked into a proprietary trace format. The data flows in one direction, and to11 sits in the middle:traceparent header flows from your application, through the gateway, to the upstream provider. The GenAI span and your own application spans share one trace ID, so you can correlate them in any trace viewer.
to11 keeps GenAI telemetry distinct from ordinary application traces. That separation is what lets cost, token, and model-comparison analytics work as a first-class view rather than something you reconstruct from generic HTTP spans. You get LLM-shaped data — model, operation, tokens, time-to-first-token — instead of a pile of undifferentiated requests.
Span naming
The span name follows the pattern{operation} {model}. For example:
chat gpt-4oembeddings text-embedding-3-smallimage_generation dall-e-3
Supported operations
to11 emits tengen_ai.operation.name values across two families: gateway-native operations, which to11 derives from the endpoint and request content, and client-context operations, which the caller declares via headers.
Gateway-native operations
These six operations are determined automatically, based on which endpoint is called and what the request contains.| Operation | Endpoint(s) | Description |
|---|---|---|
chat | /v1/chat/completions, /v1/messages, /v1/responses | Text-only LLM calls |
generate_content | Same as chat | Multimodal requests (images detected automatically) |
embeddings | /v1/embeddings | Vector embedding generation |
image_generation | /v1/images/generations | Image creation (DALL-E, etc.) |
audio_speech | /v1/audio/speech | Text-to-speech synthesis |
audio_transcription | /v1/audio/transcriptions | Speech-to-text transcription |
/v1/chat/completions, /v1/messages, /v1/responses) share the same handler and produce identical telemetry. The distinction between chat and generate_content is covered in the multimodal detection section below.
Client-context operations
These four operations are declared by the caller viax-to11-context-* headers. When to11 receives these headers, it emits a sibling span alongside the main GenAI operation span, representing work that happened on the client side between LLM calls.
| Operation | Purpose |
|---|---|
execute_tool | Client-side tool or function execution between LLM calls |
retrieval | Vector search or document retrieval in RAG workflows |
invoke_agent | Delegating to a sub-agent in a multi-agent system |
create_agent | Agent lifecycle creation (recommended as client-emitted OTLP; gateway fallback available) |
See the Context Propagation how-to for the full header specification and examples.
Multimodal detection
When to11 detectsimage_url content blocks in a chat request, it automatically promotes the operation from chat to generate_content. This distinction exists purely for telemetry: multimodal requests tend to have different latency profiles, token costs, and failure modes, so separating them in the data makes analytics more meaningful.
You can override this automatic detection with the x-to11-operation header, which accepts two values: chat and generate_content. This is useful when your application knows the semantic intent of a request better than content inspection can determine — for example, a request that carries base64 image data but is primarily a text classification task.
Multimodal detection is purely a telemetry concern. Routing, credentials, and adapters behave identically regardless of whether the operation is classified as
chat or generate_content.What’s next
Span Attributes
Every GenAI span attribute to11 emits, grouped by lifecycle stage.
Metrics Reference
Histogram metrics, their dimensions, and the analytics they power.
Distributed Tracing
Group multiple LLM calls and agent chains under a single trace.
Context Propagation
Emit client-context spans via headers without an OTel SDK.
Direct Ingestion
Send custom OTLP spans straight to to11.
Content Capture
Record prompt and completion text for debugging and evaluation.