
Context Propagation

The gateway can emit additional spans and attributes based on context headers you send alongside your LLM requests. This gives you visibility into the full AI workflow — not just the LLM call itself. Session and user headers enrich the existing GenAI operation span with metadata. Client-operation headers (x-to11-context-*) cause the gateway to emit a sibling span next to the GenAI span, representing work your application performed between LLM calls — tool execution, vector retrieval, or agent delegation.

Session and user context

These headers attach metadata to the GenAI operation span. No additional spans are created.

| Header | Span attribute | Description |
| --- | --- | --- |
| x-to11-user-id | gen_ai.user.id | User identifier (also read from the user field in the request body) |
| x-to11-session-id | gen_ai.session.id | Session identifier for grouping conversations |
| x-to11-environment | deployment.environment.name | Environment name (production, staging, etc.) |
| X-Conversation-Id | gen_ai.conversation.id | Conversation thread identifier |

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-user-id: user-42" \
  -H "x-to11-session-id: sess-abc123" \
  -H "x-to11-environment: production" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

Client-operation context

When your application performs work between LLM calls — executing tools, searching a vector database, or delegating to a sub-agent — you can tell the gateway about it. The gateway emits a sibling span alongside the GenAI operation span, both under the same HTTP request parent. To declare client-side context, set x-to11-context-operation to one of: execute_tool, retrieval, invoke_agent, create_agent. Then add the operation-specific headers described in the sections below.
All x-to11-context-* header values are truncated to 256 characters. Unknown headers are silently ignored.
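
Because the gateway truncates every x-to11-context-* value at 256 characters, you may prefer to truncate client-side so you control which end of a long value survives. A minimal sketch of such a helper; context_headers is illustrative and not part of the gateway or any client library:

```python
def context_headers(operation, **fields):
    """Build x-to11-context-* headers, truncating each value to the
    gateway's 256-character limit client-side."""
    headers = {"x-to11-context-operation": operation}
    for name, value in fields.items():
        # e.g. tool_name -> x-to11-context-tool-name
        key = "x-to11-context-" + name.replace("_", "-")
        headers[key] = str(value)[:256]
    return headers

# Merge the result into the headers of your next LLM request:
headers = context_headers("execute_tool", tool_name="web_search",
                          tool_call_id="call_abc123", tool_duration_ms=250)
```

The keyword-to-header mapping assumes you follow the naming pattern in the tables below; the gateway still applies its own truncation regardless.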

Tool execution context

When your application has just executed a tool call from a previous LLM response and is now sending the tool result back, attach tool execution context so the gateway emits a span representing that work.

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to execute_tool |
| x-to11-context-tool-name | No | Name of the tool that was executed |
| x-to11-context-tool-call-id | No | Tool call ID from the LLM response |
| x-to11-context-tool-duration-ms | No | How long the tool took to execute (milliseconds) |

Emitted span

execute_tool {tool_name} with attributes gen_ai.tool.name, gen_ai.tool.call.id.

Example

import requests

# After executing a tool call from a previous LLM response:
response = requests.post("http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": "web_search",
        "x-to11-context-tool-call-id": "call_abc123",
        "x-to11-context-tool-duration-ms": "250",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "Search for recent news"},
            {"role": "assistant", "tool_calls": [...]},
            {"role": "tool", "tool_call_id": "call_abc123", "content": "Search results..."},
        ],
    },
)

Retrieval context (RAG)

When your application performed a vector search or document retrieval before the LLM call, attach retrieval context so the gateway emits a span representing that work.

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to retrieval |
| x-to11-context-retrieval-source | No | Source system (e.g. pinecone, weaviate, elasticsearch) |
| x-to11-context-retrieval-document-count | No | Number of documents retrieved |
| x-to11-context-retrieval-duration-ms | No | How long the retrieval took (milliseconds) |

Emitted span

retrieval {source} with attributes gen_ai.retrieval.source, gen_ai.retrieval.document.count.

Example

import requests

# 1. Retrieve documents from your vector store
docs = vector_store.search(query, top_k=5)

# 2. Send to LLM with retrieval context
response = requests.post("http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": "pinecone",
        "x-to11-context-retrieval-document-count": str(len(docs)),
        "x-to11-context-retrieval-duration-ms": str(retrieval_time_ms),
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": f"Use these documents:\n{format_docs(docs)}"},
            {"role": "user", "content": query},
        ],
    },
)

Agent context

When your application delegates work to a sub-agent or tracks steps within an agent workflow, attach agent context so the gateway emits spans representing agent lifecycle events.

Invoke agent

To record a step within an agent workflow or a delegation to a sub-agent, set x-to11-context-operation to invoke_agent.

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to invoke_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-step | No | Zero-indexed step number within the agent workflow |

Emitted span: invoke_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.step.

Create agent

To record agent creation, set x-to11-context-operation to create_agent.

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to create_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-description | No | Agent purpose description |

For production agent lifecycle tracking, consider emitting create_agent spans directly via OTLP to the collector. The header-based approach is a convenience fallback. See Direct Ingestion.

Example

import requests

# Agent "planner" — step 0
response = requests.post("http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "invoke_agent",
        "x-to11-context-agent-id": "agent-001",
        "x-to11-context-agent-name": "planner",
        "x-to11-context-agent-step": "0",
    },
    json={"model": "gpt-4o", "messages": [...]},
)
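
A create_agent request is assembled the same way, with the headers from the create_agent table. A minimal sketch; create_agent_headers is an illustrative helper (not part of any client library), and the agent values are placeholders:

```python
def create_agent_headers(api_key, agent_id, agent_name, description):
    """Build headers tagging the next LLM request with create_agent context.

    The gateway truncates each x-to11-context-* value to 256 characters,
    so the description is trimmed client-side here as well.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-to11-context-operation": "create_agent",
        "x-to11-context-agent-id": agent_id,
        "x-to11-context-agent-name": agent_name,
        "x-to11-context-agent-description": description[:256],
    }

# Used exactly like the invoke_agent example above:
# response = requests.post(
#     "http://localhost:4000/v1/chat/completions",
#     headers=create_agent_headers(api_key, "agent-001", "planner",
#                                  "Plans multi-step research tasks"),
#     json={"model": "gpt-4o", "messages": [...]},
# )
```

As noted above, OTLP ingestion is the preferred path for production agent lifecycle tracking; this header-based form is the convenience fallback.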

Operation name override

By default, the gateway sets gen_ai.operation.name to chat for text requests and automatically promotes to generate_content when image content blocks are detected. To override this behaviour, set the x-to11-operation header:

| Header value | Effect |
| --- | --- |
| chat | Force chat even on multimodal requests |
| generate_content | Force generate_content even on text-only requests |

Unknown values are silently ignored and detection proceeds normally.
# Force "chat" on a request with images (e.g. image is just metadata, not generation input)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-operation: chat" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract text from this receipt"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }]
  }'

Resulting span tree

The following diagram shows how a RAG workflow with tool use appears in the trace. The HTTP request span is the parent; retrieval and GenAI operation spans are siblings under it.
HTTP request span
  |
  +-- retrieval pinecone        (x-to11-context-operation: retrieval)
  |     gen_ai.retrieval.source: "pinecone"
  |     gen_ai.retrieval.document.count: "5"
  |
  +-- chat gpt-4o               (GenAI operation span)
        gen_ai.operation.name: "chat"
        gen_ai.usage.input_tokens: 2450
When a request carries tool execution context instead, the execute_tool span appears as the sibling:
HTTP request span
  |
  +-- execute_tool web_search   (x-to11-context-operation: execute_tool)
  |     gen_ai.tool.name: "web_search"
  |     gen_ai.tool.call.id: "call_abc123"
  |
  +-- chat gpt-4o               (GenAI operation span)
        gen_ai.operation.name: "chat"
        gen_ai.usage.input_tokens: 3200
        gen_ai.usage.output_tokens: 450