Context Propagation
The gateway can emit additional spans and attributes based on context headers you send alongside your LLM requests. This gives you visibility into the full AI workflow — not just the LLM call itself.
Session and user headers enrich the existing GenAI operation span with metadata. Client-operation headers (x-to11-context-*) cause the gateway to emit a sibling span next to the GenAI span, representing work your application performed between LLM calls — tool execution, vector retrieval, or agent delegation.
Session and user context
These headers attach metadata to the GenAI operation span. No additional spans are created.
| Header | Span attribute | Description |
|---|---|---|
| x-to11-user-id | gen_ai.user.id | User identifier (also read from the user field in the request body) |
| x-to11-session-id | gen_ai.session.id | Session identifier for grouping conversations |
| x-to11-env | deployment.environment.name | Environment slug, resolved through the project’s environment registry (prod, staging, or any custom environment). The gateway also uses this header to route requests. |
| X-Conversation-Id | gen_ai.conversation.id | Conversation thread identifier |
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-user-id: user-42" \
  -H "x-to11-session-id: sess-abc123" \
  -H "x-to11-env: prod" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
Client-operation context
When your application performs work between LLM calls — executing tools, searching a vector database, or delegating to a sub-agent — you can tell the gateway about it. The gateway emits a sibling span alongside the GenAI operation span, both under the same HTTP request parent.
To declare client-side context, set x-to11-context-operation to one of: execute_tool, retrieval, invoke_agent, create_agent. Then add the operation-specific headers described in the sections below.
All x-to11-context-* header values are truncated to 256 characters. Unknown headers are silently ignored.
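The rules above (one operation per request, values capped at 256 characters) can be mirrored on the client. The following is a minimal sketch, not part of the gateway: build_context_headers is a hypothetical helper name, and it assumes the gateway truncates by keeping the first 256 characters of a value.

```python
# Hypothetical client-side helper; the gateway does not ship this function.
ALLOWED_OPERATIONS = {"execute_tool", "retrieval", "invoke_agent", "create_agent"}
MAX_VALUE_LENGTH = 256  # documented truncation limit for x-to11-context-* values


def build_context_headers(operation, **fields):
    """Return x-to11-context-* headers for one client-side operation.

    Keyword names map to header suffixes, e.g. tool_name -> x-to11-context-tool-name.
    """
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    headers = {"x-to11-context-operation": operation}
    for key, value in fields.items():
        if value is None:
            continue  # optional headers are simply omitted
        text = str(value)
        if len(text) > MAX_VALUE_LENGTH:
            # Assumption: truncation keeps the leading characters.
            text = text[:MAX_VALUE_LENGTH]
        headers["x-to11-context-" + key.replace("_", "-")] = text
    return headers


headers = build_context_headers(
    "execute_tool", tool_name="web_search", tool_call_id="call_abc123"
)
```

The returned dictionary can be merged into the headers you already send with the LLM request. Rejecting an unknown operation locally is a design choice for this sketch, made so that typos surface in your application rather than passing unnoticed.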
Tool execution context
When your application has just executed a tool call from a previous LLM response and is now sending the tool result back, attach tool execution context so the gateway emits a span representing that work.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to execute_tool |
| x-to11-context-tool-name | No | Name of the tool that was executed |
| x-to11-context-tool-call-id | No | Tool call ID from the LLM response |
| x-to11-context-tool-duration-ms | No | How long the tool took to execute (milliseconds) |
Emitted span
execute_tool {tool_name} with attributes gen_ai.tool.name, gen_ai.tool.call.id.
Example
import requests

# api_key and traceparent are assumed to be defined by your application.
# After executing a tool call from a previous LLM response:
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": "web_search",
        "x-to11-context-tool-call-id": "call_abc123",
        "x-to11-context-tool-duration-ms": "250",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "Search for recent news"},
            # [...] stands for the tool_calls returned by the previous LLM response
            {"role": "assistant", "tool_calls": [...]},
            {"role": "tool", "tool_call_id": "call_abc123", "content": "Search results..."},
        ],
    },
)
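The example hard-codes a duration of 250 ms. In practice you would measure it around the tool call. A minimal sketch, assuming a hypothetical wrapper named run_tool_with_context and a stand-in web_search function (neither is provided by the gateway):

```python
import time


def run_tool_with_context(tool_name, tool_call_id, tool_fn, *args, **kwargs):
    """Run tool_fn and return (result, context_headers) for the follow-up request."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    result = tool_fn(*args, **kwargs)
    duration_ms = int((time.monotonic() - start) * 1000)
    headers = {
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": tool_name,
        "x-to11-context-tool-call-id": tool_call_id,
        "x-to11-context-tool-duration-ms": str(duration_ms),
    }
    return result, headers


# Stand-in tool for illustration only.
def web_search(query):
    return f"Search results for {query!r}"


result, context_headers = run_tool_with_context(
    "web_search", "call_abc123", web_search, "recent news"
)
```

Merge context_headers into the headers of the request that sends the tool result back to the LLM.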
Retrieval context (RAG)
When your application performed a vector search or document retrieval before the LLM call, attach retrieval context so the gateway emits a span representing that work.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to retrieval |
| x-to11-context-retrieval-source | No | Source system (e.g. pinecone, weaviate, elasticsearch) |
| x-to11-context-retrieval-document-count | No | Number of documents retrieved |
| x-to11-context-retrieval-duration-ms | No | How long the retrieval took (milliseconds) |
Emitted span
retrieval {source} with attributes gen_ai.retrieval.source, gen_ai.retrieval.document.count.
Example
import time

import requests

# vector_store, query, format_docs, api_key, and traceparent are assumed to be
# defined by your application.

# 1. Retrieve documents from your vector store, timing the call
start = time.monotonic()
docs = vector_store.search(query, top_k=5)
retrieval_time_ms = int((time.monotonic() - start) * 1000)

# 2. Send to LLM with retrieval context
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": "pinecone",
        "x-to11-context-retrieval-document-count": str(len(docs)),
        "x-to11-context-retrieval-duration-ms": str(retrieval_time_ms),
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": f"Use these documents:\n{format_docs(docs)}"},
            {"role": "user", "content": query},
        ],
    },
)
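The Python examples pass a traceparent value without showing where it comes from. If your application is already instrumented for tracing, the value would normally come from the active span. Otherwise, assuming the gateway accepts the standard W3C Trace Context format (this page does not state it explicitly), a value can be generated as in the sketch below. The function names are hypothetical.

```python
import secrets


def _nonzero_hex(nbytes):
    """Return random lowercase hex of nbytes bytes that is not all zeros."""
    while True:
        value = secrets.token_hex(nbytes)
        if set(value) != {"0"}:  # all-zero IDs are invalid in W3C Trace Context
            return value


def new_traceparent(sampled=True):
    """Build a traceparent value: version-traceid-parentid-flags."""
    trace_id = _nonzero_hex(16)  # 32 hex characters
    parent_id = _nonzero_hex(8)  # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


traceparent = new_traceparent()
```

Reuse the same trace ID across the requests of one workflow if you want them grouped into a single trace; this sketch creates a fresh trace on every call.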
Agent context
When your application delegates work to a sub-agent or tracks steps within an agent workflow, attach agent context so the gateway emits spans representing agent lifecycle events.
Invoke agent
To record a step within an agent workflow or a delegation to a sub-agent, set x-to11-context-operation to invoke_agent.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to invoke_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-step | No | Zero-indexed step number within the agent workflow |
Emitted span: invoke_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.step.
Create agent
To record agent creation, set x-to11-context-operation to create_agent.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to create_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-description | No | Agent purpose description |
For production agent lifecycle tracking, consider emitting create_agent spans directly via OTLP to the collector. The header-based approach is a convenience fallback. See Direct Ingestion.
Example
import requests

# api_key and traceparent are assumed to be defined by your application.
# Agent "planner", step 0
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "invoke_agent",
        "x-to11-context-agent-id": "agent-001",
        "x-to11-context-agent-name": "planner",
        "x-to11-context-agent-step": "0",
    },
    json={"model": "gpt-4o", "messages": [...]},
)
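An agent workflow usually makes several LLM calls, each with its own step number. The sketch below shows one way to produce the invoke_agent headers for successive steps. agent_step_headers is a hypothetical helper, not a gateway API, and the three-step loop is purely illustrative.

```python
def agent_step_headers(agent_id, agent_name, step):
    """Return invoke_agent context headers for one zero-indexed workflow step."""
    if step < 0:
        raise ValueError("step is zero-indexed and must not be negative")
    return {
        "x-to11-context-operation": "invoke_agent",
        "x-to11-context-agent-id": agent_id,
        "x-to11-context-agent-name": agent_name,
        "x-to11-context-agent-step": str(step),  # header values are strings
    }


# One header set per LLM call made by the "planner" agent.
all_headers = [agent_step_headers("agent-001", "planner", step) for step in range(3)]
```

Each entry in all_headers would be merged into the headers of the corresponding request.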
Operation name override
By default, the gateway sets gen_ai.operation.name to chat for text requests and automatically promotes it to generate_content when image content blocks are detected.
To override this behaviour, set the x-to11-operation header:
| Header value | Effect |
|---|---|
| chat | Force chat even on multimodal requests |
| generate_content | Force generate_content even on text-only requests |
Unknown values are silently ignored and detection proceeds normally.
# Force "chat" on a request with images (e.g. image is just metadata, not generation input)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-operation: chat" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract text from this receipt"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }]
  }'
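To reason about which operation name a request will receive, the documented behaviour can be approximated on the client. This is only a sketch of the rules described in this section, not the gateway's implementation. In particular, treating a content part with type image_url as an "image content block" is an assumption taken from the curl example, and expected_operation_name is a hypothetical name.

```python
VALID_OVERRIDES = {"chat", "generate_content"}


def expected_operation_name(messages, override=None):
    """Approximate the gen_ai.operation.name described for a request."""
    if override in VALID_OVERRIDES:
        return override  # a recognised x-to11-operation value wins
    # Unknown or missing override: fall back to content-based detection.
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for block in content:
                # Assumption: an image block is a dict with type "image_url".
                if isinstance(block, dict) and block.get("type") == "image_url":
                    return "generate_content"
    return "chat"


text_only = [{"role": "user", "content": "Hello"}]
with_image = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Extract text from this receipt"},
        {"type": "image_url", "image_url": {"url": "..."}},
    ],
}]
```

With these inputs, text_only maps to chat, with_image maps to generate_content, and passing override="chat" for with_image forces chat, matching the table above.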
Resulting span tree
The following diagram shows how a RAG request appears in the trace. The HTTP request span is the parent; the retrieval and GenAI operation spans are siblings under it.
HTTP request span
|
+-- retrieval pinecone    (x-to11-context-operation: retrieval)
|     gen_ai.retrieval.source: "pinecone"
|     gen_ai.retrieval.document.count: "5"
|
+-- chat gpt-4o           (GenAI operation span)
      gen_ai.operation.name: "chat"
      gen_ai.usage.input_tokens: 2450
When a request carries tool execution context instead, the execute_tool span appears as the sibling of the GenAI operation span:
HTTP request span
|
+-- execute_tool web_search    (x-to11-context-operation: execute_tool)
|     gen_ai.tool.name: "web_search"
|     gen_ai.tool.call.id: "call_abc123"
|
+-- chat gpt-4o                (GenAI operation span)
      gen_ai.operation.name: "chat"
      gen_ai.usage.input_tokens: 3200
      gen_ai.usage.output_tokens: 450