The gateway can emit additional spans and attributes based on context headers you send alongside your LLM requests. This gives you visibility into the full AI workflow — not just the LLM call itself.
Authenticate to to11 with the x-to11-authorization: Bearer <your to11 API key> header (create the key in Settings → API keys). If your key has access to more than one project, add x-to11-project-id to select one. Send your upstream provider key in the standard Authorization header (or x-api-key), or configure it server-side in your to11 project. The context headers below are layered on top of that request.
Session and user headers enrich the existing GenAI operation span with metadata. Client-operation headers (x-to11-context-*) cause the gateway to emit a sibling span next to the GenAI span, representing work your application performed between LLM calls — tool execution, vector retrieval, or agent delegation.
Session and user context
These headers attach metadata to the GenAI operation span. No additional spans are created.
| Header | Span attribute | Description |
|---|---|---|
| x-to11-user-id | gen_ai.user.id | User identifier (also read from the user field in the request body) |
| x-to11-session-id | gen_ai.session.id | Session identifier for grouping conversations |
| x-to11-env | deployment.environment.name | Environment slug, resolved through the project’s environment registry (prod, staging, or any custom env). The gateway also uses this header to route the request. |
| x-to11-conversation-id | gen_ai.conversation.id | Conversation thread identifier. x-conversation-id is also accepted; on conflict the x-to11- form wins. |
curl https://gw.to11.ai/v1/chat/completions \
-H "x-to11-authorization: Bearer $TO11_API_KEY" \
-H "Authorization: Bearer $PROVIDER_API_KEY" \
-H "x-to11-project-id: $TO11_PROJECT_ID" \
-H "Content-Type: application/json" \
-H "x-to11-user-id: user-42" \
-H "x-to11-session-id: sess-abc123" \
-H "x-to11-env: prod" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
Client-operation context
When your application performs work between LLM calls — executing tools, searching a vector database, or delegating to a sub-agent — you can tell the gateway about it. The gateway emits a sibling span alongside the GenAI operation span, both under the same parent. Send a traceparent header to attach that parent to your application’s trace; the examples below assume one is set.
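If your application already runs OpenTelemetry, its propagator (for example opentelemetry.propagate.inject, which writes the current context into a dict carrier) is the usual way to get a traceparent for the active span. Without an SDK, a value can be built by hand from the W3C Trace Context format. The sketch below (new_traceparent is a hypothetical helper) starts a fresh trace with random IDs. Because no span in your own telemetry has that parent ID, reuse the same value across related calls so the gateway's spans share one trace.

```python
import secrets


def new_traceparent(sampled=True):
    """Return a W3C Trace Context traceparent value that starts a new trace.

    Format: version-traceid-parentid-flags, all lowercase hex.
    Hypothetical helper; prefer your tracer's propagator when you have one.
    """
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    while trace_id == "0" * 32:  # an all-zero trace ID is invalid
        trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    while span_id == "0" * 16:  # an all-zero parent ID is invalid
        span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"  # 01 marks the trace as sampled
    return f"00-{trace_id}-{span_id}-{flags}"
```

Generate the value once per workflow, for example traceparent = new_traceparent(), and pass it as the traceparent header on each request in that workflow.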
To declare client-side context, set x-to11-context-operation to one of: execute_tool, retrieval, invoke_agent, create_agent. Then add the operation-specific headers described in the sections below. The emitted span carries gen_ai.operation.name set to the operation value.
All x-to11-context-* header values are truncated to 256 characters. Operation values the gateway does not recognize are skipped; malformed metadata values (for example a non-numeric duration) are treated as absent.
To declare more than one operation on a single request, send x-to11-context-operation multiple times. Each operation reads its metadata from index-qualified headers such as x-to11-context-{N}-tool-name, where N is the zero-based position of the operation. The first operation (index 0) also accepts the non-indexed headers shown below.
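A dict of headers, which is what requests accepts, cannot hold the same header name twice, so repeated x-to11-context-operation headers need a client that writes raw header lines. The sketch below builds ordered (name, value) pairs and sends them with the standard library's http.client. It assumes every metadata header follows the same indexed pattern as the documented x-to11-context-{N}-tool-name; build_context_headers and post_chat are hypothetical names. Some servers also accept repeated headers folded into one comma-separated value, but this page does not say the gateway does, so the sketch sends separate lines.

```python
import http.client
import json


def build_context_headers(operations):
    """Turn a list of client operations into ordered (name, value) header pairs.

    Each item is a dict with an "operation" key plus metadata keyed by the
    header suffix after "x-to11-context-" (for example "tool-name").
    Assumption: metadata for operation N is sent as x-to11-context-{N}-<suffix>.
    """
    pairs = []
    for index, op in enumerate(operations):
        # One x-to11-context-operation header per operation, in order.
        pairs.append(("x-to11-context-operation", op["operation"]))
        for suffix, value in op.items():
            if suffix != "operation":
                pairs.append((f"x-to11-context-{index}-{suffix}", str(value)))
    return pairs


def post_chat(host, path, base_headers, context_pairs, body):
    """POST a JSON body, sending repeated header names as separate lines.

    base_headers should carry auth, Content-Type, and traceparent.
    """
    payload = json.dumps(body).encode("utf-8")
    conn = http.client.HTTPSConnection(host)
    try:
        conn.putrequest("POST", path)  # also adds Host and Accept-Encoding
        for name, value in list(base_headers.items()) + context_pairs:
            conn.putheader(name, value)
        conn.putheader("Content-Length", str(len(payload)))
        conn.endheaders(payload)  # send headers, then the body
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

For a request that both retrieved documents and ran a tool, build_context_headers would receive a retrieval entry followed by an execute_tool entry. That produces two x-to11-context-operation lines plus x-to11-context-0-retrieval-* and x-to11-context-1-tool-* metadata, which post_chat("gw.to11.ai", "/v1/chat/completions", ...) then sends unchanged.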
The *-duration-ms headers do not become span attributes. The gateway shifts the span’s start time back by that many milliseconds so the span’s own start and end timestamps convey the duration.
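The arithmetic is a plain subtraction in nanoseconds, the unit OpenTelemetry uses for span timestamps. The sketch below mirrors the documented behavior rather than the gateway's code. It assumes the span ends when the gateway handles the request, and, because malformed values are treated as absent, it assumes no shift in that case. span_window is a hypothetical name.

```python
def span_window(received_ns, duration_header):
    """Sketch the documented start-time shift for a *-duration-ms header.

    Assumption: the emitted span ends at received_ns (when the gateway
    handles the request). Returns (start_ns, end_ns).
    """
    try:
        duration_ms = int(duration_header)
    except (TypeError, ValueError):
        # Missing or non-numeric values are treated as absent: no shift.
        duration_ms = 0
    # Shift the start back by the reported duration (1 ms = 1,000,000 ns).
    return received_ns - duration_ms * 1_000_000, received_ns
```

For example, a 250 ms tool call handled at the 10 s mark yields a span from 9.75 s to 10 s. Sending the header as a whole number of milliseconds, such as str(int(elapsed_seconds * 1000)), keeps it unambiguously numeric.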
Tool execution context
When your application has just executed a tool call from a previous LLM response and is now sending the tool result back, attach tool execution context so the gateway emits a span representing that work.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to execute_tool |
| x-to11-context-tool-name | No | Name of the tool that was executed |
| x-to11-context-tool-call-id | No | Tool call ID from the LLM response |
| x-to11-context-tool-duration-ms | No | How long the tool took to execute (milliseconds) |
Emitted span
execute_tool {tool_name} with attributes gen_ai.tool.name, gen_ai.tool.call.id.
Example
import requests
# After executing a tool call from a previous LLM response:
response = requests.post("https://gw.to11.ai/v1/chat/completions",
headers={
"x-to11-authorization": f"Bearer {to11_api_key}",
"Authorization": f"Bearer {provider_api_key}",
"x-to11-project-id": project_id,
"Content-Type": "application/json",
"traceparent": traceparent,
"x-to11-context-operation": "execute_tool",
"x-to11-context-tool-name": "web_search",
"x-to11-context-tool-call-id": "call_abc123",
"x-to11-context-tool-duration-ms": "250",
},
json={
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Search for recent news"},
{"role": "assistant", "tool_calls": [...]},
{"role": "tool", "tool_call_id": "call_abc123", "content": "Search results..."},
],
},
)
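The tool-call ID and tool name come straight from the previous LLM response. This sketch assumes an OpenAI-style tool call (an id plus function.name and a JSON-encoded function.arguments string) and a local registry dict mapping tool names to Python callables; run_tool_call and the registry are hypothetical. It times the tool so the duration header reflects the actual execution.

```python
import json
import time


def run_tool_call(tool_call, registry):
    """Execute one OpenAI-style tool call and build the follow-up pieces.

    registry maps tool names to plain Python callables (hypothetical).
    Returns (tool_message, context_headers).
    """
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"] or "{}")
    start = time.monotonic()
    result = registry[name](**args)
    elapsed_ms = int((time.monotonic() - start) * 1000)
    # The tool result goes back to the model as a "tool" message.
    tool_message = {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": result if isinstance(result, str) else json.dumps(result),
    }
    # Context headers describing the work that just happened.
    headers = {
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": name,
        "x-to11-context-tool-call-id": tool_call["id"],
        "x-to11-context-tool-duration-ms": str(elapsed_ms),
    }
    return tool_message, headers
```

Append the returned message to messages and merge the returned headers into the request headers, for example headers={**base_headers, **context_headers}.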
Retrieval context (RAG)
When your application performed a vector search or document retrieval before the LLM call, attach retrieval context so the gateway emits a span representing that work.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to retrieval |
| x-to11-context-retrieval-source | No | Source system (e.g. pinecone, weaviate, elasticsearch) |
| x-to11-context-retrieval-document-count | No | Number of documents retrieved |
| x-to11-context-retrieval-duration-ms | No | How long the retrieval took (milliseconds) |
| x-to11-context-retrieval-error | No | Error type to record when retrieval failed (sets the span error status) |
Emitted span
retrieval {source} with attributes gen_ai.retrieval.source, gen_ai.retrieval.document.count. When x-to11-context-retrieval-error is set, the span is marked with an error status.
Example
import requests
# 1. Retrieve documents from your vector store
docs = vector_store.search(query, top_k=5)
# 2. Send to LLM with retrieval context
response = requests.post("https://gw.to11.ai/v1/chat/completions",
headers={
"x-to11-authorization": f"Bearer {to11_api_key}",
"Authorization": f"Bearer {provider_api_key}",
"x-to11-project-id": project_id,
"Content-Type": "application/json",
"traceparent": traceparent,
"x-to11-context-operation": "retrieval",
"x-to11-context-retrieval-source": "pinecone",
"x-to11-context-retrieval-document-count": str(len(docs)),
"x-to11-context-retrieval-duration-ms": str(retrieval_time_ms),
},
json={
"model": "gpt-4o",
"messages": [
{"role": "system", "content": f"Use these documents:\n{format_docs(docs)}"},
{"role": "user", "content": query},
],
},
)
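The example above reads retrieval_time_ms and assumes the search succeeded. The sketch below times the search and, when it raises, fills x-to11-context-retrieval-error instead of a document count so the emitted span is marked with an error status. Using the exception class name as the error type is an assumption about what is useful to record; search is any callable you supply, and retrieval_context_headers is a hypothetical name.

```python
import time


def retrieval_context_headers(search, query, source):
    """Run a retrieval and describe it with x-to11-context-* headers.

    search is any callable returning a list of documents (hypothetical).
    Returns (docs, headers); docs is [] when the search raised.
    """
    headers = {
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": source,
    }
    start = time.monotonic()
    try:
        docs = search(query)
    except Exception as exc:
        docs = []
        # Record the error type; the gateway marks the span as an error.
        headers["x-to11-context-retrieval-error"] = type(exc).__name__
    else:
        headers["x-to11-context-retrieval-document-count"] = str(len(docs))
    elapsed_ms = int((time.monotonic() - start) * 1000)
    headers["x-to11-context-retrieval-duration-ms"] = str(elapsed_ms)
    return docs, headers
```

Returning an empty list on failure lets the LLM call still go ahead without documents while the trace shows why the context was missing.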
Agent context
When your application delegates work to a sub-agent or tracks steps within an agent workflow, attach agent context so the gateway emits spans representing agent lifecycle events.
Invoke agent
To record a step within an agent workflow or a delegation to a sub-agent, set x-to11-context-operation to invoke_agent.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to invoke_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-step | No | Zero-indexed step number within the agent workflow |
Emitted span: invoke_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.step.
import requests
# Agent "planner" — step 0
response = requests.post("https://gw.to11.ai/v1/chat/completions",
headers={
"x-to11-authorization": f"Bearer {to11_api_key}",
"Authorization": f"Bearer {provider_api_key}",
"x-to11-project-id": project_id,
"Content-Type": "application/json",
"traceparent": traceparent,
"x-to11-context-operation": "invoke_agent",
"x-to11-context-agent-id": "agent-001",
"x-to11-context-agent-name": "planner",
"x-to11-context-agent-step": "0",
},
json={"model": "gpt-4o", "messages": [...]},
)
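For a multi-step agent, keeping the step counter next to the agent's identity avoids numbering each request by hand. AgentSteps below is a hypothetical helper that returns invoke_agent headers and advances the zero-indexed step on every call.

```python
class AgentSteps:
    """Build invoke_agent headers for successive steps of one agent.

    Hypothetical helper: the step counter starts at 0 and advances each
    time headers are requested, matching x-to11-context-agent-step.
    """

    def __init__(self, agent_id, agent_name):
        self.agent_id = agent_id
        self.agent_name = agent_name
        self.step = 0

    def next_headers(self):
        headers = {
            "x-to11-context-operation": "invoke_agent",
            "x-to11-context-agent-id": self.agent_id,
            "x-to11-context-agent-name": self.agent_name,
            "x-to11-context-agent-step": str(self.step),
        }
        self.step += 1  # the next request records the following step
        return headers
```

Create one instance per agent, for example planner = AgentSteps("agent-001", "planner"), and merge planner.next_headers() into the headers of each request the planner makes.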
Create agent
To record agent creation, set x-to11-context-operation to create_agent.
| Header | Required | Description |
|---|---|---|
| x-to11-context-operation | Yes | Set to create_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-description | No | Agent purpose description |
Emitted span: create_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.description.
For production agent lifecycle tracking, consider emitting create_agent spans directly via OTLP to the collector. The header-based approach is a convenience fallback. See Direct Ingestion.
import requests
# Register the "planner" agent before its first step
response = requests.post("https://gw.to11.ai/v1/chat/completions",
headers={
"x-to11-authorization": f"Bearer {to11_api_key}",
"Authorization": f"Bearer {provider_api_key}",
"x-to11-project-id": project_id,
"Content-Type": "application/json",
"traceparent": traceparent,
"x-to11-context-operation": "create_agent",
"x-to11-context-agent-id": "agent-001",
"x-to11-context-agent-name": "planner",
"x-to11-context-agent-description": "Breaks a goal into ordered steps",
},
json={"model": "gpt-4o", "messages": [...]},
)
Operation name override
By default, the gateway sets gen_ai.operation.name to chat for text requests and automatically promotes it to generate_content when it detects image content blocks.
To override this behavior, set the x-to11-operation header:
| Header value | Effect |
|---|---|
| chat | Force chat even on multimodal requests |
| generate_content | Force generate_content even on text-only requests |
Unknown values are silently ignored and detection proceeds normally.
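The selection rule fits in a few lines. This is a sketch of the documented behavior, not the gateway's implementation: it treats only OpenAI-style image_url content blocks as images, whereas the gateway may recognize other providers' image formats too. resolve_operation_name is a hypothetical name, and override stands for the x-to11-operation header value.

```python
def resolve_operation_name(body, override=None):
    """Sketch of how gen_ai.operation.name is chosen for a chat request.

    A recognized x-to11-operation value wins; any other value is ignored.
    Otherwise a request with an image content block becomes
    "generate_content" and everything else is "chat".
    """
    if override in ("chat", "generate_content"):
        return override
    for message in body.get("messages", []):
        content = message.get("content")
        # Multimodal content arrives as a list of typed blocks.
        if isinstance(content, list) and any(
            isinstance(block, dict) and block.get("type") == "image_url"
            for block in content
        ):
            return "generate_content"
    return "chat"
```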
# Force "chat" on a request with images (e.g. image is just metadata, not generation input)
curl https://gw.to11.ai/v1/chat/completions \
-H "x-to11-authorization: Bearer $TO11_API_KEY" \
-H "Authorization: Bearer $PROVIDER_API_KEY" \
-H "Content-Type: application/json" \
-H "x-to11-operation: chat" \
-d '{
"model": "gpt-4o",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Extract text from this receipt"},
{"type": "image_url", "image_url": {"url": "..."}}
]
}]
}'
Resulting span tree
The following diagrams show how the requests of a RAG workflow with tool use appear in the trace. For the request that carries retrieval context, the HTTP request span is the parent, and the retrieval span and the GenAI operation span are siblings under it.
HTTP request span
|
+-- retrieval pinecone (x-to11-context-operation: retrieval)
| gen_ai.retrieval.source: "pinecone"
| gen_ai.retrieval.document.count: "5"
|
+-- chat gpt-4o (GenAI operation span)
gen_ai.operation.name: "chat"
gen_ai.usage.input_tokens: 2450
For a request that carries tool execution context, the execute_tool span is the sibling instead:
HTTP request span
|
+-- execute_tool web_search (x-to11-context-operation: execute_tool)
| gen_ai.tool.name: "web_search"
| gen_ai.tool.call.id: "call_abc123"
|
+-- chat gpt-4o (GenAI operation span)
gen_ai.operation.name: "chat"
gen_ai.usage.input_tokens: 3200
gen_ai.usage.output_tokens: 450