Context Propagation
The gateway can emit additional spans and attributes based on context headers you send alongside your LLM requests. This gives you visibility into the full AI workflow — not just the LLM call itself.
Session and user headers enrich the existing GenAI operation span with metadata. Client-operation headers (x-to11-context-*) cause the gateway to emit a sibling span next to the GenAI span, representing work your application performed between LLM calls — tool execution, vector retrieval, or agent delegation.
Session and user context
These headers attach metadata to the GenAI operation span. No additional spans are created.
| Header | Span attribute | Description |
|---|---|---|
| `x-to11-user-id` | `gen_ai.user.id` | User identifier (also read from the `user` field in the request body) |
| `x-to11-session-id` | `gen_ai.session.id` | Session identifier for grouping conversations |
| `x-to11-environment` | `deployment.environment.name` | Environment name (`production`, `staging`, etc.) |
| `X-Conversation-Id` | `gen_ai.conversation.id` | Conversation thread identifier |
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-user-id: user-42" \
  -H "x-to11-session-id: sess-abc123" \
  -H "x-to11-environment: production" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```
Client-operation context
When your application performs work between LLM calls — executing tools, searching a vector database, or delegating to a sub-agent — you can tell the gateway about it. The gateway emits a sibling span alongside the GenAI operation span, both under the same HTTP request parent.
To declare client-side context, set x-to11-context-operation to one of: execute_tool, retrieval, invoke_agent, create_agent. Then add the operation-specific headers described in the sections below.
All x-to11-context-* header values are truncated to 256 characters. Unknown headers are silently ignored.
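Because the gateway truncates silently, long values are best cut client-side so the loss is deliberate and visible in your own code. A minimal sketch (the `context_headers` helper is hypothetical, not part of the gateway):

```python
MAX_CONTEXT_VALUE_LEN = 256  # the gateway truncates x-to11-context-* values beyond this


def context_headers(**values):
    """Build x-to11-context-* headers, pre-truncating values client-side
    so nothing is silently cut off by the gateway."""
    return {
        "x-to11-context-" + key.replace("_", "-"): str(value)[:MAX_CONTEXT_VALUE_LEN]
        for key, value in values.items()
    }


headers = context_headers(operation="execute_tool", tool_name="web_search")
```

Keyword names map to header suffixes (`tool_name` becomes `x-to11-context-tool-name`), which keeps call sites readable while guaranteeing the 256-character limit.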
Tool execution context
When your application has just executed a tool call from a previous LLM response and is now sending the tool result back, attach tool execution context so the gateway emits a span representing that work.
| Header | Required | Description |
|---|---|---|
| `x-to11-context-operation` | Yes | Set to `execute_tool` |
| `x-to11-context-tool-name` | No | Name of the tool that was executed |
| `x-to11-context-tool-call-id` | No | Tool call ID from the LLM response |
| `x-to11-context-tool-duration-ms` | No | How long the tool took to execute (milliseconds) |
Emitted span
`execute_tool {tool_name}` with attributes `gen_ai.tool.name` and `gen_ai.tool.call.id`.
Example
```python
import requests

# After executing a tool call from a previous LLM response:
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": "web_search",
        "x-to11-context-tool-call-id": "call_abc123",
        "x-to11-context-tool-duration-ms": "250",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "Search for recent news"},
            {"role": "assistant", "tool_calls": [...]},
            {"role": "tool", "tool_call_id": "call_abc123", "content": "Search results..."},
        ],
    },
)
```
Retrieval context (RAG)
When your application performed a vector search or document retrieval before the LLM call, attach retrieval context so the gateway emits a span representing that work.
| Header | Required | Description |
|---|---|---|
| `x-to11-context-operation` | Yes | Set to `retrieval` |
| `x-to11-context-retrieval-source` | No | Source system (e.g. `pinecone`, `weaviate`, `elasticsearch`) |
| `x-to11-context-retrieval-document-count` | No | Number of documents retrieved |
| `x-to11-context-retrieval-duration-ms` | No | How long the retrieval took (milliseconds) |
Emitted span
`retrieval {source}` with attributes `gen_ai.retrieval.source` and `gen_ai.retrieval.document.count`.
Example
```python
import requests

# 1. Retrieve documents from your vector store
docs = vector_store.search(query, top_k=5)

# 2. Send to the LLM with retrieval context
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": "pinecone",
        "x-to11-context-retrieval-document-count": str(len(docs)),
        "x-to11-context-retrieval-duration-ms": str(retrieval_time_ms),
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": f"Use these documents:\n{format_docs(docs)}"},
            {"role": "user", "content": query},
        ],
    },
)
```
Agent context
When your application delegates work to a sub-agent or tracks steps within an agent workflow, attach agent context so the gateway emits spans representing agent lifecycle events.
Invoke agent
To record a step within an agent workflow or a delegation to a sub-agent, set x-to11-context-operation to invoke_agent.
| Header | Required | Description |
|---|---|---|
| `x-to11-context-operation` | Yes | Set to `invoke_agent` |
| `x-to11-context-agent-id` | No | Unique agent identifier |
| `x-to11-context-agent-name` | No | Human-readable agent name |
| `x-to11-context-agent-step` | No | Zero-indexed step number within the agent workflow |
Emitted span: `invoke_agent {agent_name}` with attributes `gen_ai.agent.id`, `gen_ai.agent.name`, and `gen_ai.agent.step`.
Create agent
To record agent creation, set x-to11-context-operation to create_agent.
| Header | Required | Description |
|---|---|---|
| `x-to11-context-operation` | Yes | Set to `create_agent` |
| `x-to11-context-agent-id` | No | Unique agent identifier |
| `x-to11-context-agent-name` | No | Human-readable agent name |
| `x-to11-context-agent-description` | No | Agent purpose description |
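Putting the table together, a complete `create_agent` header set might look like the following sketch; the id, name, and description values are illustrative, and only `x-to11-context-operation` is required:

```python
# Hypothetical agent values; only x-to11-context-operation is required.
create_agent_headers = {
    "x-to11-context-operation": "create_agent",
    "x-to11-context-agent-id": "agent-001",
    "x-to11-context-agent-name": "planner",
    "x-to11-context-agent-description": "Decomposes user goals into steps",
}
```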
For production agent lifecycle tracking, consider emitting create_agent spans directly via OTLP to the collector. The header-based approach is a convenience fallback. See Direct Ingestion.
Example
```python
import requests

# Agent "planner" — step 0
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "invoke_agent",
        "x-to11-context-agent-id": "agent-001",
        "x-to11-context-agent-name": "planner",
        "x-to11-context-agent-step": "0",
    },
    json={"model": "gpt-4o", "messages": [...]},
)
```
Operation name override
By default, the gateway sets gen_ai.operation.name to chat for text requests and automatically promotes to generate_content when image content blocks are detected.
To override this behaviour, set the x-to11-operation header:
| Header value | Effect |
|---|---|
| `chat` | Force `chat` even on multimodal requests |
| `generate_content` | Force `generate_content` even on text-only requests |
Unknown values are silently ignored and detection proceeds normally.
```bash
# Force "chat" on a request with images (e.g. image is just metadata, not generation input)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-operation: chat" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract text from this receipt"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }]
  }'
```
Resulting span tree
The following diagram shows how a RAG workflow with tool use appears in the trace. The HTTP request span is the parent; retrieval and GenAI operation spans are siblings under it.
```text
HTTP request span
|
+-- retrieval pinecone (x-to11-context-operation: retrieval)
|     gen_ai.retrieval.source: "pinecone"
|     gen_ai.retrieval.document.count: "5"
|
+-- chat gpt-4o (GenAI operation span)
      gen_ai.operation.name: "chat"
      gen_ai.usage.input_tokens: 2450
```
When tool execution context is also present, both sibling spans appear:
```text
HTTP request span
|
+-- execute_tool web_search (x-to11-context-operation: execute_tool)
|     gen_ai.tool.name: "web_search"
|     gen_ai.tool.call.id: "call_abc123"
|
+-- chat gpt-4o (GenAI operation span)
      gen_ai.operation.name: "chat"
      gen_ai.usage.input_tokens: 3200
      gen_ai.usage.output_tokens: 450
```
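The examples in this page pass a `traceparent` header so the gateway's spans join your application's trace. If you are not running an OpenTelemetry SDK that propagates context for you, a W3C-format `traceparent` can be generated by hand. This helper is a sketch for experimentation, not a substitute for proper context propagation:

```python
import os


def make_traceparent() -> str:
    """Build a W3C traceparent header: version-traceid-parentid-flags."""
    trace_id = os.urandom(16).hex()   # 32 hex chars, must not be all zeros
    parent_id = os.urandom(8).hex()   # 16 hex chars, must not be all zeros
    return f"00-{trace_id}-{parent_id}-01"  # "01" marks the trace as sampled


traceparent = make_traceparent()
```

Reusing the same `traceparent` value across the retrieval call and the subsequent chat call keeps both requests in one trace, so their span trees line up as shown in the diagrams above.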