
Context Propagation

The gateway can emit additional spans and attributes based on context headers you send alongside your LLM requests. This gives you visibility into the full AI workflow — not just the LLM call itself. Session and user headers enrich the existing GenAI operation span with metadata. Client-operation headers (x-to11-context-*) cause the gateway to emit a sibling span next to the GenAI span, representing work your application performed between LLM calls — tool execution, vector retrieval, or agent delegation.

Session and user context

These headers attach metadata to the GenAI operation span. No additional spans are created.

| Header | Span attribute | Description |
| --- | --- | --- |
| x-to11-user-id | gen_ai.user.id | User identifier (also read from the user field in the request body) |
| x-to11-session-id | gen_ai.session.id | Session identifier for grouping conversations |
| x-to11-environment | deployment.environment.name | Environment name (production, staging, etc.) |
| X-Conversation-Id | gen_ai.conversation.id | Conversation thread identifier |

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-user-id: user-42" \
  -H "x-to11-session-id: sess-abc123" \
  -H "x-to11-environment: production" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

Client-operation context

When your application performs work between LLM calls — executing tools, searching a vector database, or delegating to a sub-agent — you can tell the gateway about it. The gateway emits a sibling span alongside the GenAI operation span, both under the same HTTP request parent. To declare client-side context, set x-to11-context-operation to one of: execute_tool, retrieval, invoke_agent, create_agent. Then add the operation-specific headers described in the sections below.
All x-to11-context-* header values are truncated to 256 characters. Unknown headers are silently ignored.
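
Because the gateway truncates every x-to11-context-* value at 256 characters, you may prefer to truncate client-side so you control which end of a long value survives. A minimal sketch of such a helper; context_headers is illustrative and not part of the gateway or any client library:

```python
def context_headers(operation, **fields):
    """Build x-to11-context-* headers, truncating each value to the
    gateway's 256-character limit client-side."""
    headers = {"x-to11-context-operation": operation}
    for name, value in fields.items():
        # e.g. tool_name -> x-to11-context-tool-name
        key = "x-to11-context-" + name.replace("_", "-")
        headers[key] = str(value)[:256]
    return headers

# Merge the result into the headers of your next LLM request:
headers = context_headers("execute_tool", tool_name="web_search",
                          tool_call_id="call_abc123", tool_duration_ms=250)
```

The keyword-to-header mapping assumes you follow the naming pattern in the tables below; the gateway still applies its own truncation regardless.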

Tool execution context

When your application has just executed a tool call from a previous LLM response and is now sending the tool result back, attach tool execution context so the gateway emits a span representing that work.

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to execute_tool |
| x-to11-context-tool-name | No | Name of the tool that was executed |
| x-to11-context-tool-call-id | No | Tool call ID from the LLM response |
| x-to11-context-tool-duration-ms | No | How long the tool took to execute (milliseconds) |

Emitted span

execute_tool {tool_name} with attributes gen_ai.tool.name, gen_ai.tool.call.id.

Example

import requests

# After executing a tool call from a previous LLM response:
response = requests.post("http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": "web_search",
        "x-to11-context-tool-call-id": "call_abc123",
        "x-to11-context-tool-duration-ms": "250",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "Search for recent news"},
            {"role": "assistant", "tool_calls": [...]},
            {"role": "tool", "tool_call_id": "call_abc123", "content": "Search results..."},
        ],
    },
)

Retrieval context (RAG)

When your application performed a vector search or document retrieval before the LLM call, attach retrieval context so the gateway emits a span representing that work.

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to retrieval |
| x-to11-context-retrieval-source | No | Source system (e.g. pinecone, weaviate, elasticsearch) |
| x-to11-context-retrieval-document-count | No | Number of documents retrieved |
| x-to11-context-retrieval-duration-ms | No | How long the retrieval took (milliseconds) |

Emitted span

retrieval {source} with attributes gen_ai.retrieval.source, gen_ai.retrieval.document.count.

Example

import requests

# 1. Retrieve documents from your vector store
docs = vector_store.search(query, top_k=5)

# 2. Send to LLM with retrieval context
response = requests.post("http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": "pinecone",
        "x-to11-context-retrieval-document-count": str(len(docs)),
        "x-to11-context-retrieval-duration-ms": str(retrieval_time_ms),
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": f"Use these documents:\n{format_docs(docs)}"},
            {"role": "user", "content": query},
        ],
    },
)

Agent context

When your application delegates work to a sub-agent or tracks steps within an agent workflow, attach agent context so the gateway emits spans representing agent lifecycle events.

Invoke agent

To record a step within an agent workflow or a delegation to a sub-agent, set x-to11-context-operation to invoke_agent.

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to invoke_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-step | No | Zero-indexed step number within the agent workflow |

Emitted span: invoke_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.step.

Create agent

To record agent creation, set x-to11-context-operation to create_agent.

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to create_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-description | No | Agent purpose description |

For production agent lifecycle tracking, consider emitting create_agent spans directly via OTLP to the collector. The header-based approach is a convenience fallback. See Direct Ingestion.

Example

import requests

# Agent "planner" — step 0
response = requests.post("http://localhost:4000/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "invoke_agent",
        "x-to11-context-agent-id": "agent-001",
        "x-to11-context-agent-name": "planner",
        "x-to11-context-agent-step": "0",
    },
    json={"model": "gpt-4o", "messages": [...]},
)
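
A create_agent request is assembled the same way, with the headers from the create_agent table. A minimal sketch; create_agent_headers is an illustrative helper (not part of any client library), and the agent values are placeholders:

```python
def create_agent_headers(api_key, agent_id, agent_name, description):
    """Build headers tagging the next LLM request with create_agent context.

    The gateway truncates each x-to11-context-* value to 256 characters,
    so the description is trimmed client-side here as well.
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-to11-context-operation": "create_agent",
        "x-to11-context-agent-id": agent_id,
        "x-to11-context-agent-name": agent_name,
        "x-to11-context-agent-description": description[:256],
    }

# Used exactly like the invoke_agent example above:
# response = requests.post(
#     "http://localhost:4000/v1/chat/completions",
#     headers=create_agent_headers(api_key, "agent-001", "planner",
#                                  "Plans multi-step research tasks"),
#     json={"model": "gpt-4o", "messages": [...]},
# )
```

As noted above, OTLP ingestion is the preferred path for production agent lifecycle tracking; this header-based form is the convenience fallback.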

Operation name override

By default, the gateway sets gen_ai.operation.name to chat for text requests and automatically promotes to generate_content when image content blocks are detected. To override this behaviour, set the x-to11-operation header:

| Header value | Effect |
| --- | --- |
| chat | Force chat even on multimodal requests |
| generate_content | Force generate_content even on text-only requests |

Unknown values are silently ignored and detection proceeds normally.
# Force "chat" on a request with images (e.g. image is just metadata, not generation input)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-operation: chat" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract text from this receipt"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }]
  }'

Resulting span tree

The following diagram shows how a RAG workflow with tool use appears in the trace. The HTTP request span is the parent; retrieval and GenAI operation spans are siblings under it.
HTTP request span
  |
  +-- retrieval pinecone        (x-to11-context-operation: retrieval)
  |     gen_ai.retrieval.source: "pinecone"
  |     gen_ai.retrieval.document.count: "5"
  |
  +-- chat gpt-4o               (GenAI operation span)
        gen_ai.operation.name: "chat"
        gen_ai.usage.input_tokens: 2450
When a request carries tool execution context instead, the execute_tool span appears as the sibling:
HTTP request span
  |
  +-- execute_tool web_search   (x-to11-context-operation: execute_tool)
  |     gen_ai.tool.name: "web_search"
  |     gen_ai.tool.call.id: "call_abc123"
  |
  +-- chat gpt-4o               (GenAI operation span)
        gen_ai.operation.name: "chat"
        gen_ai.usage.input_tokens: 3200
        gen_ai.usage.output_tokens: 450