The gateway can emit additional spans and attributes based on context headers you send alongside your LLM requests. This gives you visibility into the full AI workflow, not just the LLM call itself.

Authenticate to to11 with the x-to11-authorization: Bearer <your to11 API key> header (the key is created in Settings → API keys); add x-to11-project-id to select the project when your key spans more than one. Your upstream provider key rides in the standard Authorization header (or x-api-key), or is configured server-side in your to11 project. The context headers below are layered on top of that request.

Session and user headers enrich the existing GenAI operation span with metadata. Client-operation headers (x-to11-context-*) cause the gateway to emit a sibling span next to the GenAI span, representing work your application performed between LLM calls: tool execution, vector retrieval, or agent delegation.

Session and user context

These headers attach metadata to the GenAI operation span. No additional spans are created.

| Header | Span attribute | Description |
| --- | --- | --- |
| x-to11-user-id | gen_ai.user.id | User identifier (also read from the user field in the request body) |
| x-to11-session-id | gen_ai.session.id | Session identifier for grouping conversations |
| x-to11-env | deployment.environment.name | Environment slug, resolved through the project's environment registry (prod, staging, or any custom env). The gateway also routes requests via this header. |
| x-to11-conversation-id | gen_ai.conversation.id | Conversation thread identifier. x-conversation-id is also accepted; on conflict the x-to11- form wins. |

curl https://gw.to11.ai/v1/chat/completions \
  -H "x-to11-authorization: Bearer $TO11_API_KEY" \
  -H "Authorization: Bearer $PROVIDER_API_KEY" \
  -H "x-to11-project-id: $TO11_PROJECT_ID" \
  -H "Content-Type: application/json" \
  -H "x-to11-user-id: user-42" \
  -H "x-to11-session-id: sess-abc123" \
  -H "x-to11-env: prod" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
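
For Python clients, the session and user headers from the table above can be assembled in one place. The sketch below is illustrative only: the function name session_headers and its parameters are hypothetical, and it simply leaves out any header whose value is not set, so the result can be merged into the authentication headers of a request.

```python
from typing import Dict, Optional


def session_headers(
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    env: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build the session/user context headers, skipping unset values.

    Hypothetical helper: only the header names come from the table above.
    """
    candidates = {
        "x-to11-user-id": user_id,
        "x-to11-session-id": session_id,
        "x-to11-env": env,
        # Only the x-to11- form is sent, so no conflict with x-conversation-id arises.
        "x-to11-conversation-id": conversation_id,
    }
    return {name: value for name, value in candidates.items() if value}


headers = session_headers(user_id="user-42", session_id="sess-abc123", env="prod")
```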

Client-operation context

When your application performs work between LLM calls (executing tools, searching a vector database, or delegating to a sub-agent), you can tell the gateway about it. The gateway emits a sibling span alongside the GenAI operation span, both under the same parent. Send a traceparent header to attach that parent to your application's trace; the examples below assume one is set.

To declare client-side context, set x-to11-context-operation to one of: execute_tool, retrieval, invoke_agent, create_agent. Then add the operation-specific headers described in the sections below. The emitted span carries gen_ai.operation.name set to the operation value.
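The examples in this section pass a traceparent variable without showing where it comes from. If your application already uses a tracing library, its propagator normally produces this header from the active span. As a fallback, the sketch below builds a value in the W3C Trace Context format (version, 16-byte trace ID, 8-byte parent span ID, flags); the function name new_traceparent is hypothetical.

```python
import secrets


def new_traceparent(sampled: bool = True) -> str:
    """Return a W3C traceparent value with freshly generated random IDs.

    Hypothetical helper: prefer your tracing library's propagator when one
    is available, so the gateway spans join an existing trace.
    """
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


traceparent = new_traceparent()
```

A value generated this way starts a new trace, so it groups the gateway spans together but does not link them to spans your application recorded elsewhere.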
All x-to11-context-* header values are truncated to 256 characters. Operation values the gateway does not recognize are skipped; malformed metadata values (for example a non-numeric duration) are treated as absent.
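Because the gateway truncates values and drops malformed ones without reporting an error, it can help to normalize values on the client before sending them. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and sending durations as whole numbers is a conservative choice, since the text above only says that non-numeric durations are treated as absent.

```python
import math
from typing import Optional

MAX_CONTEXT_VALUE_LENGTH = 256  # truncation limit stated above


def context_value(value: object) -> str:
    """Stringify a value and trim it to the documented 256-character limit."""
    return str(value)[:MAX_CONTEXT_VALUE_LENGTH]


def duration_ms_value(duration_ms: float) -> Optional[str]:
    """Format a duration as whole milliseconds, or None if it is unusable.

    Assumption: integers are the safest numeric form to send.
    """
    if not math.isfinite(duration_ms) or duration_ms < 0:
        return None  # omit the header rather than send a malformed value
    return str(round(duration_ms))
```

Trimming on the client makes the truncation visible in your own code, where it can be logged, instead of happening silently at the gateway.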
To declare more than one operation on a single request, send x-to11-context-operation multiple times. Each operation reads its metadata from index-prefixed headers such as x-to11-context-{N}-tool-name, where N is the zero-based position of the operation. The first operation (index 0) also accepts the non-indexed headers shown below.

The *-duration-ms headers do not become span attributes. The gateway shifts the span's start time back by that many milliseconds so the span's own start and end timestamps convey the duration.
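The repeated operation header and index-prefixed metadata headers can be generated from a list of operations. This sketch is an illustration, not the gateway's reference client: the function name context_header_pairs is hypothetical, and it assumes the {N} prefix applies to every operation-specific header in the same way as the x-to11-context-{N}-tool-name example. It returns (name, value) pairs rather than a dict, because a dict cannot hold the same header name twice.

```python
from typing import Dict, List, Sequence, Tuple

# (operation value, {header suffix: value}), e.g. ("retrieval", {"retrieval-source": "pinecone"})
Operation = Tuple[str, Dict[str, str]]


def context_header_pairs(operations: Sequence[Operation]) -> List[Tuple[str, str]]:
    """Expand operations into repeated and index-prefixed header pairs."""
    pairs: List[Tuple[str, str]] = []
    for index, (operation, metadata) in enumerate(operations):
        # One x-to11-context-operation header per operation, in order.
        pairs.append(("x-to11-context-operation", operation))
        for suffix, value in metadata.items():
            # Assumed pattern: x-to11-context-{N}-{suffix}
            pairs.append((f"x-to11-context-{index}-{suffix}", value))
    return pairs


pairs = context_header_pairs([
    ("retrieval", {"retrieval-source": "pinecone", "retrieval-document-count": "5"}),
    ("execute_tool", {"tool-name": "web_search", "tool-call-id": "call_abc123"}),
])
```

To send the pairs, use an HTTP client that accepts repeated header names. With the standard library, for example, http.client.HTTPConnection.putheader can be called once per pair.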

Tool execution context

When your application has just executed a tool call from a previous LLM response and is now sending the tool result back, attach tool execution context so the gateway emits a span representing that work.

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to execute_tool |
| x-to11-context-tool-name | No | Name of the tool that was executed |
| x-to11-context-tool-call-id | No | Tool call ID from the LLM response |
| x-to11-context-tool-duration-ms | No | How long the tool took to execute (milliseconds) |

Emitted span

execute_tool {tool_name} with attributes gen_ai.tool.name, gen_ai.tool.call.id.

Example

import requests

# After executing a tool call from a previous LLM response:
response = requests.post("https://gw.to11.ai/v1/chat/completions",
    headers={
        "x-to11-authorization": f"Bearer {to11_api_key}",
        "Authorization": f"Bearer {provider_api_key}",
        "x-to11-project-id": project_id,
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": "web_search",
        "x-to11-context-tool-call-id": "call_abc123",
        "x-to11-context-tool-duration-ms": "250",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "Search for recent news"},
            {"role": "assistant", "tool_calls": [...]},
            {"role": "tool", "tool_call_id": "call_abc123", "content": "Search results..."},
        ],
    },
)
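
The example above hardcodes the duration. One way to obtain it is to time the tool call itself and build the context headers from the measurement. This is a minimal sketch: run_tool_with_context is a hypothetical helper, and the tool passed to it can be any callable in your application.

```python
import time
from typing import Any, Callable, Dict, Tuple


def run_tool_with_context(
    tool_name: str,
    tool_call_id: str,
    tool: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Tuple[Any, Dict[str, str]]:
    """Run a tool, time it, and return (result, tool execution context headers)."""
    started = time.perf_counter()
    result = tool(*args, **kwargs)
    # perf_counter is monotonic, so the elapsed time is never negative.
    elapsed_ms = round((time.perf_counter() - started) * 1000)
    headers = {
        "x-to11-context-operation": "execute_tool",
        "x-to11-context-tool-name": tool_name,
        "x-to11-context-tool-call-id": tool_call_id,
        "x-to11-context-tool-duration-ms": str(elapsed_ms),
    }
    return result, headers
```

The returned headers can be merged into the request headers of the follow-up LLM call that carries the tool result.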

Retrieval context (RAG)

When your application performed a vector search or document retrieval before the LLM call, attach retrieval context so the gateway emits a span representing that work.

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to retrieval |
| x-to11-context-retrieval-source | No | Source system (e.g. pinecone, weaviate, elasticsearch) |
| x-to11-context-retrieval-document-count | No | Number of documents retrieved |
| x-to11-context-retrieval-duration-ms | No | How long the retrieval took (milliseconds) |
| x-to11-context-retrieval-error | No | Error type to record when retrieval failed (sets the span error status) |

Emitted span

retrieval {source} with attributes gen_ai.retrieval.source, gen_ai.retrieval.document.count. When x-to11-context-retrieval-error is set, the span is marked with an error status.

Example

import requests

# 1. Retrieve documents from your vector store
docs = vector_store.search(query, top_k=5)

# 2. Send to LLM with retrieval context
response = requests.post("https://gw.to11.ai/v1/chat/completions",
    headers={
        "x-to11-authorization": f"Bearer {to11_api_key}",
        "Authorization": f"Bearer {provider_api_key}",
        "x-to11-project-id": project_id,
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": "pinecone",
        "x-to11-context-retrieval-document-count": str(len(docs)),
        "x-to11-context-retrieval-duration-ms": str(retrieval_time_ms),
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": f"Use these documents:\n{format_docs(docs)}"},
            {"role": "user", "content": query},
        ],
    },
)
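
The example above does not show where retrieval_time_ms comes from or how a failed retrieval is reported. The sketch below covers both. It rests on assumptions: retrieve_with_context is a hypothetical helper, the exception class name is used as the error type (the table only asks for an error type), and on failure the application chooses to continue with no documents.

```python
import time
from typing import Callable, Dict, List, Tuple


def retrieve_with_context(
    source: str,
    search: Callable[[], List[str]],
) -> Tuple[List[str], Dict[str, str]]:
    """Run a retrieval, time it, and return (docs, retrieval context headers)."""
    headers = {
        "x-to11-context-operation": "retrieval",
        "x-to11-context-retrieval-source": source,
    }
    started = time.perf_counter()
    try:
        docs = search()
    except Exception as exc:
        # Assumption: the exception class name serves as the error type.
        docs = []
        headers["x-to11-context-retrieval-error"] = type(exc).__name__
    elapsed_ms = round((time.perf_counter() - started) * 1000)
    headers["x-to11-context-retrieval-document-count"] = str(len(docs))
    headers["x-to11-context-retrieval-duration-ms"] = str(elapsed_ms)
    return docs, headers
```

Called as retrieve_with_context("pinecone", lambda: vector_store.search(query, top_k=5)), it returns the documents together with headers that can be merged into the LLM request.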

Agent context

When your application delegates work to a sub-agent or tracks steps within an agent workflow, attach agent context so the gateway emits spans representing agent lifecycle events.

Invoke agent

To record a step within an agent workflow or a delegation to a sub-agent, set x-to11-context-operation to invoke_agent.

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to invoke_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-step | No | Zero-indexed step number within the agent workflow |

Emitted span: invoke_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.step.

import requests

# Agent "planner" — step 0
response = requests.post("https://gw.to11.ai/v1/chat/completions",
    headers={
        "x-to11-authorization": f"Bearer {to11_api_key}",
        "Authorization": f"Bearer {provider_api_key}",
        "x-to11-project-id": project_id,
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "invoke_agent",
        "x-to11-context-agent-id": "agent-001",
        "x-to11-context-agent-name": "planner",
        "x-to11-context-agent-step": "0",
    },
    json={"model": "gpt-4o", "messages": [...]},
)
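
In a multi-step workflow, each LLM call carries the same agent identity with an increasing step number. A small generator keeps the zero-indexed numbering consistent across calls. This is a sketch; agent_step_headers is a hypothetical name, and a real workflow would usually not know the number of steps in advance.

```python
from typing import Dict, Iterator


def agent_step_headers(agent_id: str, agent_name: str, steps: int) -> Iterator[Dict[str, str]]:
    """Yield invoke_agent context headers for steps 0 .. steps-1."""
    for step in range(steps):
        yield {
            "x-to11-context-operation": "invoke_agent",
            "x-to11-context-agent-id": agent_id,
            "x-to11-context-agent-name": agent_name,
            # Header values are strings; steps are zero-indexed.
            "x-to11-context-agent-step": str(step),
        }


step_headers = list(agent_step_headers("agent-001", "planner", 3))
```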

Create agent

To record agent creation, set x-to11-context-operation to create_agent.

| Header | Required | Description |
| --- | --- | --- |
| x-to11-context-operation | Yes | Set to create_agent |
| x-to11-context-agent-id | No | Unique agent identifier |
| x-to11-context-agent-name | No | Human-readable agent name |
| x-to11-context-agent-description | No | Agent purpose description |

Emitted span: create_agent {agent_name} with attributes gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.description.

For production agent lifecycle tracking, consider emitting create_agent spans directly via OTLP to the collector. The header-based approach is a convenience fallback. See Direct Ingestion.

import requests

# Register the "planner" agent before its first step
response = requests.post("https://gw.to11.ai/v1/chat/completions",
    headers={
        "x-to11-authorization": f"Bearer {to11_api_key}",
        "Authorization": f"Bearer {provider_api_key}",
        "x-to11-project-id": project_id,
        "Content-Type": "application/json",
        "traceparent": traceparent,
        "x-to11-context-operation": "create_agent",
        "x-to11-context-agent-id": "agent-001",
        "x-to11-context-agent-name": "planner",
        "x-to11-context-agent-description": "Breaks a goal into ordered steps",
    },
    json={"model": "gpt-4o", "messages": [...]},
)

Operation name override

By default, the gateway sets gen_ai.operation.name to chat for text requests and automatically promotes to generate_content when image content blocks are detected. To override this behavior, set the x-to11-operation header:

| Header value | Effect |
| --- | --- |
| chat | Force chat even on multimodal requests |
| generate_content | Force generate_content even on text-only requests |

Unknown values are silently ignored and detection proceeds normally.

# Force "chat" on a request with images (e.g. image is just metadata, not generation input)
curl https://gw.to11.ai/v1/chat/completions \
  -H "x-to11-authorization: Bearer $TO11_API_KEY" \
  -H "Authorization: Bearer $PROVIDER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-to11-operation: chat" \
  -d '{
    "model": "gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract text from this receipt"},
        {"type": "image_url", "image_url": {"url": "..."}}
      ]
    }]
  }'
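
The override rules can be read as a small decision procedure: a recognized x-to11-operation value wins, otherwise detection decides. The sketch below models that logic for illustration only. It is not the gateway's implementation, the function name is hypothetical, and treating a content block with type image_url as an image content block is an assumption based on the request shape in the example above.

```python
from typing import Any, Dict, List, Optional

KNOWN_OVERRIDES = {"chat", "generate_content"}


def resolve_operation_name(
    messages: List[Dict[str, Any]],
    override: Optional[str] = None,
) -> str:
    """Model how gen_ai.operation.name is chosen for a request."""
    if override in KNOWN_OVERRIDES:
        return override  # a recognized x-to11-operation value wins
    # Unknown or missing override: fall through to detection.
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for block in content:
                # Assumption: image blocks are identified by type "image_url".
                if isinstance(block, dict) and block.get("type") == "image_url":
                    return "generate_content"
    return "chat"
```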

Resulting span tree

The following diagram shows how a request with retrieval context appears in the trace. The HTTP request span is the parent; the retrieval span and the GenAI operation span are siblings under it.
HTTP request span
  |
  +-- retrieval pinecone        (x-to11-context-operation: retrieval)
  |     gen_ai.retrieval.source: "pinecone"
  |     gen_ai.retrieval.document.count: "5"
  |
  +-- chat gpt-4o               (GenAI operation span)
        gen_ai.operation.name: "chat"
        gen_ai.usage.input_tokens: 2450
With tool execution context, the execute_tool span and the GenAI operation span appear as siblings in the same way:
HTTP request span
  |
  +-- execute_tool web_search   (x-to11-context-operation: execute_tool)
  |     gen_ai.tool.name: "web_search"
  |     gen_ai.tool.call.id: "call_abc123"
  |
  +-- chat gpt-4o               (GenAI operation span)
        gen_ai.operation.name: "chat"
        gen_ai.usage.input_tokens: 3200
        gen_ai.usage.output_tokens: 450