Distributed Tracing
The gateway supports W3C Trace Context propagation via the traceparent header. This allows you to group multiple LLM calls — or entire agent-to-agent communication chains — under a single distributed trace.
What W3C Trace Context gives you
- One trace, many calls — every LLM request that shares a trace_id appears as a single trace in your observability backend
- Cross-service visibility — traces span your application, the gateway, and (if the provider supports it) the upstream LLM
- Causality — parent/child span relationships show exactly which application action triggered which LLM call
How the gateway propagates traces
- Your application sends a request with a traceparent header
- The gateway extracts the trace context and creates HTTP + GenAI spans as children of that trace
- The gateway returns traceparent in the response headers so your application can continue the trace
Your App (traceparent: 00-<trace_id>-<span_id>-01)
|
v
Gateway (:4000)
|--- Creates HTTP span (child of your trace)
|--- Creates GenAI span (child of HTTP span)
|--- Injects traceparent into upstream request
|
v
LLM Provider (receives traceparent)
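The flow above can be sketched end to end. This is an illustrative sketch, not the gateway's implementation: make_traceparent is a hypothetical helper, and the localhost:4000 endpoint is the local gateway shown in the diagram.

```python
import os
import re

def make_traceparent(trace_id=None, span_id=None):
    """Build a W3C traceparent value: 00-<32 hex trace_id>-<16 hex span_id>-01."""
    trace_id = trace_id or os.urandom(16).hex()  # 16 bytes -> 32 hex chars
    span_id = span_id or os.urandom(8).hex()     # 8 bytes  -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"         # version 00, sampled flag 01

# A valid header matches the W3C format exactly
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-01$")
header = make_traceparent()
assert TRACEPARENT_RE.match(header)

# Sending it to the gateway (requires a running gateway; shown for shape only):
# requests.post(
#     "http://localhost:4000/v1/chat/completions",
#     headers={"traceparent": header, "Authorization": "Bearer sk-..."},
#     json={"model": "gpt-4o", "messages": [{"role": "user", "content": "..."}]},
# )
```

The gateway then creates its HTTP and GenAI spans as children of the trace_id you generated here.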
Group multiple LLM calls under one trace
If your application uses the OpenTelemetry SDK, wrap multiple gateway calls in a single span. The OTel HTTP instrumentation automatically propagates the traceparent header.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
import requests
# Set up OTel with OTLP export to the same collector as the gateway
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
# Auto-instrument HTTP client to propagate traceparent
RequestsInstrumentor().instrument()
tracer = trace.get_tracer("my-agent")
# All LLM calls inside this span share the same trace_id
with tracer.start_as_current_span("user-journey-summarize"):
    requests.post(
        "http://localhost:4000/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Summarize this document..."}],
        },
        headers={"Authorization": "Bearer sk-..."},
    )
    requests.post(
        "http://localhost:4000/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Extract key entities..."}],
        },
        headers={"Authorization": "Bearer sk-..."},
    )
Both LLM calls appear as children of the user-journey-summarize span in your trace viewer.
Agent-to-agent tracing
When one agent delegates to another, pass the traceparent through to maintain a single end-to-end trace:
Agent A (trace_id=abc123)
├─ LLM call 1 → Gateway span (child of abc123)
├─ LLM call 2 → Gateway span (child of abc123)
└─ passes traceparent to Agent B
Agent B (same trace_id=abc123)
├─ LLM call 3 → Gateway span (child of abc123)
└─ LLM call 4 → Gateway span (child of abc123)
All four LLM calls land in the same trace, giving end-to-end visibility across agents.
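Without an OTel SDK, one way Agent A can hand its trace context to Agent B is to parse the incoming traceparent, keep the trace_id, and mint a fresh span_id for the outgoing call. A minimal sketch; the function name is illustrative, not part of the gateway:

```python
import os

def continue_trace(incoming_traceparent):
    """Keep the trace_id from an incoming traceparent, mint a new span_id."""
    version, trace_id, _parent_span_id, flags = incoming_traceparent.split("-")
    new_span_id = os.urandom(8).hex()  # 8 bytes -> 16 hex chars
    return f"{version}-{trace_id}-{new_span_id}-{flags}"

# Agent B receives Agent A's header and reuses the trace_id:
incoming = "00-" + "ab" * 16 + "-" + "cd" * 8 + "-01"
outgoing = continue_trace(incoming)
assert outgoing.split("-")[1] == incoming.split("-")[1]  # same trace_id
assert outgoing.split("-")[2] != incoming.split("-")[2]  # new span_id
```

Agent B then sends `outgoing` as its own traceparent header, so all of its LLM calls join the same trace.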
To enrich these traces with session metadata or client-side operation context (tool execution, retrieval, agent steps), see Context Propagation.
Manual traceparent without an OTel SDK
If you don’t use an OpenTelemetry SDK, you can still pass a traceparent header manually. The format is:
traceparent: 00-<32 hex trace_id>-<16 hex span_id>-01
For example:
# Generate a trace ID and span ID
TRACE_ID=$(openssl rand -hex 16)
SPAN_ID=$(openssl rand -hex 8)
# First call
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-H "traceparent: 00-${TRACE_ID}-${SPAN_ID}-01" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Step 1..."}]}'
# Second call with the same trace_id (different span_id)
SPAN_ID2=$(openssl rand -hex 8)
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-H "traceparent: 00-${TRACE_ID}-${SPAN_ID2}-01" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Step 2..."}]}'
Both requests share the same trace_id and appear in one trace in ClickHouse or Tempo.
The gateway response includes a traceparent header with the gateway’s own span ID, so you can chain calls by reading and forwarding it.
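Reading and forwarding that response header might look like the following sketch. The helper name is hypothetical, and the gateway is assumed to echo a traceparent as described above:

```python
def traceparent_from_response(headers):
    """Pull the gateway's traceparent out of a response's headers, if present."""
    # HTTP header names are case-insensitive; normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("traceparent")

# With a live gateway, you would forward it into the next call:
# resp = requests.post("http://localhost:4000/v1/chat/completions", ...)
# next_headers = {"traceparent": traceparent_from_response(resp.headers)}
```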