Distributed Tracing
The gateway supports W3C Trace Context propagation via the traceparent header. This allows you to group multiple LLM calls — or entire agent-to-agent communication chains — under a single distributed trace.
What W3C Trace Context gives you
- One trace, many calls — every LLM request that shares a trace_id appears as a single trace in your observability backend
- Cross-service visibility — traces span your application, the gateway, and (if the provider supports it) the upstream LLM
- Causality — parent/child span relationships show exactly which application action triggered which LLM call
How the gateway propagates traces
- Your application sends a request with a traceparent header
- The gateway extracts the trace context and creates HTTP + GenAI spans as children of that trace
- The gateway returns traceparent in the response headers so your application can continue the trace
Your App (traceparent: 00-<trace_id>-<span_id>-01)
|
v
Gateway (:4000)
|--- Creates HTTP span (child of your trace)
|--- Creates GenAI span (child of HTTP span)
|--- Injects traceparent into upstream request
|
v
LLM Provider (receives traceparent)
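The flow above can be sketched end to end. This is an illustrative sketch, not the gateway's implementation: make_traceparent is a hypothetical helper, and the localhost:4000 endpoint is the local gateway shown in the diagram.

```python
import os
import re

def make_traceparent(trace_id=None, span_id=None):
    """Build a W3C traceparent value: 00-<32 hex trace_id>-<16 hex span_id>-01."""
    trace_id = trace_id or os.urandom(16).hex()  # 16 bytes -> 32 hex chars
    span_id = span_id or os.urandom(8).hex()     # 8 bytes  -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"         # version 00, sampled flag 01

# A valid header matches the W3C format exactly
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-01$")
header = make_traceparent()
assert TRACEPARENT_RE.match(header)

# Sending it to the gateway (requires a running gateway; shown for shape only):
# requests.post(
#     "http://localhost:4000/v1/chat/completions",
#     headers={"traceparent": header, "Authorization": "Bearer sk-..."},
#     json={"model": "gpt-4o", "messages": [{"role": "user", "content": "..."}]},
# )
```

The gateway then creates its HTTP and GenAI spans as children of the trace_id you generated here.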
Group multiple LLM calls under one trace
If your application uses the OpenTelemetry SDK, wrap multiple gateway calls in a single span. The OTel HTTP instrumentation automatically propagates the traceparent header.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
import requests
# Set up OTel with OTLP export to the same collector as the gateway
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
# Auto-instrument HTTP client to propagate traceparent
RequestsInstrumentor().instrument()
tracer = trace.get_tracer("my-agent")
# All LLM calls inside this span share the same trace_id
with tracer.start_as_current_span("user-journey-summarize"):
    requests.post(
        "http://localhost:4000/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Summarize this document..."}],
        },
        headers={"Authorization": "Bearer sk-..."},
    )
    requests.post(
        "http://localhost:4000/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Extract key entities..."}],
        },
        headers={"Authorization": "Bearer sk-..."},
    )
Both LLM calls appear as children of the user-journey-summarize span in your trace viewer.
Agent-to-agent tracing
When one agent delegates to another, pass the traceparent through to maintain a single end-to-end trace:
Agent A (trace_id=abc123)
├─ LLM call 1 → Gateway span (child of abc123)
├─ LLM call 2 → Gateway span (child of abc123)
└─ passes traceparent to Agent B
Agent B (same trace_id=abc123)
├─ LLM call 3 → Gateway span (child of abc123)
└─ LLM call 4 → Gateway span (child of abc123)
All four LLM calls land in the same trace, giving end-to-end visibility across agents.
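Without an OTel SDK, one way Agent A can hand its trace context to Agent B is to parse the incoming traceparent, keep the trace_id, and mint a fresh span_id for the outgoing call. A minimal sketch; the function name is illustrative, not part of the gateway:

```python
import os

def continue_trace(incoming_traceparent):
    """Keep the trace_id from an incoming traceparent, mint a new span_id."""
    version, trace_id, _parent_span_id, flags = incoming_traceparent.split("-")
    new_span_id = os.urandom(8).hex()  # 8 bytes -> 16 hex chars
    return f"{version}-{trace_id}-{new_span_id}-{flags}"

# Agent B receives Agent A's header and reuses the trace_id:
incoming = "00-" + "ab" * 16 + "-" + "cd" * 8 + "-01"
outgoing = continue_trace(incoming)
assert outgoing.split("-")[1] == incoming.split("-")[1]  # same trace_id
assert outgoing.split("-")[2] != incoming.split("-")[2]  # new span_id
```

Agent B then sends `outgoing` as its own traceparent header, so all of its LLM calls join the same trace.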
To enrich these traces with session metadata or client-side operation context (tool execution, retrieval, agent steps), see Context Propagation.
Manual traceparent without an OTel SDK
If you don’t use an OpenTelemetry SDK, you can still pass a traceparent header manually. The format is:
traceparent: 00-<32 hex trace_id>-<16 hex span_id>-01
For example:
# Generate a trace ID and span ID
TRACE_ID=$(openssl rand -hex 16)
SPAN_ID=$(openssl rand -hex 8)
# First call
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-H "traceparent: 00-${TRACE_ID}-${SPAN_ID}-01" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Step 1..."}]}'
# Second call with the same trace_id (different span_id)
SPAN_ID2=$(openssl rand -hex 8)
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-H "traceparent: 00-${TRACE_ID}-${SPAN_ID2}-01" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Step 2..."}]}'
Both requests share the same trace_id and appear in one trace in ClickHouse or Tempo.
The gateway response includes a traceparent header with the gateway’s own span ID, so you can chain calls by reading and forwarding it.
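Reading and forwarding that response header might look like the following sketch. The helper name is hypothetical, and the gateway is assumed to echo a traceparent as described above:

```python
def traceparent_from_response(headers):
    """Pull the gateway's traceparent out of a response's headers, if present."""
    # HTTP header names are case-insensitive; normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("traceparent")

# With a live gateway, you would forward it into the next call:
# resp = requests.post("http://localhost:4000/v1/chat/completions", ...)
# next_headers = {"traceparent": traceparent_from_response(resp.headers)}
```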