Self-Hosted Observability
The Docker Compose stack ships a complete observability backend. This guide shows you how to navigate it.
Architecture overview
to11 runs two independent telemetry pipelines. Each serves a different purpose and lands in a different backend.
```
Gateway / API
│
├─ [telemetry] App traces & logs ──▶ Alloy ──▶ Tempo (traces)
│                                          ├──▶ Loki (logs)
│                                          └──▶ Mimir (metrics)
│
└─ [genai_telemetry] LLM spans ───▶ OTel GenAI Collector ──▶ ClickHouse
```
| Pipeline | What it captures | Storage | Query UI |
|---|---|---|---|
| Application | HTTP request spans, middleware timing, error rates, container logs | Tempo + Loki + Mimir | Grafana (:3001) |
| GenAI | LLM call spans, token usage, content capture, tool calls | ClickHouse | Web app (:3000) or ClickHouse directly |
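To confirm that both pipelines are landing data, you can query each backend directly. A minimal sketch, assuming the default host ports from the Services and ports table below and the default otel/otel ClickHouse credentials:

```bash
# Application pipeline: ask Tempo for the five most recent traces.
curl -s "http://localhost:3200/api/search?limit=5"

# GenAI pipeline: count the spans stored in ClickHouse.
curl -s "http://localhost:8123/?user=otel&password=otel" \
  --data-binary "SELECT count() FROM otel_traces"
```

If the first call returns an empty trace list or the second returns 0, see Troubleshooting below.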
Services and ports
After running docker compose -f docker-compose.production.yml --profile observability up -d (prefix the command with IMAGE_TAG=sha-abc1234 to pin a specific build), the following observability services are available:
| Service | URL | Purpose |
|---|---|---|
| Grafana | localhost:3001 | Dashboards, trace explorer, log viewer |
| Tempo | localhost:3200 | Trace storage (queried via Grafana) |
| Loki | localhost:3101 | Log aggregation (queried via Grafana) |
| Mimir | localhost:9009 | Metrics storage / Prometheus-compatible |
| Alloy | localhost:12345 | OTel collector + log shipper |
| ClickHouse | localhost:8123 | GenAI telemetry storage |
| OTel GenAI Collector | localhost:4317 / localhost:4318 | Receives GenAI OTLP, writes to ClickHouse |
Grafana ships with pre-configured data sources for Tempo, Loki, and Mimir. No setup required.
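To verify that everything came up, you can hit each service's health endpoint from the host. A minimal sketch using the components' standard readiness paths and the host ports from the table above (13133 is the GenAI collector's health-check port used in Troubleshooting); adjust if you changed the compose mappings:

```bash
# Liveness check for each observability service.
curl -fsS http://localhost:3001/api/health && echo " grafana ok"
curl -fsS http://localhost:3200/ready      && echo " tempo ok"
curl -fsS http://localhost:3101/ready      && echo " loki ok"
curl -fsS http://localhost:9009/ready      && echo " mimir ok"
curl -fsS http://localhost:12345/-/ready   && echo " alloy ok"
curl -fsS http://localhost:8123/ping       && echo " clickhouse ok"
curl -fsS http://localhost:13133/          && echo " otel genai collector ok"
```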
Open the Trace Explorer
- Open localhost:3001 (anonymous admin access is enabled)
- Click Explore in the left sidebar
- Select Tempo as the data source
Search by service
Use the Search tab to filter traces:
- Service Name: gateway or to11-api
- Span Name: filter by operation (e.g. POST /v1/chat/completions)
- Duration: find slow requests
- Status: filter for errors
TraceQL queries
Switch to the TraceQL tab for more powerful queries:
```
// All gateway traces
{ resource.service.name = "gateway" }

// Traces with errors
{ resource.service.name = "gateway" && status = error }

// Slow requests (over 2 seconds)
{ resource.service.name = "gateway" } | duration > 2s

// Specific endpoint
{ resource.service.name = "gateway" && name =~ "POST.*chat" }

// API service traces
{ resource.service.name = "to11-api" }
```
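The same TraceQL can also be run from the command line against Tempo's search API, which is handy for scripting. A sketch, assuming Tempo is exposed on localhost:3200 as in the ports table:

```bash
# Find up to 10 recent gateway traces that ended in an error.
curl -sG "http://localhost:3200/api/search" \
  --data-urlencode 'q={ resource.service.name = "gateway" && status = error }' \
  --data-urlencode 'limit=10'
```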
Reading a trace waterfall
Click any trace to open the waterfall view. You’ll see:
- Root span — the inbound HTTP request to the gateway
- Child spans — middleware processing, upstream provider call, response handling
- Span attributes — HTTP method, status code, URL path, timing details
- Events — errors, warnings, or other logged events within a span
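The same span data can be pulled outside Grafana by fetching a trace by ID from Tempo's API. A sketch (the trace ID is a placeholder; copy a real one from the search results):

```bash
# Fetch the raw spans for a single trace; replace <trace_id> with a real ID.
curl -s "http://localhost:3200/api/traces/<trace_id>"
```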
Traces to logs
The Grafana Tempo data source is configured with Traces-to-Logs linking. When viewing a trace:
- Click the Logs for this span button in the trace detail panel
- This jumps to Loki with the trace_id pre-filled, showing all log lines emitted during that trace
Exploring logs in Grafana
- In Explore, select the Loki data source
- Use LogQL to query:
```
// All gateway container logs
{service="gateway"}

// API logs containing "error"
{service="api"} |= "error"

// Logs correlated to a specific trace
{service="gateway"} | json | trace_id = "abc123..."
```
Container logs from all Docker Compose services are automatically collected by Alloy and shipped to Loki.
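The same LogQL queries can be run against Loki's HTTP API without going through Grafana. A minimal sketch, assuming Loki on localhost:3101 as in the ports table:

```bash
# Last 20 gateway log lines containing "error" (over a recent default time window).
curl -sG "http://localhost:3101/loki/api/v1/query_range" \
  --data-urlencode 'query={service="gateway"} |= "error"' \
  --data-urlencode 'limit=20'
```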
Exploring metrics in Grafana
- In Explore, select the Mimir data source
- Use PromQL to query span-derived metrics:
```
// Request rate by service
rate(traces_spanmetrics_calls_total[5m])

// P95 latency for gateway spans
histogram_quantile(0.95, rate(traces_spanmetrics_duration_seconds_bucket{service="gateway"}[5m]))

// OTel Collector health (scraped from GenAI collector :8888)
otelcol_exporter_sent_spans_total
```
Tempo’s metrics generator automatically derives RED metrics (Rate, Errors, Duration) from ingested traces and pushes them to Mimir.
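These metrics can also be queried from the command line through Mimir's Prometheus-compatible API, which Mimir serves under the /prometheus prefix by default. A sketch, assuming Mimir on localhost:9009 as in the ports table:

```bash
# Instant query for the per-service request rate derived from spans.
# Add -H "X-Scope-OrgID: <tenant>" if multi-tenancy is enabled in your Mimir config.
curl -sG "http://localhost:9009/prometheus/api/v1/query" \
  --data-urlencode 'query=rate(traces_spanmetrics_calls_total[5m])'
```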
Querying GenAI telemetry in ClickHouse
GenAI spans (LLM calls, token counts, content capture) go to ClickHouse via the OTel GenAI Collector.
Using the Web UI
The to11 web app at localhost:3000 provides a traces page per project. Navigate to a project and open the Traces tab to see:
- Trace list with provider, model, tokens, cost, and status
- Trace detail view with span waterfall and attributes
Querying ClickHouse directly
Connect to ClickHouse at localhost:8123 (user: otel, password: otel):
```bash
# Quick check for data
curl "http://localhost:8123/?user=otel&password=otel" \
  --data-binary "SELECT count() FROM otel_traces"

# Recent GenAI spans
curl "http://localhost:8123/?user=otel&password=otel" \
  --data-binary "
    SELECT
      Timestamp,
      SpanName,
      SpanAttributes['gen_ai.request.model'] AS model,
      SpanAttributes['gen_ai.provider.name'] AS provider,
      SpanAttributes['gen_ai.usage.input_tokens'] AS input_tokens,
      SpanAttributes['gen_ai.usage.output_tokens'] AS output_tokens
    FROM otel_traces
    ORDER BY Timestamp DESC
    LIMIT 10
    FORMAT Pretty
  "
```
See the Metrics reference for more query examples.
Host-machine development (no Docker gateway)
When running the Rust gateway on your host machine instead of in Docker (for
example cargo run -p gateway), the infra services still run in Docker. The
port mappings differ slightly:
| Signal | Docker-to-Docker endpoint | Native (host) endpoint |
|---|---|---|
| App traces (Alloy) | alloy:4317 | localhost:14317 |
| App telemetry HTTP | alloy:4318 | localhost:14318 |
| GenAI OTLP (Collector) | otel-genai-collector:4317 | localhost:4317 |
Use docker/gateway/config.dev.toml, which has these host endpoints pre-configured:
```bash
source .env.local
GATEWAY_CONFIG=docker/gateway/config.dev.toml cargo run -p gateway
```
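Before starting the gateway natively, it can be worth confirming that the host-mapped telemetry ports are actually reachable. A quick sketch using nc, with the ports from the table above:

```bash
# Check that the host-mapped OTLP endpoints are listening.
for port in 14317 14318 4317; do
  nc -z localhost "$port" && echo "port $port reachable" || echo "port $port NOT reachable"
done
```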
Troubleshooting
No traces appearing in Tempo
- Verify Alloy is running: curl http://localhost:12345 should return the Alloy UI
- Check that [telemetry].enabled = true in the gateway config
- Look at Alloy logs: docker compose -f docker-compose.production.yml logs alloy
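You can also ask Tempo itself whether it is ready and has ingested anything. A sketch; the exact metric names can differ between Tempo versions:

```bash
# Tempo readiness and span-ingestion counters.
curl -s http://localhost:3200/ready
curl -s http://localhost:3200/metrics | grep tempo_distributor_spans_received_total
```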
No GenAI spans in ClickHouse
- Verify the OTel GenAI Collector is healthy: curl http://localhost:13133/
- Check collector logs: docker compose -f docker-compose.production.yml logs otel-genai-collector
- Confirm ClickHouse is accepting writes: docker compose -f docker-compose.production.yml logs clickhouse
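The collector's own Prometheus metrics (the :8888 endpoint referenced in the metrics section) show whether it is exporting spans at all. A sketch, assuming port 8888 is published to the host:

```bash
# Exporter counters from the GenAI collector's internal metrics endpoint.
curl -s http://localhost:8888/metrics | grep otelcol_exporter
```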
ClickHouse shows unhealthy
ClickHouse can take 30-60 seconds to fully initialize. If it stays unhealthy:
```bash
# Check ClickHouse logs
docker compose -f docker-compose.production.yml logs clickhouse

# Verify connectivity
curl "http://localhost:8123/?user=otel&password=otel" \
  --data-binary "SELECT 1"
```
Grafana data source errors
If Grafana shows “Bad Gateway” for a data source, the backend service may still be starting. Wait 30 seconds and refresh — Grafana connects to Tempo, Loki, and Mimir via internal Docker networking.