Metrics
The gateway exports histogram metrics and counters via the GenAI telemetry pipeline. All metrics are dimensioned by standard GenAI operation attributes and land in ClickHouse via the OTel Collector.
Histogram metrics
| Metric | Unit | Description |
|---|---|---|
| gen_ai.client.operation.duration | seconds | End-to-end latency for each LLM call |
| gen_ai.client.token.usage | tokens | Token distributions, dimensioned by gen_ai.token.type (input or output) |
| gen_ai.server.time_to_first_token | seconds | Streaming TTFT distribution |
| gen_ai.server.request.duration | seconds | Total upstream request duration |
| gen_ai.server.time_per_output_token | seconds | Per-token generation time |
| gen_ai.server.output_tokens_per_second | tokens/sec | Token generation throughput |
Counter metrics
| Metric | Description |
|---|---|
| gen_ai.client.errors | Error count, dimensioned by error.type |
| gen_ai.client.guardrail_violations | Guardrail hit count, dimensioned by gen_ai.guardrail.detector and gen_ai.guardrail.direction |
| gen_ai.client.finish_reasons | Finish reason distribution, dimensioned by gen_ai.response.finish_reason |
| gateway.retry.attempts | Retry attempt count |
| gateway.retry.target_exclusions | Target exclusion count, dimensioned by gateway.retry.excluded_target |
| gateway.retry.exhaustion_resets | Exclusion reset count |
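As a rough sketch of how these counters might be queried once they land in ClickHouse, the query below totals gen_ai.client.errors by error.type over the last hour. The table name (otel_metrics_sum) and column names (MetricName, Attributes, Value, TimeUnix) are assumptions based on the default schema of the OTel Collector's ClickHouse exporter, and the query also assumes delta temporality. None of this is confirmed by this page, so adjust it to your deployment.

```sql
-- Assumption: default ClickHouse exporter schema and delta temporality.
-- With cumulative counters, summing Value would overcount.
SELECT
    Attributes['error.type'] AS error_type,
    sum(Value) AS errors
FROM otel_metrics_sum
WHERE MetricName = 'gen_ai.client.errors'
  AND TimeUnix > now() - INTERVAL 1 HOUR
GROUP BY error_type
ORDER BY errors DESC;
```

The same shape should work for the other counters by swapping the metric name and the dimension key, for example gateway.retry.target_exclusions with gateway.retry.excluded_target.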
Metric dimensions
All histogram metrics carry these dimensions:
| Dimension | Source |
|---|---|
| gen_ai.operation.name | Operation type |
| gen_ai.provider.name | Provider identity |
| gen_ai.request.model | Requested model |
| gen_ai.response.model | Response model (when available) |
| server.address | Upstream hostname |
| server.port | Upstream port |
The token usage metric additionally carries gen_ai.token.type (input or output).
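To illustrate how these dimensions can be used for slicing, here is a minimal sketch that breaks down gen_ai.client.token.usage by requested model and token type. It assumes the metric is stored in a table named otel_metrics_histogram with MetricName, Attributes, Count, Sum, and TimeUnix columns (the default layout of the OTel Collector's ClickHouse exporter) and that data points use delta temporality. Treat both as assumptions to verify against your own schema.

```sql
-- Assumption: default ClickHouse exporter histogram table, delta temporality.
-- Sum holds the total of recorded values, Count the number of recordings.
SELECT
    Attributes['gen_ai.request.model'] AS model,
    Attributes['gen_ai.token.type'] AS token_type,
    sum(Count) AS recordings,
    sum(Sum) AS total_tokens
FROM otel_metrics_histogram
WHERE MetricName = 'gen_ai.client.token.usage'
  AND TimeUnix > now() - INTERVAL 1 HOUR
GROUP BY model, token_type
ORDER BY total_tokens DESC;
```

Grouping by any other dimension from the table above, such as gen_ai.provider.name or server.address, follows the same pattern.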
ClickHouse query examples
All GenAI spans for a trace
SELECT
    Timestamp,
    SpanName,
    Duration,
    SpanAttributes['gen_ai.request.model'] AS model,
    SpanAttributes['gen_ai.provider.name'] AS provider,
    SpanAttributes['gen_ai.usage.input_tokens'] AS input_tokens,
    SpanAttributes['gen_ai.usage.output_tokens'] AS output_tokens,
    SpanAttributes['gen_ai.response.finish_reasons'] AS finish_reasons
FROM otel_traces
WHERE TraceId = '{trace_id}'
ORDER BY Timestamp;
Streaming requests with TTFT
SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.server.time_to_first_token'] AS ttft_seconds,
    SpanAttributes['gen_ai.server.request.duration'] AS stream_duration_seconds,
    SpanAttributes['gen_ai.request.model'] AS model
FROM otel_traces
WHERE SpanAttributes['gen_ai.request.stream'] = 'true'
  AND SpanAttributes['gen_ai.server.time_to_first_token'] != ''
ORDER BY Timestamp DESC
LIMIT 100;
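The row-level listing above can also be rolled up into a per-model summary. The sketch below estimates the 95th-percentile TTFT per model over the last hour, assuming the attribute is stored as a numeric string in seconds (as the ttft_seconds alias above suggests). The one-hour window and the output column names are illustrative choices, not something this page prescribes.

```sql
-- quantile() is an approximate aggregate in ClickHouse.
-- toFloat64OrZero() converts the string attribute, yielding 0 on parse failure.
SELECT
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS streams,
    round(quantile(0.95)(toFloat64OrZero(SpanAttributes['gen_ai.server.time_to_first_token'])), 3) AS ttft_p95_seconds
FROM otel_traces
WHERE SpanAttributes['gen_ai.request.stream'] = 'true'
  AND SpanAttributes['gen_ai.server.time_to_first_token'] != ''
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY model
ORDER BY ttft_p95_seconds DESC;
```

If exact percentiles matter more than query cost, quantileExact(0.95) can be substituted for quantile(0.95).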
Token usage by model
SELECT
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS calls,
    sum(toUInt64OrZero(SpanAttributes['gen_ai.usage.input_tokens'])) AS total_input,
    sum(toUInt64OrZero(SpanAttributes['gen_ai.usage.output_tokens'])) AS total_output
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY model
ORDER BY total_input + total_output DESC;
Error rate by provider
SELECT
    SpanAttributes['gen_ai.provider.name'] AS provider,
    count() AS total,
    countIf(StatusCode = 'Error') AS errors,
    round(errors / total * 100, 2) AS error_pct
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY provider;
Tool call spans with parent span
SELECT
    t.SpanName,
    t.SpanAttributes['gen_ai.tool.name'] AS tool_name,
    t.SpanAttributes['gen_ai.tool.call.id'] AS call_id,
    t.Duration AS duration_ns,
    parent.SpanName AS parent_span
FROM otel_traces t
LEFT JOIN otel_traces parent
    ON t.ParentSpanId = parent.SpanId AND t.TraceId = parent.TraceId
WHERE t.SpanName LIKE 'tool_call.%'
  AND t.Timestamp > now() - INTERVAL 1 HOUR
ORDER BY t.Timestamp DESC
LIMIT 50;
Context operation spans (retrieval, tool execution, agent)
SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.operation.name'] AS operation,
    SpanAttributes['gen_ai.retrieval.source'] AS retrieval_source,
    SpanAttributes['gen_ai.tool.name'] AS tool_name,
    SpanAttributes['gen_ai.agent.name'] AS agent_name
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] IN ('execute_tool', 'retrieval', 'invoke_agent')
  AND Timestamp > now() - INTERVAL 1 HOUR
ORDER BY Timestamp DESC
LIMIT 50;