Metrics

The gateway exports histogram metrics and counters via the GenAI telemetry pipeline. All metrics are dimensioned by standard GenAI operation attributes and land in ClickHouse via the OTel Collector.

Histogram metrics

| Metric | Unit | Description |
| --- | --- | --- |
| gen_ai.client.operation.duration | seconds | End-to-end latency for each LLM call |
| gen_ai.client.token.usage | tokens | Token distributions, dimensioned by gen_ai.token.type (input or output) |
| gen_ai.server.time_to_first_token | seconds | Streaming TTFT distribution |
| gen_ai.server.request.duration | seconds | Total upstream request duration |
| gen_ai.server.time_per_output_token | seconds | Per-token generation time |
| gen_ai.server.output_tokens_per_second | tokens/sec | Token generation throughput |
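
As a rough illustration of how the streaming-related histograms relate to one another, the sketch below derives them from the timestamps of a single streamed response. The `StreamTiming` class and its field names are hypothetical, and the formulas are assumptions based on the usual definitions (time per output token measured after the first token, throughput as output tokens over total request duration). The gateway's actual computation may differ.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical timing record for one streamed LLM response (all times in seconds)."""
    request_start: float   # when the upstream request was sent
    first_token_at: float  # when the first output token arrived
    last_token_at: float   # when the final output token arrived
    output_tokens: int     # number of output tokens generated

    def time_to_first_token(self) -> float:
        return self.first_token_at - self.request_start

    def request_duration(self) -> float:
        return self.last_token_at - self.request_start

    def time_per_output_token(self) -> float:
        # Assumed definition: decode time spread over the tokens after the first.
        if self.output_tokens <= 1:
            return 0.0
        return (self.last_token_at - self.first_token_at) / (self.output_tokens - 1)

    def output_tokens_per_second(self) -> float:
        # Assumed definition: output tokens divided by total request duration.
        duration = self.request_duration()
        return self.output_tokens / duration if duration > 0 else 0.0


# Made-up sample: first token after 0.5 s, 101 tokens finished at 2.5 s.
timing = StreamTiming(request_start=0.0, first_token_at=0.5,
                      last_token_at=2.5, output_tokens=101)
```

Under these assumptions, a slow first token raises `gen_ai.server.time_to_first_token` and lowers throughput, but leaves `gen_ai.server.time_per_output_token` unchanged, which is why the three are worth tracking separately.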

Counter metrics

| Metric | Description |
| --- | --- |
| gen_ai.client.errors | Error count, dimensioned by error.type |
| gen_ai.client.guardrail_violations | Guardrail hit count, dimensioned by gen_ai.guardrail.detector and gen_ai.guardrail.direction |
| gen_ai.client.finish_reasons | Finish reason distribution, dimensioned by gen_ai.response.finish_reason |
| gateway.retry.attempts | Retry attempt count |
| gateway.retry.target_exclusions | Target exclusion count, dimensioned by gateway.retry.excluded_target |
| gateway.retry.exhaustion_resets | Exclusion reset count |

Metric dimensions

All histogram metrics carry these dimensions:

| Dimension | Source |
| --- | --- |
| gen_ai.operation.name | Operation type |
| gen_ai.provider.name | Provider identity |
| gen_ai.request.model | Requested model |
| gen_ai.response.model | Response model (when available) |
| server.address | Upstream hostname |
| server.port | Upstream port |

The token usage metric additionally carries gen_ai.token.type (input or output).
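
To make the dimension model concrete, the sketch below builds the attribute sets for the two `gen_ai.client.token.usage` data points that a single call would plausibly produce, one per token type. The helper name `token_usage_points` and all sample values are illustrative assumptions, not the gateway's code.

```python
def token_usage_points(base_dims: dict, input_tokens: int, output_tokens: int) -> list:
    """Hypothetical helper: return (value, attributes) pairs for one LLM call."""
    points = []
    for token_type, value in (("input", input_tokens), ("output", output_tokens)):
        attrs = dict(base_dims)                   # copy the shared histogram dimensions
        attrs["gen_ai.token.type"] = token_type   # extra dimension carried by token usage
        points.append((value, attrs))
    return points


# Sample dimension values are made up for illustration.
base = {
    "gen_ai.operation.name": "chat",
    "gen_ai.provider.name": "openai",
    "gen_ai.request.model": "gpt-4o",
    "server.address": "api.openai.com",
    "server.port": 443,
}
points = token_usage_points(base, input_tokens=120, output_tokens=48)
```

Each pair shares the same base dimensions and differs only in `gen_ai.token.type`, so input and output distributions can be filtered or grouped independently.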

ClickHouse query examples

All GenAI spans for a trace

SELECT
    Timestamp,
    SpanName,
    Duration,
    SpanAttributes['gen_ai.request.model'] AS model,
    SpanAttributes['gen_ai.provider.name'] AS provider,
    SpanAttributes['gen_ai.usage.input_tokens'] AS input_tokens,
    SpanAttributes['gen_ai.usage.output_tokens'] AS output_tokens,
    SpanAttributes['gen_ai.response.finish_reasons'] AS finish_reasons
FROM otel_traces
WHERE TraceId = '{trace_id}'
ORDER BY Timestamp;
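
The `{trace_id}` placeholder has to be filled in before the query runs. A minimal sketch of one way to do that safely is shown below: it checks that the value looks like an OpenTelemetry trace ID (32 hex characters) before substituting it. The helper name `fill_trace_query` is hypothetical, the query is shortened for brevity, and the sketch assumes trace IDs are stored as lowercase hex strings in `otel_traces`.

```python
import re

# OpenTelemetry trace IDs are 16 bytes, conventionally rendered as 32 hex characters.
_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")

# Shortened version of the query above; only the placeholder handling matters here.
TRACE_QUERY = """
SELECT Timestamp, SpanName, Duration
FROM otel_traces
WHERE TraceId = '{trace_id}'
ORDER BY Timestamp
"""


def fill_trace_query(trace_id: str) -> str:
    """Hypothetical helper: validate a trace ID, then substitute it into the query."""
    normalized = trace_id.strip().lower()
    if not _TRACE_ID_RE.fullmatch(normalized):
        raise ValueError(f"not a 32-character hex trace ID: {trace_id!r}")
    return TRACE_QUERY.format(trace_id=normalized)
```

Rejecting anything that is not plain hex keeps quotes and other SQL syntax out of the statement. Where the client in use supports ClickHouse query parameters, binding the value that way is generally preferable to string substitution.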

Streaming requests with TTFT

SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.server.time_to_first_token'] AS ttft_seconds,
    SpanAttributes['gen_ai.server.request.duration'] AS stream_duration_seconds,
    SpanAttributes['gen_ai.request.model'] AS model
FROM otel_traces
WHERE SpanAttributes['gen_ai.request.stream'] = 'true'
  AND SpanAttributes['gen_ai.server.time_to_first_token'] != ''
ORDER BY Timestamp DESC
LIMIT 100;

Token usage by model

SELECT
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS calls,
    sum(toUInt64OrZero(SpanAttributes['gen_ai.usage.input_tokens'])) AS total_input,
    sum(toUInt64OrZero(SpanAttributes['gen_ai.usage.output_tokens'])) AS total_output
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY model
ORDER BY total_input + total_output DESC;

Error rate by provider

SELECT
    SpanAttributes['gen_ai.provider.name'] AS provider,
    count() AS total,
    countIf(StatusCode = 'Error') AS errors,
    round(errors / total * 100, 2) AS error_pct
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY provider;

Tool call child spans

SELECT
    t.SpanName,
    t.SpanAttributes['gen_ai.tool.name'] AS tool_name,
    t.SpanAttributes['gen_ai.tool.call.id'] AS call_id,
    t.Duration AS duration_ns,
    parent.SpanName AS parent_span
FROM otel_traces t
LEFT JOIN otel_traces parent
    ON t.ParentSpanId = parent.SpanId
   AND t.TraceId = parent.TraceId
WHERE t.SpanName LIKE 'tool_call.%'
  AND t.Timestamp > now() - INTERVAL 1 HOUR
ORDER BY t.Timestamp DESC
LIMIT 50;

Context operation spans (retrieval, tool execution, agent)

SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.operation.name'] AS operation,
    SpanAttributes['gen_ai.retrieval.source'] AS retrieval_source,
    SpanAttributes['gen_ai.tool.name'] AS tool_name,
    SpanAttributes['gen_ai.agent.name'] AS agent_name
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] IN ('execute_tool', 'retrieval', 'invoke_agent')
  AND Timestamp > now() - INTERVAL 1 HOUR
ORDER BY Timestamp DESC
LIMIT 50;