Metrics

The gateway exports histogram and counter metrics through the GenAI telemetry pipeline. All metrics are dimensioned by standard GenAI operation attributes and land in ClickHouse via the OTel Collector.

Histogram metrics

Metric	Unit	Description
`gen_ai.client.operation.duration`	seconds	End-to-end latency for each LLM call
`gen_ai.client.token.usage`	tokens	Token distributions, dimensioned by `gen_ai.token.type` (`input` or `output`)
`gen_ai.server.time_to_first_token`	seconds	Streaming TTFT distribution
`gen_ai.server.request.duration`	seconds	Total upstream request duration
`gen_ai.server.time_per_output_token`	seconds	Per-token generation time
`gen_ai.server.output_tokens_per_second`	tokens/sec	Token generation throughput
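
The same latency signals are also recorded on spans, so a distribution can be approximated directly from `otel_traces`. The query below is a sketch rather than the gateway's own aggregation. It assumes the `gen_ai.server.time_to_first_token` span attribute used in the streaming query on this page is stored as a string holding seconds; the one-hour window is an arbitrary choice. Note that ClickHouse's `quantile` is an approximate aggregate.

```sql
-- Approximate TTFT percentiles per requested model from span attributes.
-- Assumption: the attribute is a string holding seconds; values that do not
-- parse become NULL and are skipped by the quantile aggregates.
SELECT
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS streamed_calls,
    round(quantile(0.5)(toFloat64OrNull(SpanAttributes['gen_ai.server.time_to_first_token'])), 3) AS ttft_p50_s,
    round(quantile(0.95)(toFloat64OrNull(SpanAttributes['gen_ai.server.time_to_first_token'])), 3) AS ttft_p95_s
FROM otel_traces
WHERE SpanAttributes['gen_ai.server.time_to_first_token'] != ''
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY model
ORDER BY ttft_p95_s DESC;
```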

Counter metrics

Metric	Description
`gen_ai.client.errors`	Error count, dimensioned by `error.type`
`gen_ai.client.guardrail_violations`	Guardrail hit count, dimensioned by `gen_ai.guardrail.detector` and `gen_ai.guardrail.direction`
`gen_ai.client.finish_reasons`	Finish reason distribution, dimensioned by `gen_ai.response.finish_reason`
`gateway.retry.attempts`	Retry attempt count
`gateway.retry.target_exclusions`	Target exclusion count, dimensioned by `gateway.retry.excluded_target`
`gateway.retry.exhaustion_resets`	Exclusion reset count
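
A rough equivalent of `gen_ai.client.finish_reasons` can be derived from spans. This is a sketch assuming the `gen_ai.response.finish_reasons` span attribute that appears in the trace query on this page; it groups on the raw attribute string and is not guaranteed to match the counter exactly.

```sql
-- Count chat spans per finish-reason value over the last hour.
-- Assumption: the span attribute holds the finish reasons as one string,
-- so each distinct string is treated as its own bucket.
SELECT
    SpanAttributes['gen_ai.response.finish_reasons'] AS finish_reasons,
    count() AS calls
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY finish_reasons
ORDER BY calls DESC;
```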

Metric dimensions

All histogram metrics carry these dimensions:

Dimension	Source
`gen_ai.operation.name`	Operation type
`gen_ai.provider.name`	Provider identity
`gen_ai.request.model`	Requested model
`gen_ai.response.model`	Response model (when available)
`server.address`	Upstream hostname
`server.port`	Upstream port

The token usage metric additionally carries `gen_ai.token.type` (`input` or `output`).
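
To see how these dimensions slice the data, span durations can be grouped by the same attributes. The query is a minimal sketch against the `otel_traces` table used elsewhere on this page. It assumes `Duration` is in nanoseconds and that the dimension attributes are also present on spans; it approximates the duration histogram and does not read it.

```sql
-- Span-duration percentiles grouped by three of the histogram dimensions.
-- Assumption: Duration is in nanoseconds, hence the division by 1e9.
SELECT
    SpanAttributes['gen_ai.operation.name'] AS operation,
    SpanAttributes['gen_ai.provider.name'] AS provider,
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS calls,
    round(quantile(0.5)(Duration) / 1e9, 3) AS p50_seconds,
    round(quantile(0.95)(Duration) / 1e9, 3) AS p95_seconds
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] != ''
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY operation, provider, model
ORDER BY p95_seconds DESC;
```

Adding `server.address` or `gen_ai.response.model` to the `SELECT` and `GROUP BY` lists narrows the breakdown further, at the cost of more, smaller groups.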

ClickHouse query examples

All GenAI spans for a trace

SELECT
    Timestamp,
    SpanName,
    Duration,
    SpanAttributes['gen_ai.request.model'] AS model,
    SpanAttributes['gen_ai.provider.name'] AS provider,
    SpanAttributes['gen_ai.usage.input_tokens'] AS input_tokens,
    SpanAttributes['gen_ai.usage.output_tokens'] AS output_tokens,
    SpanAttributes['gen_ai.response.finish_reasons'] AS finish_reasons
FROM otel_traces
WHERE TraceId = '{trace_id}'
ORDER BY Timestamp;

Streaming requests with TTFT

SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.server.time_to_first_token'] AS ttft_seconds,
    SpanAttributes['gen_ai.server.request.duration'] AS stream_duration_seconds,
    SpanAttributes['gen_ai.request.model'] AS model
FROM otel_traces
WHERE SpanAttributes['gen_ai.request.stream'] = 'true'
  AND SpanAttributes['gen_ai.server.time_to_first_token'] != ''
ORDER BY Timestamp DESC
LIMIT 100;

Token usage by model

SELECT
    SpanAttributes['gen_ai.request.model'] AS model,
    count() AS calls,
    sum(toUInt64OrZero(SpanAttributes['gen_ai.usage.input_tokens'])) AS total_input,
    sum(toUInt64OrZero(SpanAttributes['gen_ai.usage.output_tokens'])) AS total_output
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY model
ORDER BY total_input + total_output DESC;

Error rate by provider

SELECT
    SpanAttributes['gen_ai.provider.name'] AS provider,
    count() AS total,
    countIf(StatusCode = 'Error') AS errors,
    round(errors / total * 100, 2) AS error_pct
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] = 'chat'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY provider;

Tool call child spans

SELECT
    t.SpanName,
    t.SpanAttributes['gen_ai.tool.name'] AS tool_name,
    t.SpanAttributes['gen_ai.tool.call.id'] AS call_id,
    t.Duration AS duration_ns,
    parent.SpanName AS parent_span
FROM otel_traces t
LEFT JOIN otel_traces parent ON t.ParentSpanId = parent.SpanId AND t.TraceId = parent.TraceId
WHERE t.SpanName LIKE 'tool_call.%'
  AND t.Timestamp > now() - INTERVAL 1 HOUR
ORDER BY t.Timestamp DESC
LIMIT 50;

Context operation spans (retrieval, tool execution, agent)

SELECT
    Timestamp,
    SpanName,
    SpanAttributes['gen_ai.operation.name'] AS operation,
    SpanAttributes['gen_ai.retrieval.source'] AS retrieval_source,
    SpanAttributes['gen_ai.tool.name'] AS tool_name,
    SpanAttributes['gen_ai.agent.name'] AS agent_name
FROM otel_traces
WHERE SpanAttributes['gen_ai.operation.name'] IN ('execute_tool', 'retrieval', 'invoke_agent')
  AND Timestamp > now() - INTERVAL 1 HOUR
ORDER BY Timestamp DESC
LIMIT 50;
