Histogram metrics
| Metric | Unit | Description |
|---|---|---|
| gen_ai.client.operation.duration | seconds | End-to-end latency of each LLM call |
| gen_ai.client.token.usage | tokens | Token distribution, dimensioned by gen_ai.token.type (input or output) |
| gen_ai.server.time_to_first_token | seconds | Time to first token. For streaming, the time to the first content delta; for non-streaming, equal to the total request duration |
| gen_ai.server.request.duration | seconds | Upstream request duration |
| gen_ai.server.time_per_output_token | seconds | Generation time per output token |
| gen_ai.client.output_tokens_per_second | tokens/second | Output token throughput |

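To make the relationship between the timing metrics more concrete, here is a minimal sketch of how the values for one request might be derived before they are recorded. The `RequestTiming` type and `histogram_measurements` function are hypothetical names used only for illustration. The formulas for time per output token and throughput are assumptions, since the table does not define them; only the non-streaming rule for time to first token comes from the table itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTiming:
    """Hypothetical record of one LLM call; all times are in seconds."""
    start: float
    end: float
    output_tokens: int
    first_token_at: Optional[float] = None  # None for non-streaming calls


def histogram_measurements(t: RequestTiming) -> dict:
    """Return one measurement per histogram metric for a single request."""
    duration = t.end - t.start
    # From the table: for non-streaming calls, time to first token
    # equals the total request duration.
    ttft = duration if t.first_token_at is None else t.first_token_at - t.start
    measurements = {
        "gen_ai.client.operation.duration": duration,
        "gen_ai.server.time_to_first_token": ttft,
    }
    if t.output_tokens > 0 and duration > 0:
        # Assumed formulas (not defined in the table): plain ratios
        # over the whole request.
        measurements["gen_ai.server.time_per_output_token"] = duration / t.output_tokens
        measurements["gen_ai.client.output_tokens_per_second"] = t.output_tokens / duration
    return measurements


streaming = histogram_measurements(
    RequestTiming(start=10.0, end=12.0, output_tokens=100, first_token_at=10.5)
)
non_streaming = histogram_measurements(
    RequestTiming(start=10.0, end=12.0, output_tokens=100)
)
```

In this sketch the streaming call reports a time to first token shorter than its duration, while the non-streaming call reports the two as equal.
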
The metrics in the table above are aggregate metric instruments. The same signals are also recorded as per-request span attributes, and a few carry a different name on spans than as a metric. For example, the throughput metric gen_ai.client.output_tokens_per_second is recorded on individual spans as gen_ai.server.output_tokens_per_second. The metric and the span attribute are distinct artifacts.

Counter metrics
| Metric | Description |
|---|---|
| gen_ai.client.errors | Error count, dimensioned by error.type |
| gen_ai.client.finish_reasons | Finish reason distribution, dimensioned by gen_ai.response.finish_reason |
| gen_ai.client.guardrail_violations | Guardrail hit count, dimensioned by gen_ai.guardrail.detector and gen_ai.guardrail.direction |

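As an illustration of what "dimensioned by" means for these counters, the following sketch keeps one running count per unique combination of dimension values. It uses only the Python standard library; the `store` and `increment` names are hypothetical, and the sample values (`timeout`, `stop`, `pii`, `input`) are made up for the example rather than taken from any documented value set.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: metric name -> Counter keyed by dimension values.
store = defaultdict(Counter)


def increment(metric, dimensions):
    """Add 1 to the series identified by the metric name and its dimensions."""
    # Sort the items so the same dimensions always produce the same key.
    key = tuple(sorted(dimensions.items()))
    store[metric][key] += 1


# Illustrative values only.
increment("gen_ai.client.errors", {"error.type": "timeout"})
increment("gen_ai.client.errors", {"error.type": "timeout"})
increment("gen_ai.client.finish_reasons", {"gen_ai.response.finish_reason": "stop"})
increment(
    "gen_ai.client.guardrail_violations",
    {"gen_ai.guardrail.detector": "pii", "gen_ai.guardrail.direction": "input"},
)

# Each distinct dimension combination is its own series.
timeouts = store["gen_ai.client.errors"][(("error.type", "timeout"),)]
```

Two errors with the same error.type land in the same series, while a guardrail violation is identified by both of its dimensions together.
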
Dimensions
Metrics carry these dimensions, letting you break signals down by operation, provider, and model:

| Dimension | Meaning |
|---|---|
| gen_ai.operation.name | Operation type |
| gen_ai.provider.name | Provider identity |
| gen_ai.request.model | Requested model |
| gen_ai.response.model | Response model, when available |

The token usage metric gen_ai.client.token.usage additionally carries gen_ai.token.type (input or output).

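The dimension rules above can be sketched as a small helper that builds the attribute set for one measurement and then groups values by a chosen dimension. This is an illustrative sketch, not the product's implementation: `build_dimensions` and `total_by` are hypothetical names, and the operation, provider, and model values are placeholders.

```python
from collections import defaultdict


def build_dimensions(operation, provider, request_model,
                     response_model=None, token_type=None):
    """Build the dimension set attached to one metric measurement."""
    dims = {
        "gen_ai.operation.name": operation,
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": request_model,
    }
    # The response model is attached only when it is available.
    if response_model is not None:
        dims["gen_ai.response.model"] = response_model
    # The token type applies to token usage measurements only.
    if token_type is not None:
        if token_type not in ("input", "output"):
            raise ValueError("token_type must be 'input' or 'output'")
        dims["gen_ai.token.type"] = token_type
    return dims


def total_by(samples, dimension):
    """Sum measurement values grouped by the value of one dimension."""
    totals = defaultdict(int)
    for dims, value in samples:
        totals[dims.get(dimension, "unknown")] += value
    return dict(totals)


# Placeholder values for a single call's token usage.
samples = [
    (build_dimensions("chat", "example-provider", "example-model",
                      token_type="input"), 120),
    (build_dimensions("chat", "example-provider", "example-model",
                      response_model="example-model-v1",
                      token_type="output"), 45),
]
by_token_type = total_by(samples, "gen_ai.token.type")
by_provider = total_by(samples, "gen_ai.provider.name")
```

Grouping the same samples by gen_ai.token.type separates input from output tokens, while grouping by gen_ai.provider.name sums them together.
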
Related
Span attributes
The attributes recorded on each span.
Observe
Where you view your data.