Connecting OpenTelemetry Traces to Prometheus
Convert operation traces into aggregated metrics for a broader view of your graph's performance
Operation traces provide insight into performance issues that are occurring at various execution points in your graph. However, individual traces don't provide a view of your graph's broader performance.
Helpfully, you can convert your operation traces into aggregated metrics without requiring manual instrumentation. To accomplish this, we'll use spanmetricsprocessor in an OpenTelemetry Collector instance to automatically generate metrics from our existing trace spans.
OpenTelemetry Collector configuration
OpenTelemetry provides two different repositories for their OpenTelemetry Collector:
The core library
The contributor library
These repositories are similar in scope, but the contributor library includes extended features that aren't suitable for the core library. To derive performance metrics from our existing spans, we'll use the contributor library to take advantage of the spanmetricsprocessor via the associated Docker image.
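As a rough sketch of running the contributor image, a Docker Compose service might look like the following. The service name, the file paths, and the unpinned image tag are assumptions for illustration; in practice you would pin a release that includes the spanmetricsprocessor. The published ports assume the default OTLP receiver ports and the Prometheus exporter port used in this article's configuration.

```yaml
# docker-compose.yaml (hypothetical file and service names)
services:
  otel-collector:
    # Contributor distribution of the Collector; pin a tag that
    # includes the spanmetricsprocessor (tag omitted here on purpose).
    image: otel/opentelemetry-collector-contrib
    # Point the Collector at the mounted configuration file.
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      # Assumed local path to the Collector configuration.
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC receiver (default port)
      - "4318:4318" # OTLP HTTP receiver (default port)
      - "9464:9464" # Prometheus exporter endpoint
```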
When your OpenTelemetry Collector is ready to run, you can start configuring it with this barebones example:
receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - http://*
            - https://*
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: 0.0.0.0:12346

exporters:
  prometheus:
    endpoint: '0.0.0.0:9464'

processors:
  batch:
  spanmetrics:
    metrics_exporter: prometheus

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
    metrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]
      processors: [batch]
Apollo Server setup
Add the OTLP Exporter (@opentelemetry/exporter-trace-otlp-http Node package) following the same instructions as shown in the documentation for Apollo Server and OpenTelemetry.
GraphOS Router setup
To send traces from the GraphOS Router to the OpenTelemetry Collector, see this article.
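For orientation only, a minimal router configuration sketch might look like the following. It assumes a recent router version that uses the telemetry.exporters.tracing layout (older versions use a different structure) and a Collector reachable at the hypothetical hostname otel-collector on the default OTLP gRPC port; the linked article remains the authoritative reference.

```yaml
# router.yaml (fragment; assumes a recent GraphOS Router version)
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        # Assumed Collector address: hypothetical hostname,
        # default OTLP gRPC port.
        endpoint: http://otel-collector:4317
        protocol: grpc
```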
Prometheus setup
Lastly, we need to add the OpenTelemetry Collector as a target within Prometheus. The Collector exposes its metrics on port 9464, the endpoint configured for the prometheus exporter above.
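A minimal scrape configuration for this could look like the sketch below. The job name, scrape interval, and the otel-collector hostname are assumptions; substitute whatever address your Collector is reachable at from Prometheus.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: otel-collector # hypothetical job name
    scrape_interval: 15s # assumed interval
    static_configs:
      # Assumed hostname; 9464 matches the prometheus exporter endpoint.
      - targets: ['otel-collector:9464']
```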
That's it! You should now have access to span metrics that use the same operation names as your traces.
Example queries
Here are a few sample queries to help explore the data structure being reported:
P95 by service:
histogram_quantile(.95, sum(rate(latency_bucket[5m])) by (le, service_name))
Average latency by service and operation (for example, router / graphql.validate):
sum by (operation, service_name)(rate(latency_sum{}[1m])) / sum by (operation, service_name)(rate(latency_count{}[1m]))
RPM by service (rate() returns a per-second value, so multiply by 60):
sum(rate(calls_total{operation="HTTP POST"}[1m])) by (service_name) * 60
Full demo
To see this in action, check out the Supergraph Demo repository using the OpenTelemetry-Collector-specific Docker Compose image.