Router Telemetry
Collect observable data to monitor your router and supergraph
Since the router is the single access point for all traffic to and from your graph, router telemetry is the most comprehensive way to observe your supergraph. By implementing telemetry, you can:
Monitor your supergraph's health and performance
Diagnose issues and deduce root causes
Optimize resource usage and system reliability
To understand how router telemetry fits into the broader set of GraphOS observability tooling, see the observability overview.
How router telemetry works
By default, the router doesn't collect or export any telemetry beyond the operation and field usage metrics it sends to GraphOS. You configure which additional telemetry data to collect and where to export it via your router's configuration file.
The router request lifecycle is the primary data source for telemetry data or signals. Telemetry signals include logs, metrics, and traces. The section on router telemetry signals explains these data types and gives basic configuration examples. Exporters are responsible for sending telemetry data to your application performance monitoring (APM) and observability tools for storage, visualization, and analysis.
Telemetry exporters
The router emits telemetry in the industry-standard OpenTelemetry Protocol (OTLP) format and is therefore compatible with many APM tools, including:
Prometheus
OpenTelemetry Collector
Datadog
New Relic
Jaeger
Zipkin
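As a rough sketch of what pointing the router at one of these tools can look like, the following configuration sends both metrics and traces over OTLP to an OpenTelemetry Collector. The endpoint address is an assumption for illustration (a collector listening locally on the default OTLP gRPC port), so replace it with the address of your own collector or agent.

```yaml
telemetry:
  exporters:
    metrics:
      otlp:
        enabled: true
        # Assumed address: a collector running locally on the default OTLP gRPC port
        endpoint: "http://localhost:4317"
    tracing:
      otlp:
        enabled: true
        # Assumed address: same local collector as above
        endpoint: "http://localhost:4317"
```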
Attributes and selectors
Attributes and selectors are key-value pairs that add contextual information from the router request lifecycle to telemetry data. You can use them to annotate events, metrics, and spans so that you can filter and group data in your APM tools.
The router supports a set of standard attributes from OpenTelemetry semantic conventions. Example attributes include:
HTTP status code
GraphQL operation name
Subgraph name
Selectors allow you to define custom data points based on the router's request lifecycle.
Type | Description |
---|---|
Attribute | Standard data points that can be attached to spans, instruments, and events. |
Selector | Custom data points extracted from the router's request lifecycle, tailored to specific needs. |
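To make the distinction concrete, here is a minimal sketch that attaches one standard attribute and one selector-based custom attribute to the router span, plus standard attributes on the supergraph and subgraph spans. The custom attribute name acme.client.name and the header x-client-name are hypothetical and exist only for illustration. Check the attribute names against the reference for your router version.

```yaml
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          # Standard attribute: enabled by setting it to true
          http.response.status_code: true
          # Hypothetical custom attribute whose value comes from a selector
          acme.client.name:
            request_header: "x-client-name"
      supergraph:
        attributes:
          # Standard attribute: the GraphQL operation name
          graphql.operation.name: true
      subgraph:
        attributes:
          # Standard attribute: the name of the subgraph being queried
          subgraph.name: true
```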
Router telemetry signals
The router supports three signal types for collecting and exporting telemetry:
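Signal | Description |
---|---|
Logs and events | Logs record events in the router's request lifecycle, such as warnings about misconfiguration and errors that occurred during a request. |
Metrics and instruments | Metrics are measurements of the router's behavior collected over time. Instruments define how to collect and report them. |
Traces and spans | Traces follow the flow of a request through the router. A trace is composed of spans. |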
These signals let you collect data about the inner workings of your router and graph and export it to your observability tools.
Logs and events
Logs record events in the router's request lifecycle. Examples of logged events include:
Information about the router lifecycle
Warnings about misconfiguration
Errors that occurred during a request
Log exporters
You can log events to standard output in either text or JSON format. Logs can also be consumed by logging exporters and as part of spans via tracing exporters.
Example log configuration
This configuration snippet enables stdout logging in JSON:
telemetry:
  exporters:
    logging:
      stdout:
        enabled: true
        format: json
Metrics and instruments
Metrics are measurements of the router's behavior that are collected and often analyzed over time to identify trends. Examples of router metrics include the number of incoming HTTP requests and the time spent processing a request.
Instruments define how to collect and report metrics. Different kinds of instruments include counters, gauges, and histograms. For example, given the metric "number of incoming HTTP requests," a counter records the total number of requests, a histogram captures the distribution of request counts over time, and a gauge provides a snapshot of the current request count at a given moment.
Instrument types
Metric instruments fall into three categories:
Instrument type | Description |
---|---|
OTEL instruments | Standard OpenTelemetry instruments around the HTTP lifecycle, such as http.server.request.body.size. |
Router instruments | Standard instruments for the router request lifecycle. |
Custom instruments | Custom instruments defined in the router request lifecycle. |
Example instrument configuration
This configuration snippet enables OTEL instrumentation for a histogram of request body sizes:
telemetry:
  instrumentation:
    instruments:
      router:
        http.server.request.body.size: true
See Instruments for an overview of available instruments and a guide for configuring and customizing instruments.
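For comparison with the standard instrument above, a custom instrument is declared by name with its value, type, and unit. The following is a minimal sketch rather than a recommended setup. The instrument name acme.request.duration and its description are hypothetical.

```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        # Hypothetical custom instrument name
        acme.request.duration:
          value: duration   # measure how long the router stage takes
          type: histogram   # record the distribution of those durations
          unit: s
          description: "Duration of requests handled by the router"
```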
Metric exporters
In addition to the operation metrics and field usage metrics that GraphOS Router sends to GraphOS, you can configure the router with metric exporters for other observability tools and APMs.
This configuration snippet enables exporting metrics to Prometheus:
telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
        listen: 127.0.0.1:9090
        path: /metrics
Learn more about sending metrics to Prometheus and metric exporters in general.
Traces and spans
Traces help you monitor the flow of a request through the router. A trace is composed of spans. A span captures a request's duration as it flows through the router request lifecycle. Spans may include contextual information about the request, such as the HTTP status code or the name of the subgraph being queried.
Examples of spans include:
router - Wraps an entire request from the HTTP perspective
supergraph - Wraps a request once GraphQL parsing has taken place
subgraph - Wraps a request to a subgraph
Tracing exporters
If you've enabled federated tracing (also known as FTV1 tracing) in your subgraph libraries, the router sends field-level traces to GraphOS. Additionally, trace exporters can consume and report traces to your APM.
This configuration snippet sets the attributes that Datadog uses to organize its APM view and exports traces to a Datadog agent:
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          otel.name: router
          operation.name: "router"
          resource.name:
            request_method: true
      supergraph:
        attributes:
          otel.name: supergraph
          operation.name: "supergraph"
          resource.name:
            operation_name: string
      subgraph:
        attributes:
          otel.name: subgraph
          operation.name: "subgraph"
          resource.name:
            subgraph_operation_name: string
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: "${env.DATADOG_AGENT_HOST}:4317"
Learn more about sending traces to Datadog and trace exporters in general.
Best practices
Collecting exactly the telemetry you need
Effective telemetry provides just the right amount and granularity of information to maintain your graph. Too much data, such as high-cardinality metrics, can overwhelm your system. Too little can leave you without enough information to debug issues.
Specific events that need to be captured—and the conditions under which they need to be captured—can change as client applications and graphs change. Different environments, such as production and development, can have different observability requirements.
Router telemetry is customizable to meet the observability needs of different graphs. Keep in mind your particular environments' and graphs' requirements when configuring your telemetry.
Setting conditions for collecting telemetry
You can set conditions on instruments and events so that telemetry data is collected only when necessary. This condition collects the configured telemetry data only when the x-req-header request header is equal to "example-value":
eq:
  - "example-value"
  - request_header: x-req-header
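The condition above is a fragment, so it helps to see where it might sit in a full configuration. The sketch below attaches it to a custom counter. The instrument name acme.flagged.requests and its description are hypothetical, and the counter increments only for requests whose x-req-header header equals "example-value".

```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        # Hypothetical custom instrument name
        acme.flagged.requests:
          value: unit      # count one per request
          type: counter
          unit: request
          description: "Requests carrying the example header value"
          # Only record a measurement when the condition holds
          condition:
            eq:
              - "example-value"
              - request_header: x-req-header
```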
Dropping metrics using views
You can use the metric exporters' views property with the drop aggregation to prevent certain metrics from being sent to your APM. This configuration snippet drops all instruments whose names begin with apollo_router:
telemetry:
  exporters:
    metrics:
      common:
        service_name: apollo-router
        views:
          - name: apollo_router*
            aggregation: drop
Balancing telemetry and router performance
Keep in mind that the amount of telemetry you add can impact your router's performance.
Custom metrics, events, and attributes consume more processing resources than standard metrics. Adding too many (standard or custom) can slow your router down.
Configurations such as events.*.request|error|response that produce output for all router lifecycle services should only be used for development or debugging, not for production.
For properly logged telemetry, you should use a log verbosity of info. Set the RUST_LOG or APOLLO_ROUTER_LOG environment variable and the --log CLI option to info. Using less verbose logging, such as error, can cause some attributes to be dropped.
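For reference, the events.*.request|error|response pattern mentioned above corresponds to settings like the following. This is a sketch of the kind of verbose configuration to reserve for development or debugging. It assumes the standard request, response, and error events at the info level for each lifecycle service.

```yaml
# Development or debugging only: emits an event for every
# request, response, and error at each router lifecycle service.
telemetry:
  instrumentation:
    events:
      router:
        request: info
        response: info
        error: info
      supergraph:
        request: info
        response: info
        error: info
      subgraph:
        request: info
        response: info
        error: info
```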
Next steps
Consult the following documentation for details on how to configure the various telemetry mechanisms and exporters: