Router Tracing
Collect tracing information from the router
The GraphOS Router and Apollo Router Core support collection of traces with OpenTelemetry, with exporters for:
OpenTelemetry Protocol (OTLP) over HTTP or gRPC
The router generates spans that include the various phases of serving a request and associated dependencies. This is useful for showing how response time is affected by:
Sub-request response times
Query shape (sub-request dependencies)
Router post-processing
Span data is sent to a collector such as Jaeger, which can assemble spans into a Gantt chart for analysis.
Tracing common configuration
Common tracing configuration contains global settings for all exporters.
Service name
Set a service name for your router traces so you can easily locate them in external metrics dashboards.
The service name can be set by an environment variable or in router.yaml, with the following order of precedence (first to last):
OTEL_SERVICE_NAME environment variable
OTEL_RESOURCE_ATTRIBUTES environment variable
telemetry.exporters.tracing.common.service_name in router.yaml
telemetry.exporters.tracing.common.resource in router.yaml
Example of setting the service name in telemetry.exporters.tracing.common.service_name in router.yaml:
telemetry:
  exporters:
    tracing:
      common:
        # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
        service_name: "router"
Example of setting the service name in telemetry.exporters.tracing.common.resource in router.yaml:
telemetry:
  exporters:
    tracing:
      common:
        resource:
          # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
          "service.name": "router"
If the service name isn't explicitly set, it defaults to unknown_service:router, or to unknown_service if the executable name cannot be determined.
resource
Resource attributes are key-value pairs that provide additional information to an exporter. Application performance monitors (APM) may interpret and display resource information.
In router.yaml, resource attributes are set in telemetry.exporters.tracing.common.resource. For example:
telemetry:
  exporters:
    tracing:
      common:
        resource:
          "environment.name": "production"
          "environment.namespace": "${env.MY_K8_NAMESPACE_ENV_VARIABLE}"
For OpenTelemetry conventions for resources, see Resource Semantic Conventions.
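As a hedged illustration of how the common settings can sit together, the sketch below sets the service name, the service namespace, and two resource attributes in one place. The option names come from this page; the values "payments" and "1.2.3" are hypothetical placeholders, and "service.version" is an OpenTelemetry semantic-convention key that this page does not itself mention.

```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Name shown for the router in your tracing backend
        service_name: "router"
        # Hypothetical namespace grouping related services
        service_namespace: "payments"
        resource:
          "environment.name": "production"
          # Hypothetical attribute and value, for illustration only
          "service.version": "1.2.3"
```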
sampler
You can configure the sampling rate of traces to match the rate of your application performance monitors (APM). To configure sampling, set telemetry.exporters.tracing.common.sampler and telemetry.exporters.tracing.common.parent_based_sampler in router.yaml:
telemetry:
  exporters:
    tracing:
      common:
        sampler: always_on # (default) all requests are sampled (always_on|always_off|<0.0-1.0>)
        parent_based_sampler: true # (default) If an incoming span has OpenTelemetry headers then the request will always be sampled.
sampler sets the sampling rate as a decimal fraction between 0.0 and 1.0, always_on, or always_off.
For example, setting sampler: 0.1 samples 10% of your requests.
always_on (the default) sends all spans to your APM.
always_off turns off sampling. No spans reach your APM.
parent_based_sampler enables clients to make the sampling decision. This guarantees that a trace that starts at a client will also have spans at the router. You may wish to disable it (setting parent_based_sampler: false) if your router is exposed directly to the internet, because any client could otherwise force its requests to be sampled.
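For a router that is exposed directly to the internet, the two settings might be combined as in the sketch below. The 10% rate is an arbitrary illustration, not a recommendation; choose a value that matches your APM.

```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Sample roughly 10% of requests (illustrative value)
        sampler: 0.1
        # Ignore sampling decisions carried in client trace headers
        parent_based_sampler: false
```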
preview_datadog_agent_sampling
Since 1.59
Enable accurate Datadog APM views with the preview_datadog_agent_sampling option.
The Datadog APM view relies on traces to generate metrics. For this to be accurate, all requests must be sampled and sent to the Datadog Agent.
To get accurate APM views without forwarding every trace to Datadog, set preview_datadog_agent_sampling to true and set sampler to the percentage of traces you want sent to Datadog.
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        sampler: 0.1
        preview_datadog_agent_sampling: true
For more details about this option and its limitations, see preview_datadog_agent_sampling in the Datadog trace exporter docs.
propagation
The telemetry.exporters.tracing.propagation section allows you to configure which propagators are active in addition to those automatically activated by using an exporter.
Specifying explicit propagation is generally only required if you're using an exporter that supports multiple trace ID formats, for example, OpenTelemetry Collector, Jaeger, or OpenTracing compatible exporters.
For example:
telemetry:
  exporters:
    tracing:
      propagation:
        # https://www.w3.org/TR/baggage/
        baggage: false

        # https://www.datadoghq.com/
        datadog: false

        # https://www.jaegertracing.io/ (compliant with opentracing)
        jaeger: false

        # https://www.w3.org/TR/trace-context/
        trace_context: false

        # https://zipkin.io/ (compliant with opentracing)
        zipkin: false

        # https://aws.amazon.com/xray/ (compliant with opentracing)
        aws_xray: false

        # If you have your own way to generate a trace id and you want to pass it via a custom request header
        request:
          # The name of the header to read the trace id from
          header_name: my-trace-id
          # The format of the trace when propagating to subgraphs.
          format: uuid
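As a minimal sketch, a router whose exporter accepts several trace ID formats might explicitly enable only the W3C Trace Context and Baggage propagators. This selection is an assumption for illustration; which propagators you need depends on the headers your clients and subgraphs actually send.

```yaml
telemetry:
  exporters:
    tracing:
      propagation:
        # Propagate W3C traceparent/tracestate headers
        trace_context: true
        # Propagate W3C baggage headers
        baggage: true
```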
request configuration reference
Option | Values | Default | Description |
---|---|---|---|
header_name | | | The name of the HTTP header to use for propagation. |
format | hexadecimal, open_telemetry, decimal, datadog, uuid | hexadecimal | The output format of the trace_id |
Valid values for format:
hexadecimal - 32-character hexadecimal string (e.g. 0123456789abcdef0123456789abcdef)
open_telemetry - 32-character hexadecimal string (e.g. 0123456789abcdef0123456789abcdef)
decimal - 16-character decimal string (e.g. 1234567890123456)
datadog - 16-character decimal string (e.g. 1234567890123456)
uuid - 36-character UUID string (e.g. 01234567-89ab-cdef-0123-456789abcdef)
open_telemetry or uuid format.
Limits
You may set limits on spans to prevent sending too much data to your APM. For example:
telemetry:
  exporters:
    tracing:
      common:
        max_attributes_per_event: 128
        max_attributes_per_link: 128
        max_attributes_per_span: 128
        max_events_per_span: 128
        max_links_per_span: 128
Attributes, events and links that exceed the limits are dropped silently.
max_attributes_per_event
Events are used to describe something that happened in the context of a span. For example, an exception or a message sent. These events can have attributes that are key-value pairs that provide additional information to display via APM.
max_attributes_per_link
Spans may link to other spans in the same or different trace. For example, a span may link to a parent span, or a span may link to a span in a different trace to represent that trace's parent. These links may have attributes that are key-value pairs that provide additional information to display via APM.
max_attributes_per_span
Spans are used to describe an activity in the context of a trace. For example, a request to a subgraph or query planning. Spans can have attributes that are key-value pairs that provide additional information to display via APM.
max_events_per_span
Spans may have events that describe something that happened in the context of a span. For example, an exception or a message sent. The number of events per span can be limited to prevent spans becoming very large.
max_links_per_span
Spans may link to other spans in the same or different trace. For example, a span may link to a parent span, or a span may link to a span in a different trace to represent that trace's parent. The number of links per span can be limited to prevent spans becoming very large.
experimental_response_trace_id
This option is experimental. You can give feedback in the discussion on GitHub.
To expose the trace ID in a response header, whether the router generated it or you provided it through propagation headers, use this configuration:
telemetry:
  exporters:
    tracing:
      experimental_response_trace_id:
        enabled: true # default: false
        header_name: "my-trace-id" # default: "apollo-trace-id"
With this configuration, responses include a header named my-trace-id containing the trace ID. This can help you debug a specific query: search your logs for that trace ID to get more context.
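To round-trip a caller-supplied trace ID, this option can sit alongside the propagation request settings described earlier. The sketch below assumes clients send their ID in a my-trace-id request header in UUID format; the header name is the same placeholder used in the examples on this page, and reusing it for the response is a choice made here for illustration.

```yaml
telemetry:
  exporters:
    tracing:
      propagation:
        request:
          # Read the caller's trace ID from this request header
          header_name: my-trace-id
          format: uuid
      experimental_response_trace_id:
        enabled: true
        # Return the trace ID to the caller under the same header name
        header_name: "my-trace-id"
```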
experimental_response_trace_id reference
Attribute | Default | Description |
---|---|---|
enabled | false | Set to true to return trace IDs on response headers. |
header_name | apollo-trace-id | The name of the header to respond with. |
Tracing common reference
Attribute | Default | Description |
---|---|---|
parent_based_sampler | true | Sampling decisions from upstream will be honored |
preview_datadog_agent_sampling | false | Send all spans to the Datadog agent. |
propagation | | The propagation configuration. |
sampler | always_on | The sampling rate for traces. |
service_name | unknown_service:router | The OpenTelemetry service name. |
service_namespace | | The OpenTelemetry namespace. |
resource | | The OpenTelemetry resource to attach to traces. |
experimental_response_trace_id | | Return the trace ID in a response header. |
max_attributes_per_event | 128 | The maximum number of attributes per event. |
max_attributes_per_link | 128 | The maximum number of attributes per link. |
max_attributes_per_span | 128 | The maximum number of attributes per span. |
max_events_per_span | 128 | The maximum number of events per span. |
max_links_per_span | 128 | The maximum number of links per span. |