Router Tracing

Collect tracing information from the router


The GraphOS Router and Apollo Router Core support collection of traces with OpenTelemetry and can export them to tracing backends such as Jaeger and Datadog.

The router generates spans that include the various phases of serving a request and associated dependencies. This is useful for showing how response time is affected by:

  • Sub-request response times

  • Query shape (sub-request dependencies)

  • Router post-processing

Span data is sent to a collector such as Jaeger, which can assemble spans into a Gantt chart for analysis.

tip
To get the most out of distributed tracing, all components in your system should be instrumented.

Tracing common configuration

Common tracing configuration contains global settings for all exporters.

Service name

Set a service name for your router traces so you can easily locate them in external metrics dashboards.

The service name can be set by an environment variable or in router.yaml, with the following order of precedence (first to last):

  1. OTEL_SERVICE_NAME environment variable

  2. OTEL_RESOURCE_ATTRIBUTES environment variable

  3. telemetry.exporters.tracing.common.service_name in router.yaml

    Example of setting the service name in telemetry.exporters.tracing.common.service_name:
    YAML
    router.yaml
    telemetry:
      exporters:
        tracing:
          common:
            # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
            service_name: "router"

  4. telemetry.exporters.tracing.common.resource in router.yaml

    Example of setting the service name in telemetry.exporters.tracing.common.resource:
    YAML
    router.yaml
    telemetry:
      exporters:
        tracing:
          common:
            resource:
              # (Optional) Set the service name to easily find metrics related to the apollo-router in your metrics dashboards
              "service.name": "router"

If the service name isn't explicitly set, it defaults to unknown_service:router, or to unknown_service if the executable name cannot be determined.
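The precedence order above can be illustrated with a minimal sketch that sets the service name in both router.yaml locations. Assuming neither OTEL_SERVICE_NAME nor OTEL_RESOURCE_ATTRIBUTES supplies a service name in the environment, the listed order implies that service_name is used and the resource entry is ignored. The values "router-prod" and "router-fallback" are illustrative only.

```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Third in the precedence list: used when no environment variable sets the name
        service_name: "router-prod"
        resource:
          # Fourth in the precedence list: only applies if service_name above is absent
          "service.name": "router-fallback"
```

Setting the name in only one place avoids any ambiguity about which value reaches your dashboards.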

resource

Resource attributes are key-value pairs that provide additional information to an exporter. Application performance monitors (APMs) may interpret and display resource information.

In router.yaml, resource attributes are set in telemetry.exporters.tracing.common.resource. For example:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        resource:
          "environment.name": "production"
          "environment.namespace": "{env.MY_K8_NAMESPACE_ENV_VARIABLE}"

For OpenTelemetry conventions for resources, see Resource Semantic Conventions.

sampler

You can configure the sampling rate of traces to match the rate of your application performance monitors (APM). To enable sampling configuration, in router.yaml set telemetry.exporters.tracing.common.sampler and telemetry.exporters.tracing.common.parent_based_sampler:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        sampler: always_on # (default) all requests are sampled (always_on|always_off|<0.0-1.0>)
        parent_based_sampler: true # (default) If an incoming span has OpenTelemetry headers then the request will always be sampled.

  • sampler sets the sampling rate as a decimal fraction between 0.0 and 1.0, always_on, or always_off.

    • For example, setting sampler: 0.1 samples 10% of your requests.

    • always_on (the default) sends all spans to your APM.

    • always_off turns off sampling. No spans reach your APM.

  • parent_based_sampler enables clients to make the sampling decision. This guarantees that a trace that starts at a client will also have spans at the router. You may wish to disable it (setting parent_based_sampler: false) if your router is exposed directly to the internet.
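As a minimal sketch of the options described above, the following configuration suits a router that is exposed directly to the internet: it samples 10% of requests and does not let incoming headers decide sampling. The 0.1 rate is an illustrative value, not a recommendation.

```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Sample 10% of requests
        sampler: 0.1
        # Do not let clients make the sampling decision
        parent_based_sampler: false
```

With parent_based_sampler disabled, a trace that starts at a client is no longer guaranteed to have spans at the router.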

preview_datadog_agent_sampling (since 1.59)

Preview
This feature is in preview. Your questions and feedback are highly valued. Don't hesitate to get in touch with your Apollo contact.

Enable accurate Datadog APM views with the preview_datadog_agent_sampling option.

The Datadog APM view relies on traces to generate metrics. For this to be accurate, all requests must be sampled and sent to the Datadog Agent.

To get accurate APM views without forwarding every trace to Datadog, set preview_datadog_agent_sampling to true and set sampler to the fraction of traces you want sent to Datadog.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        sampler: 0.1
        preview_datadog_agent_sampling: true

For details and limitations of this option, see preview_datadog_agent_sampling in the Datadog trace exporter docs.

propagation

The telemetry.exporters.tracing.propagation section allows you to configure which propagators are active in addition to those automatically activated by using an exporter.

Specifying explicit propagation is generally only required if you're using an exporter that supports multiple trace ID formats, for example, OpenTelemetry Collector, Jaeger, or OpenTracing-compatible exporters.

For example:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      propagation:
        # https://www.w3.org/TR/baggage/
        baggage: false

        # https://www.datadoghq.com/
        datadog: false

        # https://www.jaegertracing.io/ (compliant with opentracing)
        jaeger: false

        # https://www.w3.org/TR/trace-context/
        trace_context: false

        # https://zipkin.io/ (compliant with opentracing)
        zipkin: false

        # https://aws.amazon.com/xray/ (compliant with opentracing)
        aws_xray: false

        # If you have your own way to generate a trace id and you want to pass it via a custom request header
        request:
          # The name of the header to read the trace id from
          header_name: my-trace-id
          # The format of the trace when propagating to subgraphs.
          format: uuid

request configuration reference

| Option | Values | Default | Description |
|---|---|---|---|
| header_name | | | The name of the HTTP header to use for propagation. |
| format | hexadecimal, open_telemetry, decimal, datadog, uuid | hexadecimal | The output format of the trace_id |

Valid values for format:

  • hexadecimal - 32-character hexadecimal string (e.g. 0123456789abcdef0123456789abcdef)

  • open_telemetry - 32-character hexadecimal string (e.g. 0123456789abcdef0123456789abcdef)

  • decimal - 16-character decimal string (e.g. 1234567890123456)

  • datadog - 16-character decimal string (e.g. 1234567890123456)

  • uuid - 36-character UUID string (e.g. 01234567-89ab-cdef-0123-456789abcdef)

note
Incoming trace IDs must be in open_telemetry or uuid format.
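The request options above can be combined as in the following sketch. It assumes clients send a UUID-formatted trace ID (one of the two accepted incoming formats) in a hypothetical x-correlation-id header, and that you want the ID written as a 32-character hexadecimal string on output. The header name is an assumption chosen for illustration.

```yaml
telemetry:
  exporters:
    tracing:
      propagation:
        request:
          # Hypothetical header carrying the incoming trace ID (open_telemetry or uuid format)
          header_name: x-correlation-id
          # Output format of the trace ID: 32-character hexadecimal string
          format: hexadecimal
```

Because hexadecimal is the documented default for format, the last line could be omitted; it is spelled out here to make the output format explicit.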

Limits

You may set limits on spans to prevent sending too much data to your APM. For example:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        max_attributes_per_event: 128
        max_attributes_per_link: 128
        max_attributes_per_span: 128
        max_events_per_span: 128
        max_links_per_span: 128

Attributes, events and links that exceed the limits are dropped silently.
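As a sketch of tightening limits, the following configuration lowers only the event and link counts per span. The values 32 and 16 are illustrative, and the example assumes that limits left out of the file keep the default of 128 listed in the reference table.

```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Illustrative values: keep at most 32 events and 16 links on each span
        max_events_per_span: 32
        max_links_per_span: 16
        # The max_attributes_per_* limits are omitted and assumed to stay at their defaults
```

Because excess items are dropped silently, lower limits trade detail in your APM for smaller spans.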

max_attributes_per_event

Events describe something that happened in the context of a span, for example an exception or a message sent. Events can have attributes, which are key-value pairs that provide additional information to display in your APM.

max_attributes_per_link

Spans may link to other spans in the same or a different trace. For example, a span may link to a parent span, or a span may link to a span in a different trace to represent that trace's parent. These links can have attributes, which are key-value pairs that provide additional information to display in your APM.

max_attributes_per_span

Spans describe an activity in the context of a trace, for example a request to a subgraph or query planning. Spans can have attributes, which are key-value pairs that provide additional information to display in your APM.

max_events_per_span

Spans may have events that describe something that happened in the context of a span. For example, an exception or a message sent. The number of events per span can be limited to prevent spans becoming very large.

max_links_per_span

Spans may link to other spans in the same or a different trace. For example, a span may link to a parent span, or a span may link to a span in a different trace to represent that trace's parent. The number of links per span can be limited to prevent spans becoming very large.

experimental_response_trace_id

This feature is experimental. Your questions and feedback are highly valued—don't hesitate to get in touch with your Apollo contact.

You can also give feedback in the discussion on GitHub.

To expose the trace ID in a response header, whether the router generated it or you provided it using propagation headers, use this configuration:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      experimental_response_trace_id:
        enabled: true # default: false
        header_name: "my-trace-id" # default: "apollo-trace-id"

With this configuration, responses include a header called my-trace-id containing the trace ID. This can help you debug a specific query: search your logs for the trace ID to get more context.

experimental_response_trace_id reference

| Attribute | Default | Description |
|---|---|---|
| enabled | false | Set to true to return trace IDs on response headers. |
| header_name | apollo-trace-id | The name of the header to respond with. |

Tracing common reference

| Attribute | Default | Description |
|---|---|---|
| parent_based_sampler | true | Sampling decisions from upstream will be honored. |
| preview_datadog_agent_sampling | false | Send all spans to the Datadog agent. |
| propagation | | The propagation configuration. |
| sampler | always_on | The sampling rate for traces. |
| service_name | unknown_service:router | The OpenTelemetry service name. |
| service_namespace | | The OpenTelemetry namespace. |
| resource | | The OpenTelemetry resource to attach to traces. |
| experimental_response_trace_id | | Return the trace ID in a response header. |
| max_attributes_per_event | 128 | The maximum number of attributes per event. |
| max_attributes_per_link | 128 | The maximum number of attributes per link. |
| max_attributes_per_span | 128 | The maximum number of attributes per span. |
| max_events_per_span | 128 | The maximum number of events per span. |
| max_links_per_span | 128 | The maximum number of links per span. |