Datadog exporter (via OTLP)

Configure the Datadog exporter for tracing


Enable and configure the Datadog exporter for tracing in the GraphOS Router or Apollo Router Core.

For general tracing configuration, refer to Router Tracing Configuration.

Attributes for Datadog APM UI

The router should set attributes that Datadog uses to organize its APM view and other UI:

  • otel.name: span name that's fixed for Datadog

  • resource.name: Datadog resource name that's displayed in traces

  • operation.name: Datadog operation name that populates a dropdown menu in the Datadog service page

You should add these attributes to your router.yaml configuration file. The example below sets these attributes for the router, supergraph, and subgraph stages of the router's request lifecycle:

YAML
router.yaml
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          otel.name: router
          operation.name: "router"
          resource.name:
            request_method: true

      supergraph:
        attributes:
          otel.name: supergraph
          operation.name: "supergraph"
          resource.name:
            operation_name: string

      subgraph:
        attributes:
          otel.name: subgraph
          operation.name: "subgraph"
          resource.name:
            subgraph_operation_name: string

Consequently you can filter for these operations in Datadog APM:

Datadog APM showing operations set with example attributes set in router.yaml

OTLP configuration

OpenTelemetry protocol (OTLP) is the recommended protocol for transmitting telemetry, including traces, to Datadog.

To set up traces to Datadog via OTLP, you must do the following:

  • Modify the default configuration of the Datadog Agent to accept OTLP traces from the router.

  • Configure the router to send traces to the configured Datadog Agent.

Datadog Agent configuration

To configure the Datadog Agent, add OTLP configuration to your datadog.yaml. For example:

YAML
datadog.yaml
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: <dd-agent-ip>:4317

For additional Datadog Agent configuration details, review Datadog's Enabling OTLP Ingestion on the Datadog Agent documentation.

Router configuration

To configure the router, enable the OTLP exporter and set endpoint: <datadog-agent-endpoint>. For example:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        preview_datadog_agent_sampling: true
        sampler: 0.1

      otlp:
        enabled: true
        # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
        endpoint: "${env.DATADOG_AGENT_HOST}:4317"

        # Optional batch processor setting, this will enable the batch processor to send concurrent requests in a high load scenario.
        batch_processor:
          max_concurrent_exports: 100

The sampler controls the sampling decisions that the router makes on its own. Lowering it decreases the rate at which you sample, which can have a direct impact on your Datadog bill.

note
If you see warning messages from the router regarding the batch span processor, you may need to adjust your batch_processor settings in your exporter config to match the volume of spans being created in a router instance. This applies to both OTLP and the Datadog native exporters.

Enabling Datadog Agent sampling

The Datadog APM view relies on traces to generate metrics. For these metrics to be accurate, all requests must be sampled and sent to the Datadog agent. To prevent all traces from being sent to Datadog, in your router you must set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        sampler: 0.1
        preview_datadog_agent_sampling: true
note
  • The router doesn't support in-agent ingestion control.
  • Configuring traces_per_second in the Datadog Agent will not dynamically adjust the router's sampling rate to meet the target rate.
  • Using preview_datadog_agent_sampling will send all spans to the Datadog Agent. This will have an impact on the resource usage and performance of both the router and Datadog Agent.

Enabling log correlation

To enable Datadog log correlation, you must configure dd.trace_id to appear on the router span:

YAML
router.yaml
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          dd.trace_id: true

Your JSON-formatted log messages automatically include dd.trace_id whenever dd.trace_id is detected on the router span.

Datadog native configuration

caution
Native Datadog tracing is not part of the OpenTelemetry spec, and given that Datadog supports OTLP we will be deprecating native Datadog tracing in the future. Use OTLP configuration instead.

The router can be configured to connect to either the native, default Datadog agent address or a URL:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        preview_datadog_agent_sampling: true
        sampler: 0.1

      datadog:
        enabled: true
        # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:8126)
        endpoint: "http://${env.DATADOG_AGENT_HOST}:8126"

        # Optional batch processor setting, this will enable the batch processor to send concurrent requests in a high load scenario.
        batch_processor:
          max_concurrent_exports: 100

  # Enable graphql.operation.name attribute on supergraph spans.
  instrumentation:
    spans:
      mode: spec_compliant
      supergraph:
        attributes:
          graphql.operation.name: true
note
Depending on the volume of spans being created in a router instance, it will be necessary to adjust the batch_processor settings in your exporter config. This applies to both OTLP and the Datadog native exporter.

enabled

Set to true to enable the Datadog exporter. Defaults to false.

enable_span_mapping (default: true)

Because of some incompatibilities between Datadog and OpenTelemetry, the Datadog exporter might not provide meaningful contextual information in the exported spans. To fix this, you can configure the router to perform a mapping for the span name and the span resource name.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        enable_span_mapping: true

With enable_span_mapping: true, the router performs the following mapping:

  1. Use the OpenTelemetry span name to set the Datadog span operation name.

  2. Use the OpenTelemetry span attributes to set the Datadog span resource name.

Example trace

For example, assume a client sends a query MyQuery to the router. The router's query planner sends a subgraph query to my-subgraph-name and creates the following trace:

Text
    | apollo_router request                                                                 |
        | apollo_router router                                                              |
            | apollo_router supergraph                                                      |
            | apollo_router query_planning  | apollo_router execution                       |
                                                | apollo_router fetch                       |
                                                    | apollo_router subgraph                |
                                                        | apollo_router subgraph_request    |

As you can see, there is no clear information about the name of the query, the name of the subgraph, and the name of the query sent to the subgraph.

Instead, when enable_span_mapping is set to true the following trace will be created:

Text
    | request /graphql                                                                                    |
        | router /graphql                                                                                 |
            | supergraph MyQuery                                                                          |
                | query_planning MyQuery  | execution                                                     |
                                              | fetch fetch                                               |
                                                  | subgraph my-subgraph-name                             |
                                                      | subgraph_request MyQuery__my-subgraph-name__0     |

fixed_span_names (default: true)

When fixed_span_names is set to true, the router uses the original span names instead of the dynamic ones described by OTel semantic conventions.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        fixed_span_names: true

This will allow you to have a finite list of operation names in Datadog on the APM view.

resource_mapping

When set, resource_mapping allows you to specify which attribute to use in the Datadog APM and Trace view. The default resource mappings are:

| OpenTelemetry Span Name | Datadog Span Operation Name |
| --- | --- |
| request | http.route |
| router | http.route |
| supergraph | graphql.operation.name |
| query_planning | graphql.operation.name |
| subgraph | subgraph.name |
| subgraph_request | graphql.operation.name |
| http_request | http.route |

You may override these mappings by specifying the resource_mapping configuration:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        resource_mapping:
          # Use `my.span.attribute` as the resource name for the `router` span
          router: "my.span.attribute"
  instrumentation:
    spans:
      router:
        attributes:
          # Add a custom attribute to the `router` span
          my.span.attribute:
            request_header: x-custom-header

If you have introduced a new span in a custom build of the router, you can enable resource mapping for it by adding it to the resource_mapping configuration.
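
As a rough illustration of how the default mappings and an override combine, the following Python sketch resolves a resource name from a span's attributes. It is not the router's implementation: the function name resolve_resource_name is hypothetical, and falling back to the span name when the mapped attribute is missing is an assumption made for this example.

```python
# Default resource mappings, copied from the table in this section:
# span name -> attribute whose value becomes the Datadog resource name.
DEFAULT_RESOURCE_MAPPING = {
    "request": "http.route",
    "router": "http.route",
    "supergraph": "graphql.operation.name",
    "query_planning": "graphql.operation.name",
    "subgraph": "subgraph.name",
    "subgraph_request": "graphql.operation.name",
    "http_request": "http.route",
}


def resolve_resource_name(span_name, attributes, overrides=None):
    """Pick the resource name for a span (illustrative sketch only).

    Assumption: if the span has no mapping, or the mapped attribute is
    absent, the span name itself is used. The router may behave differently.
    """
    # Entries from resource_mapping replace the defaults per span name.
    mapping = {**DEFAULT_RESOURCE_MAPPING, **(overrides or {})}
    attribute = mapping.get(span_name)
    if attribute is None:
        return span_name
    return attributes.get(attribute, span_name)


# Default mapping: the supergraph span is named after the GraphQL operation.
print(resolve_resource_name("supergraph", {"graphql.operation.name": "MyQuery"}))

# Override: the router span uses a custom attribute instead of http.route.
print(resolve_resource_name(
    "router",
    {"http.route": "/graphql", "my.span.attribute": "checkout"},
    overrides={"router": "my.span.attribute"},
))
```

The override dictionary plays the role of the resource_mapping block in router.yaml: only the span names listed there change, and every other span keeps its default attribute.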

span_metrics

When set, span_metrics allows you to specify which spans will show span metrics in the Datadog APM and Trace view. By default, span metrics are enabled for:

  • request

  • router

  • supergraph

  • subgraph

  • subgraph_request

  • http_request

  • query_planning

  • execution

  • query_parsing

You may override these defaults by specifying the span_metrics configuration. The following example disables span metrics for the supergraph span and enables them for a custom span named my_custom_span:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        span_metrics:
          # Disable span metrics for supergraph
          supergraph: false
          # Enable span metrics for my_custom_span
          my_custom_span: true

If you have introduced a new span in a custom build of the router, you can enable span metrics for it by adding it to the span_metrics configuration.
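
The way overrides combine with the defaults can be sketched in a few lines of Python. This is an illustration under assumptions, not the router's code: the helper names are hypothetical, overrides are assumed to be merged per span name on top of the defaults, and spans that appear in neither map are assumed to have span metrics disabled.

```python
# Spans that have span metrics enabled by default, as listed in this section.
DEFAULT_SPAN_METRICS = {
    name: True
    for name in (
        "request",
        "router",
        "supergraph",
        "subgraph",
        "subgraph_request",
        "http_request",
        "query_planning",
        "execution",
        "query_parsing",
    )
}


def effective_span_metrics(overrides):
    """Merge span_metrics overrides on top of the defaults (assumed behavior)."""
    return {**DEFAULT_SPAN_METRICS, **overrides}


def span_metrics_enabled(span_name, overrides):
    """Return whether span metrics are on for a span.

    Assumption: a span that is listed nowhere has span metrics disabled.
    """
    return effective_span_metrics(overrides).get(span_name, False)


# Mirrors the span_metrics example in router.yaml.
overrides = {"supergraph": False, "my_custom_span": True}
for name in ("router", "supergraph", "my_custom_span"):
    print(name, span_metrics_enabled(name, overrides))
```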

batch_processor

All exporters support configuration of a batch span processor with batch_processor.

You must tune your batch_processor configuration if you see any of the following messages in your logs:

  • OpenTelemetry trace error occurred: cannot send span to the batch span processor because the channel is full

  • OpenTelemetry metrics error occurred: cannot send span to the batch span processor because the channel is full

The exact settings depend on the bandwidth available for you to send data to your application performance monitor (APM) and the bandwidth configuration of your APM. Expect to tune these settings over time as your application changes.

YAML
telemetry:
  exporters:
    tracing:
      datadog:
        batch_processor:
          max_export_batch_size: 512
          max_concurrent_exports: 1
          max_export_timeout: 30s
          max_queue_size: 2048
          scheduled_delay: 5s

batch_processor configuration reference

| Attribute | Default | Description |
| --- | --- | --- |
| scheduled_delay | 5s | The delay in seconds from receiving the first span to sending the batch. |
| max_concurrent_exports | 1 | The maximum number of overlapping export requests. |
| max_export_batch_size | 512 | The number of spans to include in a batch. May be limited by maximum message size limits. |
| max_export_timeout | 30s | The timeout in seconds for sending spans before dropping the data. |
| max_queue_size | 2048 | The maximum number of spans to be buffered before dropping span data. |
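
To get a feel for why the "channel is full" messages appear, a back-of-the-envelope estimate can help. The Python sketch below computes how long the queue lasts before span data is dropped, given a span creation rate and a drain rate. It is a deliberately simplified model, not a description of how the batch span processor schedules exports: the function name and the idea of a constant drain rate are assumptions made for illustration.

```python
def seconds_until_queue_full(max_queue_size, spans_per_second, drained_per_second=0.0):
    """Estimate how long the span queue lasts before spans are dropped.

    Simplified model (assumption): spans arrive and are exported at constant
    rates. Returns infinity when exports keep up with span creation.
    """
    net_rate = spans_per_second - drained_per_second
    if net_rate <= 0:
        return float("inf")
    return max_queue_size / net_rate


# With the default max_queue_size of 2048 and exports stalled entirely,
# a router creating 1024 spans per second fills the queue in seconds.
print(seconds_until_queue_full(2048, spans_per_second=1024))

# If exports drain half of that rate, the queue lasts twice as long.
print(seconds_until_queue_full(2048, spans_per_second=1024, drained_per_second=512))
```

Under this model, raising max_queue_size only buys time. If spans are created faster than they are exported for a sustained period, the settings that affect export capacity, such as max_concurrent_exports and max_export_batch_size, are the ones to revisit.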

Datadog native configuration reference

| Attribute | Default | Description |
| --- | --- | --- |
| enabled | false | Enable the Datadog exporter. |
| enable_span_mapping | true | If span mapping should be used. |
| endpoint | http://localhost:8126/v0.4/traces | The endpoint to send spans to. |
| batch_processor | | The batch processor settings. |
| resource_mapping | See config | A map of span names to attribute names. |
| span_metrics | See config | A map of span names to boolean. |

Sampler configuration

When using Datadog to gain insight into your router's performance, you need to decide whether to use the Datadog APM view or rely on OTLP metrics. The Datadog APM view is driven by traces. In order for this view to be accurate, all requests must be sampled and sent to the Datadog Agent.

Tracing is expensive both in terms of APM costs and router performance, so you typically will want to set the sampler to sample at low rates in production environments. This, however, impacts the APM view, which will show only a small percentage of traces.

To mitigate this, you can use Datadog Agent sampling mode, where all traces are sent to the Datadog Agent but only a percentage of them are forwarded to Datadog. This keeps the APM view accurate while lowering costs. Note that the router will incur a performance cost of having an effective sample rate of 100%.

Use the following guidelines on how to configure the sampler and preview_datadog_agent_sampling to get the desired behavior:

I want the APM view to show metrics for 100% of traffic, and I am OK with the performance impact on the router.

Set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # All requests will be traced and sent to the Datadog agent.
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog.
        preview_datadog_agent_sampling: true
        sampler: 0.1

I want the Datadog Agent to be in control of the percentage of traces sent to Datadog.

Use the Datadog Agent's probabilistic_sampling option and set the router's sampler to always_on to allow the agent to control the sampling rate.

Router config:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # All requests will be traced and sent to the Datadog agent.
        sampler: always_on

Datadog agent config:

YAML
otlp_config:
  traces:
    probabilistic_sampling:
      # Only 10 percent of spans will be forwarded to Datadog
      sampling_percentage: 10

I want the best performance from the router and I'm not concerned with the APM view. I use metrics and traces to monitor my application.

Set the sampler to a low value to reduce the number of traces sent to Datadog. Leave preview_datadog_agent_sampling set to false.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of requests will be traced and sent to the Datadog agent. The APM view will only show a subset of total request data but the Router will perform better.
        sampler: 0.1
        preview_datadog_agent_sampling: false

sampler (default: always_on)

The sampler configuration allows you to control the sampling decisions that the router will make on its own and decrease the rate at which you sample, which can have a direct impact on your Datadog bill.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded to the Datadog agent. Experiment to find a value that is good for you!
        sampler: 0.1

If you are using the Datadog APM view, you should set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

preview_datadog_agent_sampling (default: false)

The Datadog APM view relies on traces to generate metrics. For these metrics to be accurate, 100% of requests must be sampled and sent to the Datadog Agent. To prevent all of those traces from then being sent to Datadog, you must set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        preview_datadog_agent_sampling: true
        sampler: 0.1

Using preview_datadog_agent_sampling will send all spans to the Datadog Agent, but only the percentage of traces configured by the sampler will be forwarded to Datadog. This means that your APM view will be accurate, but it will incur performance and resource usage costs for both the router and Datadog Agent to send and receive all spans.
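
The trade-off can be made concrete with some simple arithmetic. The Python sketch below estimates how many traces per second reach the Datadog Agent and how many are forwarded to Datadog for a given sampler value. It is a simplified model for illustration only: the function name is hypothetical, and it assumes one trace per request and that exactly the sampler fraction is forwarded.

```python
def trace_volumes(requests_per_second, sampler, agent_sampling):
    """Return (traces/s sent to the Datadog Agent, traces/s forwarded to Datadog).

    Simplified model (assumptions): one trace per request, and exactly the
    sampler fraction of traces is forwarded to Datadog.
    """
    if not 0.0 <= sampler <= 1.0:
        raise ValueError("sampler must be between 0.0 and 1.0")
    forwarded = requests_per_second * sampler
    # With preview_datadog_agent_sampling enabled, every trace reaches the
    # Agent; otherwise only the sampled fraction leaves the router.
    to_agent = float(requests_per_second) if agent_sampling else forwarded
    return to_agent, forwarded


# preview_datadog_agent_sampling: true, sampler: 0.1
print(trace_volumes(1000, 0.1, agent_sampling=True))

# preview_datadog_agent_sampling: false, sampler: 0.1
print(trace_volumes(1000, 0.1, agent_sampling=False))
```

In both cases roughly the same volume is forwarded to Datadog. What changes is the load between the router and the Datadog Agent, which is where the extra performance and resource cost of agent sampling comes from.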

If your use case allows your APM view to show only a subset of traces, then you can set preview_datadog_agent_sampling to false. In that case, rely on OTLP metrics instead to gain insight into the router's performance.

note
  • The router doesn't support in-agent ingestion control.
  • Configuring traces_per_second in the Datadog Agent will not dynamically adjust the router's sampling rate to meet the target rate.