Datadog exporter (via OTLP)
Configure the Datadog exporter for tracing
Enable and configure the Datadog exporter for tracing in the GraphOS Router or Apollo Router Core.
For general tracing configuration, refer to Router Tracing Configuration.
Attributes for Datadog APM UI
The router should set attributes that Datadog uses to organize its APM view and other UI:
- otel.name: span name that's fixed for Datadog
- resource.name: Datadog resource name that's displayed in traces
- operation.name: Datadog operation name that populates a dropdown menu in the Datadog service page
You should add these attributes to your router.yaml configuration file. The example below sets these attributes for the router, supergraph, and subgraph stages of the router's request lifecycle:
```yaml
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          otel.name: router
          operation.name: "router"
          resource.name:
            request_method: true

      supergraph:
        attributes:
          otel.name: supergraph
          operation.name: "supergraph"
          resource.name:
            operation_name: string

      subgraph:
        attributes:
          otel.name: subgraph
          operation.name: "subgraph"
          resource.name:
            subgraph_operation_name: string
```
With these attributes set, you can filter for these operations in Datadog APM.
OTLP configuration
OpenTelemetry protocol (OTLP) is the recommended protocol for transmitting telemetry, including traces, to Datadog.
To set up traces to Datadog via OTLP, you must do the following:
1. Modify the default configuration of the Datadog Agent to accept OTLP traces from the router.
2. Configure the router to send traces to the configured Datadog Agent.
Datadog Agent configuration
To configure the Datadog Agent, add OTLP configuration to your datadog.yaml. For example:
```yaml
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: <dd-agent-ip>:4317
```
For additional Datadog Agent configuration details, review Datadog's Enabling OTLP Ingestion on the Datadog Agent documentation.
Router configuration
To configure the router, enable the OTLP exporter and set endpoint: <datadog-agent-endpoint>. For example:
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        preview_datadog_agent_sampling: true
        sampler: 0.1

      otlp:
        enabled: true
        # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:4317)
        endpoint: "${env.DATADOG_AGENT_HOST}:4317"

        # Optional batch processor setting, this will enable the batch processor to send concurrent requests in a high load scenario.
        batch_processor:
          max_concurrent_exports: 100
```
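If the Datadog Agent runs on the same host as the router, a smaller configuration may be enough. The sketch below assumes a local Agent listening for OTLP over gRPC on the default port, and uses the 'default' endpoint value mentioned in the comments of the router example above. It omits the common sampling settings, so the sampler keeps its default of always_on and every trace is sent to the Agent.

```yaml
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        # 'default' resolves to http://127.0.0.1:4317, so this assumes a local Datadog Agent
        endpoint: default
```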
Adjusting the sampler controls the sampling decisions that the router makes on its own and decreases the rate at which you sample. Your sample rate can have a direct impact on your Datadog bill.
Adjust the batch_processor settings in your exporter config to match the volume of spans being created in a router instance. This applies to both OTLP and the Datadog native exporters.
Enabling Datadog Agent sampling
The Datadog APM view relies on traces to generate metrics. For these metrics to be accurate, all requests must be sampled and sent to the Datadog Agent.
To prevent all traces from then being forwarded to Datadog, set preview_datadog_agent_sampling to true in your router and adjust the sampler to the desired percentage of traces to be forwarded to Datadog.
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        sampler: 0.1
        preview_datadog_agent_sampling: true
```
- The router doesn't support in-agent ingestion control.
- Configuring traces_per_second in the Datadog Agent won't dynamically adjust the router's sampling rate to meet the target rate.
- Using preview_datadog_agent_sampling sends all spans to the Datadog Agent. This affects the resource usage and performance of both the router and the Datadog Agent.
Enabling log correlation
To enable Datadog log correlation, you must configure dd.trace_id to appear on the router span:
```yaml
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          dd.trace_id: true
```
Your JSON-formatted log messages automatically output dd.trace_id on each log message if dd.trace_id was detected on the router span.
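Because dd.trace_id is only added to JSON-formatted logs, log correlation also depends on the router emitting JSON logs. The sketch below combines the router span attribute above with a JSON stdout log exporter. The telemetry.exporters.logging.stdout keys come from the router's logging configuration rather than this page, so treat them as an assumption and confirm them against the logging documentation for your router version.

```yaml
telemetry:
  exporters:
    logging:
      # Assumption: stdout logging exporter with JSON output (see the router logging docs)
      stdout:
        enabled: true
        format: json
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          # Record the Datadog trace ID on the router span so JSON logs can include it
          dd.trace_id: true
```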
Datadog native configuration
The router can be configured to connect to either the native, default Datadog Agent address or a URL:
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Configured to forward 10 percent of spans from the Datadog Agent to Datadog. Experiment to find a value that is good for you.
        preview_datadog_agent_sampling: true
        sampler: 0.1

      datadog:
        enabled: true
        # Optional endpoint, either 'default' or a URL (Defaults to http://127.0.0.1:8126)
        endpoint: "http://${env.DATADOG_AGENT_HOST}:8126"

        # Optional batch processor setting, this will enable the batch processor to send concurrent requests in a high load scenario.
        batch_processor:
          max_concurrent_exports: 100

  # Enable graphql.operation.name attribute on supergraph spans.
  instrumentation:
    spans:
      mode: spec_compliant
      supergraph:
        attributes:
          graphql.operation.name: true
```
Adjust the batch_processor settings in your exporter config to match the volume of spans being created in a router instance. This applies to both OTLP and the Datadog native exporter.
enabled
Set to true to enable the Datadog exporter. Defaults to false.
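As a minimal sketch, assuming the Datadog Agent runs locally and listens on its default trace port, the native exporter can be enabled with the 'default' endpoint value described in the comments of the native configuration example above:

```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        # Defaults to false, so the exporter must be enabled explicitly
        enabled: true
        # 'default' resolves to http://127.0.0.1:8126, so this assumes a local Datadog Agent
        endpoint: default
```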
enable_span_mapping (default: true)
Because of incompatibilities between Datadog and OpenTelemetry, the Datadog exporter might not provide meaningful contextual information in the exported spans. To fix this, you can configure the router to map the span name and the span resource name.
```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        enable_span_mapping: true
```
With enable_span_mapping: true, the router performs the following mapping:
Use the OpenTelemetry span name to set the Datadog span operation name.
Use the OpenTelemetry span attributes to set the Datadog span resource name.
Example trace
For example, assume a client sends a query MyQuery to the router. The router's query planner sends a subgraph query to my-subgraph-name and creates the following trace:
```text
| apollo_router request |
    | apollo_router router |
        | apollo_router supergraph |
            | apollo_router query_planning |    | apollo_router execution |
                                                    | apollo_router fetch |
                                                        | apollo_router subgraph |
                                                            | apollo_router subgraph_request |
```
This trace doesn't clearly show the name of the query, the name of the subgraph, or the name of the query sent to the subgraph.
When enable_span_mapping is set to true, the router creates the following trace instead:
```text
| request /graphql |
    | router /graphql |
        | supergraph MyQuery |
            | query_planning MyQuery |  | execution |
                                            | fetch fetch |
                                                | subgraph my-subgraph-name |
                                                    | subgraph_request MyQuery__my-subgraph-name__0 |
```
fixed_span_names (default: true)
When fixed_span_names: true, the router uses the original span names instead of the dynamic ones described by OTel semantic conventions.
```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        fixed_span_names: true
```
This gives you a finite list of operation names in the Datadog APM view.
resource_mapping
When set, resource_mapping lets you specify which attribute to use as the resource name in the Datadog APM and Trace views.
The default resource mappings are:
OpenTelemetry Span Name | Attribute Used as Datadog Resource Name |
---|---|
request | http.route |
router | http.route |
supergraph | graphql.operation.name |
query_planning | graphql.operation.name |
subgraph | subgraph.name |
subgraph_request | graphql.operation.name |
http_request | http.route |
You can override these mappings by specifying the resource_mapping configuration:
```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        resource_mapping:
          # Use `my.span.attribute` as the resource name for the `router` span
          router: "my.span.attribute"
  instrumentation:
    spans:
      router:
        attributes:
          # Add a custom attribute to the `router` span
          my.span.attribute:
            request_header: x-custom-header
```
If you have introduced a new span in a custom build of the router, you can enable resource mapping for it by adding it to the resource_mapping configuration.
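For example, suppose a custom router build adds a span named my_custom_span that records an attribute called my.custom.attribute. Both names are hypothetical placeholders for illustration. A resource_mapping entry along these lines would use that attribute as the span's resource name in Datadog:

```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        resource_mapping:
          # Hypothetical span from a custom build, mapped to a hypothetical attribute
          my_custom_span: "my.custom.attribute"
```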
span_metrics
When set, span_metrics lets you specify which spans show span metrics in the Datadog APM and Trace views.
By default, span metrics are enabled for:
request
router
supergraph
subgraph
subgraph_request
http_request
query_planning
execution
query_parsing
You can override these defaults by specifying the span_metrics configuration. The following example disables span metrics for the supergraph span and enables them for a custom span named my_custom_span:
```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        enabled: true
        span_metrics:
          # Disable span metrics for supergraph
          supergraph: false
          # Enable span metrics for my_custom_span
          my_custom_span: true
```
If you have introduced a new span in a custom build of the router, you can enable span metrics for it by adding it to the span_metrics configuration.
batch_processor
All exporters support configuration of a batch span processor with batch_processor.
You must tune your batch_processor configuration if you see any of the following messages in your logs:
OpenTelemetry trace error occurred: cannot send span to the batch span processor because the channel is full
OpenTelemetry metrics error occurred: cannot send span to the batch span processor because the channel is full
The exact settings depend on the bandwidth available for you to send data to your application performance monitor (APM) and the bandwidth configuration of your APM. Expect to tune these settings over time as your application changes.
```yaml
telemetry:
  exporters:
    tracing:
      datadog:
        batch_processor:
          max_export_batch_size: 512
          max_concurrent_exports: 1
          max_export_timeout: 30s
          max_queue_size: 2048
          scheduled_delay: 5s
```
batch_processor configuration reference
Attribute | Default | Description |
---|---|---|
scheduled_delay | 5s | The delay in seconds from receiving the first span to sending the batch. |
max_concurrent_exports | 1 | The maximum number of overlapping export requests. |
max_export_batch_size | 512 | The number of spans to include in a batch. May be limited by maximum message size limits. |
max_export_timeout | 30s | The timeout in seconds for sending spans before dropping the data. |
max_queue_size | 2048 | The maximum number of spans to be buffered before dropping span data. |
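Because every exporter accepts the same batch_processor block, the settings in this reference table can also be applied to the OTLP exporter. The sketch below assumes a high-volume router instance. The larger max_concurrent_exports and max_queue_size values are illustrative starting points to tune from, not recommended values.

```yaml
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        batch_processor:
          max_export_batch_size: 512
          # Illustrative: allow more overlapping export requests than the default of 1
          max_concurrent_exports: 10
          max_export_timeout: 30s
          # Illustrative: buffer more spans than the default of 2048 before dropping data
          max_queue_size: 8192
          scheduled_delay: 5s
```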
Datadog native configuration reference
Attribute | Default | Description |
---|---|---|
enabled | false | Enable the Datadog exporter. |
enable_span_mapping | true | Whether span mapping should be used. |
endpoint | http://localhost:8126/v0.4/traces | The endpoint to send spans to. |
batch_processor | | The batch processor settings. |
resource_mapping | See config | A map of span names to attribute names. |
span_metrics | See config | A map of span names to boolean. |
Sampler configuration
When using Datadog to gain insight into your router's performance, you need to decide whether to use the Datadog APM view or rely on OTLP metrics. The Datadog APM view is driven by traces. In order for this view to be accurate, all requests must be sampled and sent to the Datadog Agent.
Tracing is expensive both in terms of APM costs and router performance, so you typically want to set the sampler to sample at low rates in production environments. This, however, impacts the APM view, which then shows only a small percentage of traces.
To mitigate this, you can use Datadog Agent sampling mode, where all traces are sent to the Datadog Agent but only a percentage of them are forwarded to Datadog. This keeps the APM view accurate while lowering costs. Note that the router still incurs the performance cost of an effective sample rate of 100%.
Use the following guidelines to configure the sampler and preview_datadog_agent_sampling for the behavior you want:
I want the APM view to show metrics for 100% of traffic, and I am OK with the performance impact on the router.
Set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # All requests will be traced and sent to the Datadog Agent.
        # Only 10 percent of spans will be forwarded from the Datadog Agent to Datadog.
        preview_datadog_agent_sampling: true
        sampler: 0.1
```
I want the Datadog Agent to be in control of the percentage of traces sent to Datadog.
Use the Datadog Agent's probabilistic_sampler option and set the router's sampler to always_on so that the Agent controls the sampling rate.
Router config:
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # All requests will be traced and sent to the Datadog Agent.
        sampler: always_on
```
Datadog Agent config:
```yaml
otlp_config:
  traces:
    probabilistic_sampler:
      # Only 10 percent of spans will be forwarded to Datadog
      sampling_percentage: 10
```
I want the best performance from the router and I'm not concerned with the APM view. I use metrics and traces to monitor my application.
Set the sampler to a low value to reduce the number of traces sent to Datadog, and leave preview_datadog_agent_sampling set to false.
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of requests will be traced and sent to the Datadog Agent. The APM view will only show a subset of total request data, but the router will perform better.
        sampler: 0.1
        preview_datadog_agent_sampling: false
```
sampler (default: always_on)
The sampler configuration lets you control the sampling decisions that the router makes on its own and decrease the rate at which you sample, which can directly affect your Datadog bill.
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be sent to the Datadog Agent. Experiment to find a value that is good for you!
        sampler: 0.1
```
If you use the Datadog APM view, set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be sent to Datadog.
preview_datadog_agent_sampling (default: false)
The Datadog APM view relies on traces to generate metrics. For these metrics to be accurate, 100% of requests must be sampled and sent to the Datadog Agent.
To avoid then forwarding all of those traces to Datadog, set preview_datadog_agent_sampling to true and adjust the sampler to the desired percentage of traces to be forwarded to Datadog.
```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog Agent to Datadog. Experiment to find a value that is good for you!
        preview_datadog_agent_sampling: true
        sampler: 0.1
```
Using preview_datadog_agent_sampling sends all spans to the Datadog Agent, but only the percentage of traces configured by the sampler is forwarded to Datadog. Your APM view stays accurate, but sending and receiving all spans incurs performance and resource usage costs for both the router and the Datadog Agent.
If your use case allows your APM view to show only a subset of traces, you can set preview_datadog_agent_sampling to false and rely on OTLP metrics instead to gain insight into the router's performance.
- The router doesn't support in-agent ingestion control.
- Configuring traces_per_second in the Datadog Agent won't dynamically adjust the router's sampling rate to meet the target rate.
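If you leave preview_datadog_agent_sampling set to false and rely on OTLP metrics instead, the router also needs a metrics exporter that points at the Agent. The sketch below assumes the telemetry.exporters.metrics.otlp keys from the router's metrics documentation (they are not described on this page) and reuses the Datadog Agent's OTLP gRPC receiver configured earlier. Verify the metrics keys for your router version before relying on them.

```yaml
telemetry:
  exporters:
    metrics:
      # Assumption: OTLP metrics exporter keys from the router metrics docs
      otlp:
        enabled: true
        endpoint: "${env.DATADOG_AGENT_HOST}:4317"
    tracing:
      common:
        # Only 10 percent of requests are traced; metrics are not subject to this trace sampler
        sampler: 0.1
        preview_datadog_agent_sampling: false
```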