Sending Metrics to GraphOS

Learn how to report operation and field usage metrics to GraphOS


GraphOS Studio offers visualizations of metrics like request rate, latency, and more to help you analyze your supergraph's performance. Studio also lets you analyze how clients are using individual fields in your GraphQL requests. To analyze operation metrics and field usage in Studio, you first need to report them to GraphOS.


If your organization doesn't currently have an Enterprise plan, you can test it out by signing up for a free GraphOS trial.

Reporting operation metrics

How you report operation metrics to GraphOS depends on whether you're using the Apollo Router or Apollo Server, or a third-party server.

From the Apollo Router or Apollo Server

Both the Apollo Router and Apollo Server use the same mechanism to enable operation metrics reporting to GraphOS:

  1. Obtain a graph API key from GraphOS Studio.

    How to obtain a graph API key
    caution
    API keys are secret credentials. Never share them outside your organization or commit them to version control. Delete and replace API keys that you believe are compromised.
    1. Go to studio.apollographql.com and click the graph you want to obtain an API key for.
    2. If a Publish your Schema dialog appears, copy the protected value that appears after APOLLO_KEY= in the example code block (it begins with service:), and you're all set. Otherwise, proceed to the next step.
    3. Open your graph's Settings page and select the API Keys tab. Click Create New Key. Give your key a name, such as Production. This helps you keep track of each API key's use.
      note
      If you don't see the API Keys tab, you don't have sufficient permissions for your graph. Only organization members with the Org Admin or Graph Admin role can manage graph API keys. Learn more about member roles.
    4. Copy the key's value. For security, you cannot view an API key's value in Studio after creating it.
    note
    The graph API key you use must have at least the Contributor role to report metrics for a non-protected variant, and either an Org Admin or Graph Admin role to report metrics for a protected variant.
  2. Obtain the graph ref for the graph and variant you want to report metrics for. You can find your variant's graph ref at the top of the variant's README page in Studio. It has the format graph-id@variant-name (such as my-graph@staging).

  3. Use the obtained values to set the following environment variables in your environment before starting up your router/server:

    Bash
    export APOLLO_KEY=<YOUR_GRAPH_API_KEY>
    export APOLLO_GRAPH_REF=<YOUR_GRAPH_REF>
note
Consult your production environment's documentation to learn how to set its environment variables.

Now, when your router or server starts up, it automatically begins reporting operation metrics to GraphOS.
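As a minimal sketch, a startup script could check both variables before launching the router or server. The helper names `parseGraphRef` and `checkReportingEnv` are hypothetical and not part of any Apollo package, and the sketch assumes the graph ref always includes a variant, in the graph-id@variant-name format described above.

```typescript
// Hypothetical helpers for illustration; not part of Apollo Router or Apollo Server.
interface GraphRef {
  graphId: string;
  variant: string;
}

// Splits a graph ref of the form graph-id@variant-name into its two parts.
function parseGraphRef(ref: string): GraphRef {
  const at = ref.indexOf("@");
  const hasSecondAt = ref.indexOf("@", at + 1) !== -1;
  if (at <= 0 || at === ref.length - 1 || hasSecondAt) {
    throw new Error(`Expected graph-id@variant-name, got "${ref}"`);
  }
  return { graphId: ref.slice(0, at), variant: ref.slice(at + 1) };
}

// Verifies that both variables are present. The key itself is never logged.
function checkReportingEnv(env: Record<string, string | undefined>): GraphRef {
  const key = env["APOLLO_KEY"];
  const ref = env["APOLLO_GRAPH_REF"];
  if (!key) throw new Error("APOLLO_KEY is not set");
  if (!ref) throw new Error("APOLLO_GRAPH_REF is not set");
  return parseGraphRef(ref);
}
```

In a Node.js process, you would typically pass `process.env` as the argument.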

From a third-party server (advanced)

You can set up a reporting agent in your GraphQL server to push metrics to GraphOS. The agent is responsible for:

  • Translating operation details into the correct reporting format

  • Implementing a default signature function to identify each executed operation

  • Emitting batches of traces and metrics to the GraphOS reporting endpoint

  • Optionally defining plugins to enable advanced reporting features

Apollo Server defines its agent for performing these tasks in the usage reporting plugin.

note
If you're interested in collaborating with Apollo on creating a dedicated integration for your GraphQL server, please contact us at support@apollographql.com.

Reporting format

The GraphOS reporting endpoint accepts batches of traces and metrics that are encoded in protocol buffer format. Each trace corresponds to the execution of a single GraphQL operation, including a breakdown of the timing and error information for each field that's resolved as part of the operation. The schema for this protocol buffer is defined as the Report message in the protobuf schema.

The protobuf schema document describes how to create a report whose tracesPerQuery objects consist solely of a list of detailed execution traces in the trace array. GraphOS now allows your server to describe usage as a mix of detailed execution traces and pre-aggregated metrics (released in Apollo Server 2.24), which leads to much more efficient reports. This document doesn't describe how to generate these metrics. Nor does it describe how to report the number of requests for a particular operation shown in the Clients & Operations table on the Insights page.

note
We strongly encourage developers to contact Apollo support at support@apollographql.com to discuss their use case before building their own reporting agent using this module.

As a starting point, we recommend implementing an extension to the GraphQL execution that creates a report with a single trace, as defined in the Trace message of the protobuf schema. Then, you can batch multiple traces into a single report. We recommend sending batches approximately every 20 seconds and limiting each batch to a reasonable size (~4 MB).
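The batching recommendation above could be sketched roughly as follows. `TraceBatcher` is a hypothetical class, not part of any Apollo package. It assumes each trace has already been serialized to bytes by a protobuf library and that the `flushBatch` callback wraps the traces in a Report and sends it.

```typescript
// Hypothetical batcher for illustration only.
const MAX_BATCH_BYTES = 4 * 1024 * 1024; // roughly the 4 MB suggested above

class TraceBatcher {
  private pending: Uint8Array[] = [];
  private pendingBytes = 0;
  private readonly flushBatch: (traces: Uint8Array[]) => void;
  private readonly maxBytes: number;

  constructor(
    flushBatch: (traces: Uint8Array[]) => void,
    maxBytes: number = MAX_BATCH_BYTES,
  ) {
    this.flushBatch = flushBatch;
    this.maxBytes = maxBytes;
  }

  // Adds one encoded trace, flushing first if it would push the batch over the limit.
  add(trace: Uint8Array): void {
    const wouldOverflow = this.pendingBytes + trace.byteLength > this.maxBytes;
    if (wouldOverflow && this.pending.length > 0) {
      this.flush();
    }
    this.pending.push(trace);
    this.pendingBytes += trace.byteLength;
  }

  // Hands all pending traces to the callback. Does nothing when the batch is empty.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.pendingBytes = 0;
    this.flushBatch(batch);
  }
}
```

To approximate the 20-second cadence, you could call `flush()` from a timer, for example `setInterval(() => batcher.flush(), 20 * 1000)`, and once more when the server shuts down.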

Many server runtimes already support emitting tracing information as a GraphQL extension. Such extensions are available for Node, Ruby, Scala, Java, Elixir, and .NET. If you're working on adding metrics reporting functionality for one of these languages, reading through that tracing instrumentation is a good place to start. For other languages, we recommend consulting the Apollo Server usage reporting plugin.

Operation signing

For GraphOS Studio to correctly group GraphQL queries, your reporting agent should define a function to generate an operation signature for each distinct operation. This can be challenging because two structurally different operations can be functionally equivalent. For instance, all the following queries request the same information:

GraphQL
query AuthorForPost($foo: String!) {
  post(id: $foo) {
    author
  }
}

query AuthorForPost($bar: String!) {
  post(id: $bar) {
    author
  }
}

query AuthorForPost {
  post(id: "my-post-id") {
    author
  }
}

query AuthorForPost {
  post(id: "my-post-id") {
    writer: author
  }
}

It's important to decide how to group such queries when tracking metrics. The TypeScript reference implementation does the following to every query before generating its signature to better group functionally equivalent operations:

  • Drop unused fragments and/or operations

  • Hide string literals

  • Ignore aliases

  • Sort the tree deterministically

  • Ignore differences in whitespace

We recommend using the same default signature method for consistency across different server runtimes.
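To make two of the listed steps concrete, here is a rough text-level sketch that hides string literals and ignores whitespace differences. It is not the reference signature algorithm. `roughSignature` is a hypothetical function that works on raw query text with regular expressions, so it does not handle comments or escaped quotes inside block strings. Dropping unused fragments, ignoring aliases, and sorting the tree require a real GraphQL parser and are out of scope here.

```typescript
// Illustrative approximation only; NOT the reference signature algorithm.
function roughSignature(query: string): string {
  return query
    // Hide block strings ("""...""") first, then ordinary string literals.
    .replace(/"""[\s\S]*?"""/g, '""')
    .replace(/"(?:[^"\\\n]|\\.)*"/g, '""')
    // Commas are insignificant in GraphQL, so treat them like whitespace.
    .replace(/[\s,]+/g, " ")
    // Drop whitespace around punctuation so spacing differences disappear.
    .replace(/\s*([{}()\[\]:=!@$|&]|\.\.\.)\s*/g, "$1")
    .trim();
}
```

With this sketch, two copies of the third example query that differ only in the string literal and in spacing produce the same text. The first two example queries still differ, because their variable names are not normalized.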

Sending metrics to the reporting endpoint

After your GraphQL server prepares a batch of traces, it should send them to the Studio reporting endpoint at the following URL:

Text
https://usage-reporting.api.apollographql.com/api/ingress/traces

Each batch should be sent as an HTTP POST request. The body of the request can be one of the following:

  • A binary serialization of a Report message

  • A gzipped binary serialization of a Report message

To authenticate with Studio, each request must include either:

  • An X-Api-Key header with a valid API key for your graph

  • An authtoken cookie with a valid API key for your graph

Only graph-level API keys (starting with the prefix service:) are supported.

The request can also optionally include a Content-Type header with value application/protobuf, but this is not required.
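The request described above might be assembled as in the following sketch. `buildReportRequest` and the `ReportRequest` shape are hypothetical. The sketch assumes `encodedReport` is a Report message that a protobuf library has already serialized, and it covers only the uncompressed body with X-Api-Key authentication.

```typescript
const REPORTING_URL =
  "https://usage-reporting.api.apollographql.com/api/ingress/traces";

// Hypothetical description of the request, independent of any HTTP client.
interface ReportRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array;
}

// Builds the POST request for one batch. Only graph-level keys are accepted.
function buildReportRequest(apiKey: string, encodedReport: Uint8Array): ReportRequest {
  if (!apiKey.startsWith("service:")) {
    throw new Error("Only graph-level API keys (prefix service:) are supported");
  }
  return {
    url: REPORTING_URL,
    method: "POST",
    headers: {
      "X-Api-Key": apiKey,
      "Content-Type": "application/protobuf", // optional, per the text above
    },
    body: encodedReport,
  };
}
```

You can then hand the resulting object to whichever HTTP client your server already uses.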

caution
The reporting endpoint rejects reports that are older than 50 minutes. If you see an error like Rejecting report from service {your service} with skewed timestamp, ensure your traces are current and that your timestamp calculations are accurate.

For a reference implementation, see the sendReport() function in the TypeScript reference agent.

Tuning reporting behavior

We recommend implementing retries with backoff when you encounter 5xx responses or networking errors when communicating with the reporting endpoint. Additionally, implement a shutdown hook to ensure you push all pending reports before your server shuts down gracefully.
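One possible shape for the retry logic is sketched below. The function names, the 500 ms base delay, the 30-second cap, and the five-attempt limit are all illustrative assumptions rather than values Apollo prescribes. `send` is a hypothetical callback that performs one POST and resolves to the HTTP status code.

```typescript
// Retry on 5xx responses only. Networking errors surface as rejected promises.
function shouldRetryStatus(status: number): boolean {
  return status >= 500 && status <= 599;
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, and so on, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Resolves to true if a report was accepted, false once attempts are exhausted
// or the endpoint returned a non-retryable error status.
async function sendWithRetry(
  send: () => Promise<number>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const status = await send();
      if (!shouldRetryStatus(status)) {
        return status >= 200 && status < 300;
      }
    } catch (_err) {
      // Networking error: fall through and retry after the delay.
    }
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt));
    }
  }
  return false;
}
```

A shutdown hook could flush any pending batch through `sendWithRetry` with a smaller attempt limit so that shutdown is not delayed indefinitely.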

Implementing additional reporting features

The reference TypeScript implementation includes several features that you might want to include in your implementation. All these features are implemented in the usage reporting plugin itself and are documented in the plugin's API reference.

For example, you can restrict which information is sent to GraphOS, particularly to avoid reporting personal data. Because personal data most commonly appears in variables and headers, the TypeScript agent offers the sendVariableValues and sendHeaders options.

Reporting field usage metrics

Your GraphQL router or server can report one or both of the following field usage metrics:

  • Requests: How many times an operation that requests a particular field has been observed

  • Executions: How many times the resolver for a particular field has been executed

How you report these metrics to GraphOS depends on whether you're using the Apollo Router or Apollo Server.

From the Apollo Router

If you have a cloud or self-hosted supergraph, you only need to configure your router to send operation metrics to GraphOS, and field usage will be automatically reported. Subgraphs should not send any metrics to GraphOS directly. Instead, they can include trace data in their responses to the router. The router then includes that data in its own reports to GraphOS.

note
Subgraphs send field tracing and error data to the router using the federated tracing implementation. This means your subgraphs must support federated tracing, and federated tracing must be enabled in your environment, before you start seeing error details in GraphOS for a supergraph.

From Apollo Server

Apollo Server automatically reports field usage metrics as long as you follow these prerequisites:

  • You must first configure your server to send operation metrics to GraphOS.

  • To report requests:

    • Your GraphQL server must run Apollo Server 3.6 or later.

    • If you have a federated graph, your gateway must run Apollo Server 3.6 or later, but there are no requirements for your subgraphs.

  • To report executions:

    • Your GraphQL server can run any recent version of Apollo Server 2.x or 3.x.

    • If you have a federated graph, your subgraphs must support federated tracing. For compatible libraries, see the FEDERATED TRACING entry for libraries in this table.

note
If some of your subgraphs support federated tracing and others don't, only executions in compatible subgraphs are reported to Apollo.

Disabling execution metrics

In Apollo Server 3.6 and later, you can turn off field-level instrumentation for some or all operations by providing the fieldLevelInstrumentation option to ApolloServerPluginUsageReporting.

Turning off field-level instrumentation for a particular request has the following effects:

  • The request does not contribute to the "executions" statistic on the Insights page in Studio.

  • The request does not contribute to field-level execution timing hints that can be displayed in the GraphOS Studio Explorer and VS Code.

  • The request does not produce a trace that can be viewed in the Traces section of the Insights page in Studio.

These requests still contribute to most features of Studio, such as schema checks and the "Requests" metrics on the Insights page.

To turn off field-level instrumentation for all requests, pass () => false as the fieldLevelInstrumentation option:

TypeScript
new ApolloServer({
  plugins: [
    ApolloServerPluginUsageReporting({
      fieldLevelInstrumentation: () => false
    })
  ]
  // ...
});

If you do this, execution metrics do not appear on the Insights page.

Fractional sampling

You can enable field-level instrumentation for a fixed fraction of all requests by passing a number between 0 and 1 as the fieldLevelInstrumentation option:

TypeScript
new ApolloServer({
  plugins: [
    ApolloServerPluginUsageReporting({
      fieldLevelInstrumentation: 0.01
    })
  ]
  // ...
});

If you do so, Apollo Server randomly chooses to enable field-level instrumentation for each request according to the given probability.

caution
Make sure to pass a number (like 0.01), not a function that always returns the same number (like () => 0.01), which has a different effect.

In this case, whenever field-level instrumentation is enabled for a particular request, Apollo Server reports it to Studio with a weight based on the given probability. The "executions" statistic on the Insights page (along with execution timing hints) is scaled by this weight.

For example, if you pass 0.01, your server enables field-level instrumentation for approximately 1% of requests, and every observed execution is counted as 100 executions on the Insights page. (The actual observed execution count is available in a tooltip in the table.)
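The scaling arithmetic can be written out as a small sketch. Both function names are hypothetical, and the rounding is only there to keep the illustration free of floating-point noise. It is not a statement about how Studio computes the displayed number.

```typescript
// Weight applied to each instrumented request when sampling a fixed fraction.
function samplingWeight(fraction: number): number {
  if (!(fraction > 0 && fraction <= 1)) {
    throw new RangeError("fraction must be greater than 0 and at most 1");
  }
  return 1 / fraction;
}

// Estimated executions shown after scaling the observed count by the weight.
function scaledExecutions(observed: number, fraction: number): number {
  return Math.round(observed * samplingWeight(fraction));
}
```

For instance, 250 observed executions at a fraction of 0.01 scale to an estimate of 25,000.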

Custom sampling

You can decide whether to enable field-level instrumentation (and what the weight should be) on a per-operation basis by passing a function as the value of fieldLevelInstrumentation.

For example, you might want to enable field-level instrumentation more often for rare operations and less often for common operations. For details, see the usage reporting plugin docs.
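As a hedged sketch of that idea, the function below picks a sampling rate per operation name and returns either 0 (skip instrumentation) or a weight equal to the inverse of the rate. The rate table, the operation names, and the function name are hypothetical. The sketch also assumes that the callback may return a numeric weight and leaves out how the operation name is read from the request, so confirm both against the usage reporting plugin docs before relying on it.

```typescript
// Hypothetical per-operation sampling rates. Unlisted operations use the default.
const SAMPLE_RATES: Record<string, number> = {
  HomePage: 0.001, // very common operation: instrument rarely
  AdminReport: 1, // rare operation: instrument every request
};
const DEFAULT_RATE = 0.01;

// Returns 0 to skip instrumentation, or the weight 1 / rate to instrument the request.
// `random` is injectable so the behavior can be checked deterministically.
function instrumentationWeight(
  operationName: string | undefined,
  random: () => number = Math.random,
): number {
  const configured =
    operationName !== undefined ? SAMPLE_RATES[operationName] : undefined;
  const rate = configured ?? DEFAULT_RATE;
  return random() < rate ? 1 / rate : 0;
}
```

Keeping the weight equal to the inverse of the rate means scaled execution counts stay comparable across operations that are sampled at different rates.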

Performance considerations

Calculating execution metrics can affect performance for large queries or high-traffic graphs. This is especially true for federated graphs because a subgraph includes each operation's full trace data in its response to the gateway.

Limitations

GraphOS enforces the following limitations to optimize insights performance.

Character limitation

GraphOS began enforcing a 256-character limit on the following data attributes in October 2024:

  • Operation name

  • Client name

  • Client version

  • Parent type

  • Field name

Any data exceeding this limit will be truncated before it is stored.
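If you want to catch oversized values before they are truncated, a check along these lines may help. `findOversizedAttributes` is a hypothetical helper, and the sketch assumes the limit counts Unicode code points, which the text above does not specify.

```typescript
const MAX_ATTRIBUTE_CHARS = 256;

// Returns the names of the attributes whose values exceed the character limit.
function findOversizedAttributes(attrs: Record<string, string>): string[] {
  return Object.keys(attrs).filter((name) => {
    const value = attrs[name];
    // Array.from splits by code point, so surrogate pairs count once.
    return value !== undefined && Array.from(value).length > MAX_ATTRIBUTE_CHARS;
  });
}
```

You could run this against operation names, client names, and client versions in a test or a logging hook to see which values would be affected.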

Cardinality limitation

GraphOS began enforcing cardinality limits on the following data attributes in December 2024:

  • Operation shape (a combination of the operation name and selection set)

  • Client name

  • Client version

If the cardinality of these metrics exceeds predefined thresholds, data may be redacted. Redacted values are replaced with # CardinalityLimitExceeded instead of their actual values.

Cardinality remediation

To remediate cardinality issues, you should find the source of your high-cardinality usage reports. Some examples of issues that can cause high cardinality include:

  • Bot traffic

  • Malformed operations

  • Uniquely named or autogenerated operations or client identifiers

Please email support@apollographql.com for further assistance.
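When looking for the source of high cardinality in your own request logs, a simple distinct-value count per attribute can be a useful first step. The sketch below is an assumption-laden illustration: `UsageRecord` is a hypothetical log shape, and it uses the operation name alone as a stand-in for operation shape, which in GraphOS also includes the selection set.

```typescript
// Hypothetical shape for entries in your own request logs.
interface UsageRecord {
  operationName: string;
  clientName: string;
  clientVersion: string;
}

// Counts distinct values per attribute so the highest-cardinality one stands out.
function distinctCounts(records: UsageRecord[]): Record<keyof UsageRecord, number> {
  const seen = {
    operationName: new Set<string>(),
    clientName: new Set<string>(),
    clientVersion: new Set<string>(),
  };
  for (const record of records) {
    seen.operationName.add(record.operationName);
    seen.clientName.add(record.clientName);
    seen.clientVersion.add(record.clientVersion);
  }
  return {
    operationName: seen.operationName.size,
    clientName: seen.clientName.size,
    clientVersion: seen.clientVersion.size,
  };
}
```

An attribute whose distinct count grows roughly in step with request volume often points to autogenerated names or identifiers.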
