Error Diagnostics in GraphOS

Understand where and when errors are happening in your graph


Errors can occur in different parts of your graph. They might originate in a subgraph or Connector or during communication between your router and upstream services. Errors may also arise due to security features such as safelisting, demand control, operation limits, and authentication. Pinpointing the location and type of errors is essential for effective diagnosis and resolution.

GraphOS Studio includes tools to help you identify and classify errors across your graph. You can view error insights in the Errors tab of a variant's Insights page.

Error insights overview
note
The error insights view above includes new categorization features that are gradually being rolled out to Studio organizations. If your organization doesn't have access yet and you'd like it sooner, contact support through the top-right dropdown in Studio or email support@apollographql.com.

Common use cases

GraphOS Studio's error insights help you detect and analyze issues in your graph. Use them to:

  • Spot trends in error behavior: Viewing error metrics over time can help reveal downtime (spikes in errors) or other unexpected health problems in your graph.

  • Diagnose service-specific issues: By grouping your graph's metrics by underlying service, you can identify unhealthy subgraphs and Connectors. Grouping by error code reveals what kind of errors are happening. Reviewing representative traces for the error can expose execution path issues.

    • Enterprise organizations can navigate from error metrics to representative traces to debug individual requests with errors. Traces help you identify specific situations where errors occur, surfacing details such as specific error messages and execution pathways leading to the error. Learn more.

  • Diagnose client-specific issues: By filtering your graph's metrics by client (and by client version with Enterprise), you can identify when a specific client contributes to or is responsible for a high failure rate for an operation. This information can help you isolate the underlying cause and push an update for the affected client.

Error categorization

The error insights histogram visualizes the number of errors over time. You can group and filter error metrics by the dimensions available to GraphOS.

Error insights grouping dropdown

Depending on your graph and how you've configured it to report metrics, these dimensions are:

  • Path: The schema location where a specific error occurred

  • Service: The underlying subgraph or Connector the error originated from

  • (Error) Code: The specific error type

    note
    If the code dimension's only value is unknown, either code isn't available for your graph or extended error reporting hasn't been enabled.
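To make the idea of grouping by a dimension concrete, here is a minimal sketch in TypeScript. It is not how Studio computes its metrics: the `ErrorRecord` shape, the `countBy` helper, and the sample values are all hypothetical, and bucketing missing values as "unknown" is an assumption made for illustration.

```typescript
// Hypothetical shape for one reported error; not a GraphOS API type.
type ErrorRecord = {
  path: string;     // schema location where the error occurred
  service?: string; // subgraph or Connector name, if known
  code?: string;    // error code, if reported
};

type Dimension = "path" | "service" | "code";

// Count errors per value of the chosen dimension.
// Missing values are bucketed as "unknown" (an assumption for this sketch).
function countBy(records: ErrorRecord[], dimension: Dimension): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    const key = record[dimension] ?? "unknown";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Sample data only; the names and codes are illustrative.
const sample: ErrorRecord[] = [
  { path: "user.orders", service: "orders", code: "SUBREQUEST_HTTP_ERROR" },
  { path: "user.orders", service: "orders" },
  { path: "product.reviews", service: "reviews", code: "UNAUTHENTICATED" },
];

const byService = countBy(sample, "service");
const byCode = countBy(sample, "code");
```

Grouping the same records by a different dimension answers a different question: `byService` points at the unhealthy service, while `byCode` shows what kind of error is occurring.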

Metric dimensions availability

The dimensions available for grouping and filtering error metrics depend on your graph type:

Graph type | Available dimensions
Monographs | Only the path dimension is available for grouping and filtering.
Supergraphs (using GraphOS Router or @apollo/gateway) | Both path and service dimensions are available.
Supergraphs (using GraphOS Router v2.1.2 or later) | Path and service are available. Enable extended error reporting to access code.
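The availability rules above can be read as a small decision function. The sketch below is only an illustration of that reading: `GraphSetup`, `atLeast`, and `availableDimensions` are hypothetical names, not part of any GraphOS API, and the version comparison handles plain numeric MAJOR.MINOR.PATCH strings only.

```typescript
// Hypothetical description of a graph's setup; not a GraphOS API type.
type GraphSetup =
  | { kind: "monograph" }
  | { kind: "supergraph"; runtime: "gateway" }
  | { kind: "supergraph"; runtime: "router"; version: string; extendedErrorReporting: boolean };

// True when `version` is at or above `minimum`.
// Compares numeric MAJOR.MINOR.PATCH parts; pre-release tags are not handled.
function atLeast(version: string, minimum: string): boolean {
  const a = version.split(".").map(Number);
  const b = minimum.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return true;
}

// Return the dimensions usable for grouping and filtering.
function availableDimensions(setup: GraphSetup): string[] {
  if (setup.kind === "monograph") return ["path"];
  const dimensions = ["path", "service"];
  if (
    setup.runtime === "router" &&
    setup.extendedErrorReporting &&
    atLeast(setup.version, "2.1.2")
  ) {
    dimensions.push("code");
  }
  return dimensions;
}
```

Comparing version parts numerically rather than as strings matters here: a string comparison would rank "2.10.0" below "2.1.2".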

Setup

To take advantage of GraphOS error insights, ensure your router or server is properly set up to report error data.

From the Apollo Router

Once your cloud or self-hosted router is configured to send operation metrics to GraphOS, error metrics are automatically reported.

Extended error reporting (since 2.1.2)

preview
This feature is in preview. Your questions and feedback are highly valued, so don't hesitate to get in touch with your Apollo contact.

You can enable richer error reporting via preview_extended_error_metrics and redaction_policy router configurations:

YAML
router.yaml
telemetry:
  apollo:
    errors:
      preview_extended_error_metrics: enabled # (default: disabled)
      subgraph:
        all:
          # By default, subgraphs should report errors to GraphOS
          send: true # (default: true)
          redaction_policy: extended # (default: strict)
        subgraphs:
          account: # Override the default behavior for the "account" subgraph
            send: false # Optional - excludes reporting of errors for this subgraph

When enabled, the router sends metrics to Studio with additional attributes, including the service and code found in GraphQL error extensions (errors[].extensions.service and errors[].extensions.code, respectively).
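As a rough illustration of where those two fields sit in a GraphQL response, the TypeScript sketch below reads `extensions.service` and `extensions.code` from each entry of an `errors` array. The `GraphQLErrorLike` type, the `errorAttributes` helper, and the response body are hypothetical examples rather than router internals, and falling back to "unknown" is an assumption made for this sketch.

```typescript
// Minimal shape of an entry in a GraphQL response's `errors` array.
type GraphQLErrorLike = {
  message: string;
  path?: (string | number)[];
  extensions?: { service?: string; code?: string; [key: string]: unknown };
};

// Pull out the two extension fields named above.
// Falls back to "unknown" when a field is absent (assumption for this sketch).
function errorAttributes(error: GraphQLErrorLike): { service: string; code: string } {
  return {
    service: error.extensions?.service ?? "unknown",
    code: error.extensions?.code ?? "unknown",
  };
}

// Hypothetical response body; the subgraph name, message, and code are sample values.
const response: { data: null; errors: GraphQLErrorLike[] } = {
  data: null,
  errors: [
    {
      message: "HTTP fetch failed from 'products'",
      path: ["topProducts"],
      extensions: { service: "products", code: "SUBREQUEST_HTTP_ERROR" },
    },
    // An error without extensions carries no service or code.
    { message: "Something went wrong" },
  ],
};

const attributes = response.errors.map(errorAttributes);
```

Errors that carry no `extensions` object yield no service or code, which is one way a group can end up with no identifiable code.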

For additional configuration options, including extended error reporting details, see the router guide for errors.

note
Subgraphs shouldn't send any metrics to GraphOS directly. Instead, they can include trace data in their responses to the router. The router then includes that data in its own reports to GraphOS. This means that to see subgraph errors in traces, your subgraphs must support federated tracing, and federated tracing must be enabled in your environment.

From Apollo Server

Apollo Server automatically reports error metrics as long as you follow these prerequisites:

  • You must first configure your server to send operation metrics to GraphOS.

  • To report errors:

    • Your GraphQL server must run Apollo Server v3.6 or later.

    • If you have a federated graph, your gateway must run Apollo Server v3.6 or later, but there are no requirements for your subgraphs.

  • To report trace-level error details:

    • Your GraphQL server can run any recent version of Apollo Server 2.x or 3.x.

    • If you have a federated graph, your subgraphs must support federated tracing. See the federated tracing item in this table to check for compatible libraries.

note
If some of your subgraphs support federated tracing and others don't, only executions in compatible subgraphs will have error details reported to Apollo.