Upgrading from Versions 1.x

Upgrade from version 1.x to 2.x of GraphOS Router


Learn how to upgrade your GraphOS Router deployment from version 1.x to 2.x.

note
GraphOS Router 2.x is available as a Developer Preview. The most recent version is 2.0.0-preview.6. This upgrade guide will be updated throughout the Developer Preview and will have additions after each release.

New configuration in GraphOS Router v2.0.0-preview.x

Router v2.x introduces new features that can affect configuration carried over from v1.x.

Apollo operation usage reporting via OTLP

The router supports reporting operation usage metrics to GraphOS via OpenTelemetry Protocol (OTLP).

Prior to version 1.49.0 of the router, all GraphOS reporting was performed using a private tracing format. In v1.49.0, we introduced support for using OTel to perform this reporting. In v1.x, this is controlled using the experimental_otlp_tracing_sampler flag, and it's off by default.

Now in v2.x, this flag is renamed to otlp_tracing_sampler, and it's enabled by default.

Learn more about configuring usage reporting via OTLP.

Metrics reporting defaults

Default values of some GraphOS reporting metrics have been changed from v1.x to the following in v2.x:

  • telemetry.apollo.signature_normalization_algorithm now defaults to enhanced. (In v1.x the default is legacy.)

  • telemetry.apollo.metrics_reference_mode now defaults to extended. (In v1.x the default is standard.)

Removed features in GraphOS Router v2.x

Support for Scaffold

In router v1.x, Scaffold could be used to generate boilerplate source code for a Rust plugin that extended the functionality of a custom router. This facility has been removed.

Source code generated using Scaffold will continue to compile, so existing Rust plugins are unaffected by this change.

--apollo-uplink-poll-interval, APOLLO_UPLINK_POLL_INTERVAL

The router no longer uses the --apollo-uplink-poll-interval command-line argument or the APOLLO_UPLINK_POLL_INTERVAL environment variable. Its behavior in v1.x was already unusual: the configured interval applied only to the very first poll, not to any subsequent polls.

Hot reloading --supergraph-urls, APOLLO_ROUTER_SUPERGRAPH_URLS

The --supergraph-urls command-line argument and the APOLLO_ROUTER_SUPERGRAPH_URLS environment variable no longer support hot reloading. In v1.x, if hot reloading was enabled, the router would repeatedly fetch the URLs on the interval specified by --apollo-uplink-poll-interval.

If you need hot reloading from a remote URL, consider running a script that downloads the supergraph schema from the URL on an interval, and point the router to the downloaded file on the filesystem.

--schema

The deprecated --schema command-line argument is removed. Use router config schema to print the configuration schema instead.

Busy timer for request processing duration

Request processing duration is already included in the spans sent by the router, so Rust plugin, coprocessor, and Rhai script users don't need to track this information manually. As a result, the following methods and structs were removed from the context module:

  • context::Context::busy_time()

  • context::Context::enter_active_request()

  • context::BusyTimer struct

  • context::BusyTimerGuard struct

Deprecated methods in Rust plugins

The following deprecated methods and fields are removed from the Rust plugin API:

  • services::router::Response::map()

  • router::event::schema::SchemaSource::File.delay field

  • router::event::configuration::Configuration::File.delay field

  • context::extensions::sync::ExtensionsMutex::lock(). Use ExtensionsMutex::with_lock() instead.

  • test_harness::TestHarness::build(). Use TestHarness::build_supergraph() instead.

  • PluginInit::new(). Use PluginInit::builder() instead.

  • PluginInit::try_new(). Use PluginInit::try_builder() instead.

Upgrading to GraphOS Router v2.x

Upgrading from GraphOS Router v1.x to v2.x requires the following steps.

Configure your Apollo usage reporting protocol

If your router v1.x is configured with experimental_otlp_tracing_sampler, rename it to otlp_tracing_sampler.

To report operation usage metrics to GraphOS via OTLP, you can either remove your configuration of otlp_tracing_sampler and use its default value, or you can explicitly enable it by setting it to always_on or a sampling ratio.

Otherwise, to use the behavior of v1.x, turn off usage reporting via OTLP by setting otlp_tracing_sampler to always_off.
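As a minimal sketch, the configuration below turns off usage reporting via OTLP to keep the v1.x behavior. It assumes the option lives under telemetry.apollo, alongside the other GraphOS reporting options in this guide; verify the location against your router's configuration schema. The 0.1 ratio in the comment is only an illustrative value.

```yaml
telemetry:
  apollo:
    # Assumed location: telemetry.apollo
    # v1.x name: experimental_otlp_tracing_sampler
    otlp_tracing_sampler: always_off # or always_on, or a sampling ratio such as 0.1
```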

Configure metrics reporting defaults

If you plan to use the new metrics reporting defaults, then no changes are required.

Otherwise, to use the defaults from v1.x, set the following:

YAML
telemetry:
  apollo:
    signature_normalization_algorithm: legacy
    metrics_reference_mode: standard

Configure supergraph endpoint path

If you use the default endpoint path or your path has no matched parameters or wildcards, then no changes are required.

If your path uses named parameters, they are now bound with braces instead of a colon:

YAML
supergraph:
  # Previously:
  # path: /foo/:bar/baz
  path: /foo/{bar}/baz

If your path uses a wildcard, it must be wrapped in braces and must bind a name:

YAML
supergraph:
  # Previously:
  # path: /foo/*
  path: /foo/{*rest}

Logging

If you previously used the experimental_when_header feature, for example like this:

YAML
router.previous.yaml
telemetry:
  exporters:
    logging:
      # If one of these headers matches we will log supergraph and subgraphs requests/responses
      experimental_when_header: # REMOVED
        - name: apollo-router-log-request
          value: my_client
          headers: true # default: false
          body: true # default: false

This feature has been removed. You can achieve the same result with custom telemetry events. Here is how you would configure them:

YAML
router.yaml
telemetry:
  instrumentation:
    events:
      router:
        request: # Display router request log
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - my_client
        response: # Display router response log
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - my_client
      supergraph:
        request: # Display supergraph request log
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - my_client
        response:
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - my_client
      subgraph:
        request: # Display subgraph request log
          level: info
          condition:
            eq:
              - supergraph_request_header: apollo-router-log-request
              - my_client
        response: # Display subgraph response log
          level: info
          condition:
            eq:
              - supergraph_request_header: apollo-router-log-request
              - my_client

Introspection depth limit

The schema-introspection schema is recursive: a client can query the fields of the types of some other fields, and so on arbitrarily deep. This can produce responses that grow much faster than the size of the request.

To protect against abusive requests, the router now refuses to execute introspection queries that nest list fields too deeply and returns an error instead. The criteria match MaxIntrospectionDepthRule in graphql-js, but may change in future versions.

If this check rejects legitimate queries, you can disable it in the router configuration:

YAML
# Do not enable introspection in production!
supergraph:
  introspection: true # Without this, schema introspection is entirely disabled by default
limits:
  introspection_max_depth: false # Defaults to true

Traces

  • telemetry.instrumentation.spans.mode becomes spec_compliant by default instead of deprecated.
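If you need to keep the v1.x span behavior while you migrate, a sketch like the following sets the mode back explicitly. The key path and both values come from the setting named above; whether you want to stay on the previous default is a decision for your own deployment.

```yaml
telemetry:
  instrumentation:
    spans:
      # v2.x default is spec_compliant; deprecated was the v1.x default
      mode: deprecated
```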

Renamed Metrics

Some of the older metrics in the router did not use the current dot-separated (.) naming convention and were instead separated by underscores (_). For consistency, the following metrics are renamed:

| Previous metric | Renamed metric |
| --- | --- |
| apollo_router_opened_subscriptions | apollo.router.opened.subscriptions |
| apollo_router_cache_hit_time | apollo.router.cache.hit.time |
| apollo_router_cache_size | apollo.router.cache.size |
| apollo_router_cache_miss_time | apollo.router.cache.miss.time |
| apollo_router_state_change_total | apollo.router.state.change.total |
| apollo_router_span_lru_size | apollo.router.exporter.span.lru.size [1] |
| apollo_router_session_count_active | apollo.router.session.count.active |
| apollo_router_uplink_fetch_count_total | apollo.router.uplink.fetch.count.total |
| apollo_router_uplink_fetch_duration_seconds | apollo.router.uplink.fetch.duration.seconds |

[1] apollo.router.exporter.span.lru.size now also has an additional exporter prefix.

Removed metrics

The following metrics have been removed:

  • apollo_router_http_request_retry_total. Use the http.client.request.duration metric's http.request.resend_count attribute instead. This attribute is set automatically when default_requirement_level is set to recommended.

  • apollo_router_timeout. This metric conflated timed-out requests from clients to the router with timed-out requests from the router to subgraphs. Timed-out requests have HTTP status code 504. Use the http.response.status_code attribute on the http.server.request.duration metric to identify timed-out router requests, and the same attribute on the http.client.request.duration metric to identify timed-out subgraph requests.

  • apollo_router_http_requests_total. This is replaced by the http.server.request.duration metric for requests from clients to the router and the http.client.request.duration metric for requests from the router to subgraphs.

  • apollo_router_http_request_duration_seconds_bucket. This is replaced by the http.server.request.duration metric for requests from clients to the router and the http.client.request.duration metric for requests from the router to subgraphs.

  • apollo_router_session_count_total. This is replaced by http.server.active_requests.

  • apollo_require_authentication_failure_count. Use the http.server.request.duration metric's http.response.status_code attribute. Requests with authentication failures have HTTP status code 401.

  • apollo_authentication_failure_count. Use the apollo.router.operations.authentication.jwt metric's authentication.jwt.failed attribute.

  • apollo_authentication_success_count. Use the apollo.router.operations.authentication.jwt metric instead. If the authentication.jwt.failed attribute is absent or false, the authentication succeeded.

  • apollo_router_deduplicated_subscriptions_total. Use the apollo.router.operations.subscriptions metric's subscriptions.deduplicated attribute.

  • apollo_router_cache_miss_count. Cache miss count can be derived from apollo.router.cache.miss.time.count.

  • apollo_router_cache_hit_count. Cache hit count can be derived from apollo.router.cache.hit.time.count.

  • Calculating the overhead of injecting the router into your service stack when making multiple downstream calls is a complex task. For that reason, the following two misleading metrics are removed. Instead, test your workloads with the router to determine whether the latency meets your requirements:

    • apollo_router_span

    • apollo_router_processing_time

If you used the subgraph_response_body selector for telemetry, like this:

YAML
telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.duration:
          attributes:
            http.response.status_code:
              subgraph_response_status: code
            my_data_value:
              subgraph_response_body: .data.test

You'll have to migrate from subgraph_response_body to the subgraph_response_data selector (or to subgraph_response_errors if you want to target errors):

YAML
telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.duration:
          attributes:
            http.response.status_code:
              subgraph_response_status: code
            my_data_value:
              # The .data prefix is dropped because this selector targets data directly,
              # and the path now starts with $
              subgraph_response_data: $.test

Context Keys

The router request context is used to share data across stages of the request pipeline. The keys have been renamed for consistency and to better indicate which pipeline stage or plugin populates the data. If you access context entries in a custom plugin, Rhai script, coprocessor, or telemetry selector, update your context keys to account for the new names:

  • apollo_authentication::JWT::claims -> apollo::authentication::jwt_claims

  • apollo_authorization::authenticated::required -> apollo::authorization::authentication_required

  • apollo_authorization::scopes::required -> apollo::authorization::required_scopes

  • apollo_authorization::policies::required -> apollo::authorization::required_policies

  • apollo_operation_id -> apollo::supergraph::operation_id

  • apollo_override::unresolved_labels -> apollo::progressive_override::unresolved_labels

  • apollo_override::labels_to_override -> apollo::progressive_override::labels_to_override

  • apollo_router::supergraph::first_event -> apollo::supergraph::first_event

  • apollo_telemetry::client_name -> apollo::telemetry::client_name

  • apollo_telemetry::client_version -> apollo::telemetry::client_version

  • apollo_telemetry::studio::exclude -> apollo::telemetry::studio_exclude

  • apollo_telemetry::subgraph_ftv1 -> apollo::telemetry::subgraph_ftv1

  • cost.actual -> apollo::demand_control::actual_cost

  • cost.estimated -> apollo::demand_control::estimated_cost

  • cost.result -> apollo::demand_control::result

  • cost.strategy -> apollo::demand_control::strategy

  • experimental::expose_query_plan.enabled -> apollo::expose_query_plan::enabled

  • experimental::expose_query_plan.formatted_plan -> apollo::expose_query_plan::formatted_plan

  • experimental::expose_query_plan.plan -> apollo::expose_query_plan::plan

  • operation_kind -> apollo::supergraph::operation_kind

  • operation_name -> apollo::supergraph::operation_name

  • persisted_query_hit -> apollo::apq::cache_hit

  • persisted_query_register -> apollo::apq::registered
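For example, a telemetry selector that reads the operation name from context would change as in the sketch below. The attribute name my.operation_name is a hypothetical choice, and the sketch assumes the response_context selector is available for router span attributes in your version.

```yaml
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          my.operation_name: # hypothetical attribute name
            # Previously: response_context: "operation_name"
            response_context: "apollo::supergraph::operation_name"
```

The same rename applies wherever the key string appears, including Rhai scripts, coprocessor payload handling, and Rust plugins.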

Tracing

The jaeger exporter has been removed because Jaeger fully supports the OTLP format. To use the otlp exporter, change your router configuration:

YAML
router.yaml
telemetry:
  exporters:
    tracing:
      propagation:
        jaeger: true
      otlp:
        enabled: true

Also ensure that OTLP support is enabled in your Jaeger instance by setting COLLECTOR_OTLP_ENABLED=true and exposing port 4317 (gRPC) and/or port 4318 (HTTP).
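If your Jaeger collector is not reachable at the exporter's default address, a sketch like the following points the otlp exporter at it over gRPC. The address is a hypothetical local Jaeger instance, and the endpoint and protocol keys are assumed to behave as in the router's standard OTLP exporter configuration.

```yaml
telemetry:
  exporters:
    tracing:
      propagation:
        jaeger: true
      otlp:
        enabled: true
        # Hypothetical address of a Jaeger collector with OTLP enabled
        endpoint: http://localhost:4317
        protocol: grpc # 4317 is the gRPC port
```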

Check CORS configuration

If you already configured CORS on the router, the configuration format doesn't change. However, v1.x accepted invalid configuration and only logged an error when some header names or regexes were invalid. In v2.x, an invalid CORS configuration is an error that prevents the router from starting, to avoid confusion and misconfiguration.
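As an illustration of the values worth reviewing before you upgrade, here is a sketch of a CORS configuration containing a header name and an origin regex. All values are placeholders, and the key names are assumed to follow the v1.x CORS configuration.

```yaml
cors:
  origins:
    - https://www.your-app.example.com # placeholder origin
  match_origins:
    # Placeholder pattern; an invalid regex here stops v2.x from starting
    - "^https://([a-z0-9]+[.])*api[.]example[.]com$"
  allow_headers:
    # Placeholder name; an invalid header name here stops v2.x from starting
    - x-custom-header
```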

Headers propagation

If you extract data from the request body and insert it into a subgraph request header, note that the path field must now be JSONPath compliant. For example, if you had this configuration:

YAML
headers:
  all:
    request:
      - insert:
          name: from_app_name
          path: .extensions.metadata[0].app_name

You'll have to make the path field JSONPath compliant by adding $ at the beginning of the path, for example:

YAML
headers:
  all:
    request:
      - insert:
          name: from_app_name
          path: $.extensions.metadata[0].app_name

OneShotAsyncCheckpoint

This has been removed because it is incompatible with the architectural optimizations we've made in Router 2.0. If you were using this in a Rust Plugin, you can replace it easily with AsyncCheckpoint as follows:

Plugin code using OneShotAsyncCheckpoint

Text
OneShotAsyncCheckpointLayer::new(move |request: execution::Request| {
    let request_config = request_config.clone();
    <etc...>
}
.service(service)
.boxed()

Replace with AsyncCheckpoint

Text
AsyncCheckpointLayer::new(move |request: execution::Request| {
    let request_config = request_config.clone();
    <etc...>
}
.buffered()
.service(service)
.boxed()

The buffered() method is provided by the apollo_router::layers::ServiceBuilderExt trait and ensures that your service may be cloned.

Traffic Shaping

Traffic shaping has been improved significantly in Router 2.0. We've added a new mechanism, concurrency control, and we've improved the router's ability to observe timeout and traffic shaping restrictions correctly. These improvements mean that clients of the router may see an increase in errors as traffic shaping constraints are enforced.

We recommend that users experiment with their configuration in order to arrive at the right combination of timeout, concurrency and rate limit controls for their particular use case.
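As a starting point for that experimentation, the timeout, concurrency, and rate limit controls might be combined as in the sketch below. The values are placeholders to tune for your workload, and the key names (in particular concurrency_limit) are assumptions about the router's traffic shaping configuration; check them against your version's configuration schema.

```yaml
traffic_shaping:
  router:
    timeout: 30s # placeholder: cancel router requests that take longer than this
    concurrency_limit: 100 # placeholder: assumed key for the new concurrency control
    global_rate_limit:
      capacity: 100 # placeholder: maximum requests accepted per interval
      interval: 1s
```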

For more information, see the documentation on configuring traffic shaping.

Subgraph Request/Response

If your Rust plugin uses either subgraph::Request::subgraph_name or subgraph::Response::subgraph_name, note that these are no longer wrapped in Option. They are now mandatory and always present.

Emitting metrics in Rust plugins

Previously, Rust plugins could use the Router's internal metrics system using the tracing macros. This is no longer supported, and the opentelemetry crates should be used instead.

tracing field names starting with these strings are no longer treated specially:

  • counter.

  • histogram.

  • monotonic_counter.

  • value.

Instead, use the OpenTelemetry crates. You can use the new apollo_router::metrics::meter_provider() API to access the router's global meter provider to register your instruments.

Deploy your router

Make sure that you are referencing the correct router release: 2.0.0-preview.6

During upgrade, carefully monitor logs and resource consumption to ensure that your router has successfully upgraded and that your router has enough resources to perform as expected. The router will automatically migrate required configuration changes, but it is good practice to review the steps above and decide if you want to change your configuration.

For example, you may decide that you wish to change the way in which you sample OTLP traces in this new release and not simply preserve your existing configuration.

Reporting upgrade issues

If you encounter an upgrade issue that isn't resolved by this article, please search for existing GitHub discussions and start a new discussion if you don't find what you're looking for.
