Deployment Best Practices
Best practices and workflows for deploying with managed federation
When rolling out changes to a subgraph, we recommend the following workflow:

1. Confirm the backward compatibility of each change by running `rover subgraph check` in your CI pipeline.
2. Merge backward compatible changes that successfully pass schema checks.
3. Deploy changes to the subgraph in your infrastructure.
4. Wait until all replicas finish deploying.
5. Run `rover subgraph publish` to update your managed federation configuration:

```shell
rover subgraph publish my-supergraph@my-variant \
  --schema ./accounts/schema.graphql \
  --name accounts \
  --routing-url https://my-running-subgraph.com/api
```
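The ordering in this workflow (check, deploy, wait for all replicas, then publish) can be sketched as a small release script. This is an illustrative sketch, not an official tool: `run` and `wait_for_replicas` are injected stand-ins for your CI command runner and your infrastructure's readiness check, and the graph and subgraph names are the ones from the example above.

```python
# Sketch of the check -> deploy -> wait -> publish ordering.
# `run` is injected so the ordering logic can be exercised without Rover
# installed; the commands themselves are the ones shown above.

def release_subgraph(run, wait_for_replicas, schema="./accounts/schema.graphql"):
    """Gate `rover subgraph publish` on checks and a completed deployment."""
    # 1. Confirm backward compatibility in CI.
    run(["rover", "subgraph", "check", "my-supergraph@my-variant",
         "--schema", schema, "--name", "accounts"])
    # 2-3. (Merging and deploying happen in your CI/CD system.)
    # 4. Wait until all replicas finish deploying before publishing.
    wait_for_replicas()
    # 5. Update the managed federation configuration.
    run(["rover", "subgraph", "publish", "my-supergraph@my-variant",
         "--schema", schema, "--name", "accounts",
         "--routing-url", "https://my-running-subgraph.com/api"])
```

Injecting `run` keeps the ordering testable; in a real pipeline you might pass `subprocess.run` (with `check=True`) instead.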
Pushing configuration updates safely
Whenever possible, you should update your subgraph configuration in a way that is backward compatible to avoid downtime. As suggested above, the best way to do this is to run `rover subgraph check` before updating. You should also generally seek to minimize the number of breaking changes you make to your schemas.

Additionally, call `rover subgraph publish` for a subgraph only after all replicas of that subgraph are deployed. This ensures that resolvers are in place for all operations that are executable against your graph, and operations can't attempt to access fields that do not yet exist.
In the rare case where a configuration change is not backward compatible with your router's query planner, you should update your registered subgraph schemas before you deploy your updated code.
You should also perform configuration updates that affect query planning prior to (and separately from) other changes. This helps avoid a scenario where the query planner generates queries that fail validation in downstream services or violate your resolvers.
Examples of this include:

- Modifying `@key`, `@requires`, or `@provides` directives
- Removing a type implementation from an interface
In general, always exercise caution when pushing configuration changes that affect your router's query planner, and consider how those changes will affect your other subgraphs.
Example scenario
Let's say we define a `Channel` interface in one subgraph, and we define types that implement `Channel` in two other subgraphs:

```graphql
# channel subgraph
interface Channel @key(fields: "id") {
  id: ID!
}

# web subgraph
type WebChannel implements Channel @key(fields: "id") {
  id: ID!
  webHook: String!
}

# email subgraph
type EmailChannel implements Channel @key(fields: "id") {
  id: ID!
  emailAddress: String!
}
```
To safely remove the `EmailChannel` type from your supergraph schema:

1. Perform a `rover subgraph publish` of the `email` subgraph that removes the `EmailChannel` type from its schema.
2. Deploy a new version of the subgraph that removes the `EmailChannel` type.
The first step causes the query planner to stop sending fragments `...on EmailChannel`, which would fail validation if sent to a subgraph that isn't aware of the type.

If you want to keep `EmailChannel` but remove it from the `Channel` interface, the process is similar. Instead of removing the `EmailChannel` type altogether, only remove the `implements Channel` addendum from the type definition. This is because the query planner expands queries to interfaces or unions into fragments on their implementing types.
For example, a query such as...

```graphql
query FindChannel($id: ID!) {
  channel(id: $id) {
    id
  }
}
```
...generates two queries, one to each subgraph, like so:

```graphql
# Generated by the query planner

# To email subgraph
query {
  _entities(...) {
    ...on EmailChannel {
      id
    }
  }
}

# To web subgraph
query {
  _entities(...) {
    ...on WebChannel {
      id
    }
  }
}
```
Currently, the router expands all interfaces into implementing types.
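As a toy model of that expansion (this is illustrative Python, not the router's actual query planner, and the interface-to-type map is hypothetical data mirroring the example above):

```python
# Toy model of interface expansion: a selection on an interface becomes one
# inline fragment per implementing type, routed to the subgraph defining it.

IMPLEMENTATIONS = {
    # interface -> [(subgraph, implementing type), ...]
    "Channel": [("web", "WebChannel"), ("email", "EmailChannel")],
}

def expand_interface(interface, selection):
    """Return a per-subgraph inline fragment for each implementing type."""
    plans = {}
    for subgraph, type_name in IMPLEMENTATIONS[interface]:
        plans[subgraph] = f"...on {type_name} {{ {selection} }}"
    return plans
```

Because each fragment is routed to the subgraph that defines its type, removing `EmailChannel` from the registered `email` schema first makes the planner stop emitting the `...on EmailChannel` fragment before the running service loses the type.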
Removing a subgraph
To "de-register" a subgraph with Apollo, call `rover subgraph delete`:

```shell
rover subgraph delete my-supergraph@my-variant --name accounts
```
The next time it starts up or polls, your router obtains an updated configuration that reflects the removed subgraph.
Advanced deployment workflows
With managed federation, you can control which version of your schema your router fleet uses. In most cases, rolling over all of your router instances to a new schema version is safe, assuming you've used schema checks to confirm that your changes are backward compatible. Your deployment model, however, may require an advanced workflow to deploy a specific version of a schema.
Two types of advanced deployment workflows:

- **Blue-green deployment workflow.** For deployments that require progressive rollout, such as blue-green deployments, you can configure your environments to refer to a single graph variant by pinning each environment's supergraph schema to your routers at deployment time. Using a single variant between different production environments enables GraphOS Studio to get usage reports and analyze the combined production traffic of all environments, as well as providing a consistent changelog of your schema over time.
- **Graph variant workflow.** Changes at the router level might involve a variety of different updates, such as migrating entities from one subgraph to another. If your infrastructure requires a more advanced deployment process to handle the different router updates, you can use graph variants to manage router fleets running with different configurations.
A common use for graph variants is contracts, for example, to create separate contract variants for the public and private APIs of a supergraph schema.
Example blue-green deployment
A blue-green deployment strategy uses two environments: one environment (blue) serves the schema variant for live traffic, and the other environment (green) uses a variant for a new release that's under development. When the new release is ready, traffic is migrated from the blue to the green environment. This cycle repeats with each new release.
As an example, follow these steps to deploy a new release to its (green) environment with a pinned supergraph schema; the example uses the GraphOS Platform API to perform custom GraphOS actions:

1. Publish all the release's subgraphs at once using the Platform API `publishSubgraphs` mutation:

   ```graphql
   ## Publish multiple subgraphs together in a batch
   ## and retrieve the associated launch, along with any downstream launches synchronously.
   mutation PublishSubgraphsMutation(
     $graphId: ID!
     $graphVariant: String!
     $revision: String!
     $subgraphInputs: [PublishSubgraphsSubgraphInput!]!
   ) {
     graph(id: $graphId) {
       publishSubgraphs(
         graphVariant: $graphVariant
         revision: $revision
         subgraphInputs: $subgraphInputs
         downstreamLaunchInitiation: "SYNC"
       ) {
         launch {
           id
           downstreamLaunches {
             id
             graphVariant
             status
           }
         }
       }
     }
   }
   ```

   This initiates a launch, as well as any downstream launches necessary for contracts. It returns the launch IDs, with downstream launch IDs configured to return synchronously (`downstreamLaunchInitiation: "SYNC"`) with the mutation. Each downstream launch's variant is returned in `downstreamLaunches { graphVariant }`; when querying for a specific launch in the following steps, be sure to pass the variant associated with that launch.

2. Poll for the completed launch and any downstream launches:

   ```graphql
   ## Poll for the status of any individual launch by ID
   query PollLaunchStatusQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           status
         }
       }
     }
   }
   ```

   > **Note:** When polling for a contract, the `$graphVariant` argument of this query must refer to the contract variant rather than the base variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

3. After the launch and downstream launches have completed, retrieve the supergraph schema of the launch:

   ```graphql
   ## Fetch the supergraph SDL by launch ID.
   query FetchSupergraphSDLQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           build {
             result {
               ... on BuildSuccess {
                 coreSchema {
                   coreDocument
                 }
               }
             }
           }
         }
       }
     }
   }
   ```

   > **Note:** When retrieving for a contract, the `$graphVariant` argument of this query must refer to a contract variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

4. Deploy your routers with the `-s` or `--supergraph` option to specify the supergraph schema. Specifying the `-s` or `--supergraph` option disables polling for the schema from Uplink. For an example using the option in a `docker run` command, see Specifying the supergraph.
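The publish → poll → fetch sequence above can be orchestrated from a deploy script. In this sketch, `execute(operation_name, variables)` is an injected stand-in for an HTTP call to the Platform API that returns the relevant payload already unwrapped, and the `"LAUNCH_COMPLETED"` status value is an assumption about the launch-status enum; only the ordering of the three steps is the point.

```python
import time

def deploy_green(execute, graph_id, variant, revision, subgraph_inputs):
    """Publish subgraphs, poll the launch, then fetch the supergraph SDL.

    `execute(operation_name, variables)` stands in for a Platform API call
    using the operations shown above; it returns the unwrapped payload.
    """
    # Step 1: publish all subgraphs in one batch; capture the launch ID.
    result = execute("PublishSubgraphsMutation", {
        "graphId": graph_id, "graphVariant": variant,
        "revision": revision, "subgraphInputs": subgraph_inputs,
    })
    launch_id = result["launch"]["id"]

    # Step 2: poll until the launch completes.
    while True:
        status = execute("PollLaunchStatusQuery", {
            "graphId": graph_id, "graphVariant": variant, "launchId": launch_id,
        })["status"]
        if status == "LAUNCH_COMPLETED":  # assumed terminal status value
            break
        time.sleep(5)

    # Step 3: fetch the supergraph SDL for the completed launch; pass the
    # result to the router via the --supergraph option.
    return execute("FetchSupergraphSDLQuery", {
        "graphId": graph_id, "graphVariant": variant, "launchId": launch_id,
    })["coreDocument"]
```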
If you need to roll back to a previous blue-green deployment, ensure the previous deployment is available and shift traffic back to it:

- A router image must use an embedded supergraph schema via the `--supergraph` flag.
- A deployment should include both router and subgraphs to ensure resolvers and schemas are compatible.

If a previous deployment can't be redeployed, repeat steps 3 and 4 with the `launchID` you want to roll back to. Ensure the deployed subgraphs are compatible with the supergraph schema, then redeploy the router with a newly fetched supergraph schema for your target `launchID`. Before considering rolling back only the supergraph schema, see its caveats.
Example canary deployment
A canary deployment applies graph updates in an environment separate from a live production environment and validates its updates starting with a small subset of production traffic. As updates are validated in the canary deployment, more production traffic is routed to it gradually until it handles all traffic.
To configure your canary deployment, you can fetch the supergraph schema for a `launchID` for the canary deployment, then have that canary deployment report metrics to a `prod` variant. Similar to the blue-green deployment example, your canary deployment is pinned to the same graph variant as your other, live deployment, so metrics from both deployments are reported to the same graph variant. As your canary deployment is scaled up, it eventually becomes the stable deployment serving all production traffic, so we want that deployment reporting to the live `prod` variant.

To configure a canary deployment for the `prod` graph variant:
1. Publish all the canary deployment's subgraphs at once using the Platform API `publishSubgraphs` mutation:

   ```graphql
   ## Publish multiple subgraphs together in a batch
   ## and retrieve the associated launch, along with any downstream launches synchronously.
   mutation PublishSubgraphsMutation(
     $graphId: ID!
     $graphVariant: String!
     $revision: String!
     $subgraphInputs: [PublishSubgraphsSubgraphInput!]!
   ) {
     graph(id: $graphId) {
       publishSubgraphs(
         graphVariant: "prod" ## name of production variant
         revision: $revision
         subgraphInputs: $subgraphInputs
         downstreamLaunchInitiation: "SYNC"
       ) {
         launch {
           id
           downstreamLaunches {
             id
             graphVariant
             status
           }
         }
       }
     }
   }
   ```

   This initiates a launch, as well as any downstream launches necessary for contracts. It returns the launch IDs, with downstream launch IDs configured to return synchronously (`downstreamLaunchInitiation: "SYNC"`) with the mutation. Each downstream launch's variant is returned in `downstreamLaunches { graphVariant }`; when querying for a specific launch in the following steps, be sure to pass the variant associated with that launch.

2. Poll for the completed launch and any downstream launches:

   ```graphql
   ## Poll for the status of any individual launch by ID
   query PollLaunchStatusQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           status
         }
       }
     }
   }
   ```

   > **Note:** When polling for a contract, the `$graphVariant` argument of this query must refer to the contract variant rather than the base variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

3. After the launch and downstream launches have completed, retrieve the supergraph schema of the launch:

   ```graphql
   ## Fetch the supergraph SDL by launch ID.
   query FetchSupergraphSDLQuery($graphId: ID!, $graphVariant: String!, $launchId: ID!) {
     graph(id: $graphId) {
       variant(name: $graphVariant) {
         launch(id: $launchId) {
           build {
             result {
               ... on BuildSuccess {
                 coreSchema {
                   coreDocument
                 }
               }
             }
           }
         }
       }
     }
   }
   ```

   > **Note:** When retrieving for a contract, the `$graphVariant` argument of this query must refer to a contract variant. You can get it from the query in step 1, from `Launch.graphVariant` / `downstreamLaunches { graphVariant }`.

4. Deploy your routers with the `-s` or `--supergraph` option to specify the supergraph schema. Specifying the `-s` or `--supergraph` option disables polling for the schema from Uplink. For an example using the option in a `docker run` command, see Specifying the supergraph.
If you need to roll back, ensure the previous deployment is available and shift traffic back to the live deployment:

- A router image must use an embedded supergraph schema via the `--supergraph` flag.
- A deployment should include both router and subgraphs to ensure resolvers and schemas are compatible.

If a previous deployment can't be redeployed, repeat steps 3 and 4 with the `launchID` you want to roll back to. Ensure the deployed subgraphs are compatible with the supergraph schema, then redeploy the router with a newly fetched supergraph schema for your target `launchID`. Before considering rolling back only the supergraph schema, see its caveats.
With your canary deployment reporting metrics to GraphOS, you can use GraphOS Studio to verify a canary's performance before rolling out changes to the rest of the graph.
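The gradual shift of production traffic toward a canary can be modeled as a weighted, deterministic routing decision. This sketch is independent of any particular load balancer; hashing a stable request or client ID (rather than sampling randomly) means raising the canary percentage only adds traffic to the canary and never reshuffles already-routed clients.

```python
import hashlib

def route_request(request_id, canary_percent):
    """Deterministically route a request to 'canary' or 'stable'.

    Hashing the request ID yields a stable bucket in [0, 100), so a client
    stays in the canary once its bucket falls below the rising percentage.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Because both deployments report to the same `prod` variant, GraphOS Studio sees the combined traffic regardless of how the split is implemented.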
Modifying query-planning logic
Treat migrations of your query-planning logic similarly to how you treat database migrations. Carefully consider the effects on downstream services as the query planner changes, and plan for "double reading" as appropriate.
Consider the following example of a Products subgraph and a Reviews subgraph:

```graphql
# Products subgraph

type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String!
}

# Reviews subgraph

type Product @key(fields: "upc") {
  upc: ID!
  reviews: [Review]! @requires(fields: "nameLowercase")
  nameLowercase: String! @external
}
```
Let's say we want to deprecate the `nameLowercase` field and replace it with the `name` field, like so:

```graphql
# Products subgraph

type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @deprecated
  name: String!
}

# Reviews subgraph

type Product @key(fields: "upc") {
  upc: ID!
  nameLowercase: String! @external
  name: String! @external
  reviews: [Review]! @requires(fields: "name")
}
```
To perform this migration in-place:

1. Modify the `Products` subgraph to add the new field. (As usual, first deploy all replicas, then use `rover subgraph publish` to push the new subgraph schema.)
2. Deploy a new version of the `Reviews` subgraph with a resolver that accepts either `nameLowercase` or `name` in the source object.
3. Modify the `Reviews` subgraph's schema in the registry so that it `@requires(fields: "name")`.
4. Deploy a new version of the `Reviews` subgraph with a resolver that only accepts `name` in its source object.
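Step 2's transitional resolver can be sketched in any GraphQL server library; here it's a plain Python function over the entity's source object. The `fetch_reviews_by_name` helper is hypothetical, and the field names come from the schemas above.

```python
# Sketch of the transitional Reviews resolver from step 2: during the
# migration, the source object may carry `name` (new query-planning config)
# or `nameLowercase` (old config), so the resolver accepts either.

def resolve_reviews(source, fetch_reviews_by_name):
    """Resolve Product.reviews from whichever required field is present."""
    name = source.get("name") or source.get("nameLowercase")
    if name is None:
        raise ValueError("source carries neither 'name' nor 'nameLowercase'")
    # Normalize so lookups behave the same regardless of which field arrived.
    return fetch_reviews_by_name(name.lower())
```

Once step 3's registry update takes effect and all traffic carries `name`, step 4 can drop the `nameLowercase` fallback.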
Alternatively, you can perform this operation with an atomic migration at the subgraph level, by modifying the subgraph's URL:

1. Modify the `Products` subgraph to add the `name` field (as usual, first deploy all replicas, then use `rover subgraph publish` to push the new subgraph schema).
2. Deploy a new set of `Reviews` replicas to a new URL that reads from `name`.
3. Register the `Reviews` subgraph with the new URL and the schema changes above.
With this atomic strategy, the query planner resolves all outstanding requests to the old subgraph URL that relied on `nameLowercase` with the old query-planning configuration, which `@requires` the `nameLowercase` field. All new requests are made to the new subgraph URL using the new query-planning configuration, which `@requires` the `name` field.
Reliability and security
Your router fetches its configuration by polling Apollo Uplink, an Apollo-hosted endpoint specifically for serving supergraph configs. In the event that your updated config is inaccessible due to an outage in Uplink, your router continues to serve its most recently fetched configuration.
If you restart a router instance or spin up a new instance during an Uplink outage, that instance can't fetch its configuration until Apollo resolves the outage.
The `subgraph publish` lifecycle

Whenever you call `rover subgraph publish` for a particular subgraph, it both updates that subgraph's registered schema and updates the router's managed configuration.
Because your graph is dynamically changing and multiple subgraphs might be updated simultaneously, it's possible for changes to cause composition errors, even if `rover subgraph check` was successful. For this reason, updating a subgraph re-triggers composition in the cloud, ensuring that all subgraphs still compose to form a complete supergraph before updating the configuration. The workflow behind the scenes can be summed up as follows:
1. The subgraph schema is uploaded to Apollo and indexed.
2. The subgraph is updated in the registry to use its new schema.
3. All subgraphs are composed in the cloud to produce a new supergraph schema.
4. If composition fails, the command exits and emits errors.
5. If composition succeeds, Apollo Uplink begins serving the updated supergraph schema.
On the other side of the equation sits the router, which can regularly poll Apollo Uplink for changes to its configuration. The lifecycle of dynamic configuration updates is as follows:

1. The router polls for updates to its configuration.
2. On update, the router downloads the updated configuration, including the new supergraph schema.
3. The router uses the new supergraph schema to update its query-planning logic.
4. The router continues to resolve in-flight requests with the previous configuration, while using the updated configuration for all new requests.
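The last two steps can be modeled as a pointer swap: each request snapshots the configuration that's current when it begins, so a mid-flight update never changes the plan an in-flight request is executing against. This is an illustrative model, not the router's actual implementation.

```python
class ConfigHolder:
    """Swappable configuration; each request pins the version it started with."""

    def __init__(self, config):
        self._current = config

    def update(self, new_config):
        # New requests see the new config; in-flight requests keep the
        # reference they captured at the start of the request.
        self._current = new_config

    def begin_request(self):
        return self._current  # snapshot for the lifetime of this request
```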
Alternatively, instead of getting its configuration from Apollo Uplink, the router can specify a path to a supergraph schema upon its deployment. This static configuration is useful when you want the router to use a schema different than the latest validated schema from Uplink, or when you don't have connectivity to Apollo Uplink. For an example of this workflow, see an example of configuring the router for blue-green deployment.
Rolling back a deployment
When rolling back a deployment, you must ensure the supergraph schema and router version are compatible with the deployed subgraphs and subgraph schemas in the target environment, so all possible GraphQL operations can be successfully executed.
Roll forward to revert
Rollbacks are typically implemented by rolling forward to a new version that reverts the changes in the subgraph code repository, then performing the full release process (publishing the subgraph schema and rolling out the new code together) as outlined in the change management tech note. This ensures the supergraph schema exposed by the router matches the underlying subgraphs. It's the safest approach when using the standard schema delivery pipeline where Apollo Uplink provides the supergraph schema to the router for continuous deployment of new launches.
Roll back entire deployment
For blue-green deployment scenarios, where the router and subgraphs in a deployment have versioned Docker container images, you may be able to roll back the entire deployment (assuming no underlying database schema changes). Doing so ensures that the supergraph schema embedded in the router image is compatible with underlying subgraphs in the target environment. This kind of rollback is typically what happens when a blue-green deployment is aborted if post-promotion analysis fails.
Roll back supergraph schema only
In rare circumstances where a backward compatible, subgraph schema-only change is made (for example, setting a progressive `@override` percentage), it may be possible to roll back only the supergraph schema by pinning the router fleet to the supergraph schema for a specific `launchID` using the `--supergraph` flag.

This approach is only suitable for short-term fixes for a limited set of schema-only changes. It requires the router to pin to a specific supergraph `launchID`, as republishing the underlying subgraphs will result in a new supergraph schema being generated.

Given the issues with this approach, in general we recommend implementing rollbacks by rolling forward to a new version.
Rollback guidelines
A summary of rollback guidelines:

- Any rollback must ensure the router's supergraph schema is compatible with the underlying subgraphs deployed in the target environment.
- GraphOS's standard CI/CD schema delivery pipeline is the best choice for most environments seeking continuous deployment, empowering subgraph teams to ship independently with the safety of GraphOS checks to prevent breaking changes. For details, see the change management tech note.
- In environments with existing blue-green or canary deployments that rely on an immutable infrastructure approach, where no in-place updates, patches, or configuration changes can be made on production workloads, the router image can use an embedded supergraph schema. The supergraph schema is set for the router with the `--supergraph` flag for a specific GraphOS `launchID` that's generated by publishing the subgraph schemas for the specific subgraph image versions used in a blue-green or canary deployment. In this way, a blue-green or canary deployment can be made immutable as a whole, so rolling back to a previous deployment ensures the router's supergraph schema is compatible with the underlying subgraphs deployed in the target environment.
- In general, we don't recommend rolling back only the supergraph schema on the router in isolation. Subgraph compatibility must also be taken into account. Subsequent publishing of subgraphs generates a new supergraph schema that may lose rolled-back changes, so in general it's better to fix the problem at the source of truth in the subgraph repository.