Deploy in Kubernetes

Self-hosted deployment of the router in Kubernetes

Learn how to deploy a self-hosted router (GraphOS Router or Apollo Router Core) in Kubernetes using Helm charts.

The following guides provide the steps to:

- Get a router Helm chart from the Apollo container repository.
- Deploy a router with a basic Helm chart.
- Configure chart values to export metrics, enable Rhai scripting, and deploy a coprocessor.
- Choose chart values that best suit migration from a gateway to the router.

note

The Apollo Router Core source code and all its distributions are made available under the Elastic License v2.0 (ELv2).

About the router Helm chart

Helm is a package manager for Kubernetes (k8s). Apollo provides an application Helm chart with each release of Apollo Router Core in GitHub. Since router v0.14.0, Apollo has released the router Helm chart as an Open Container Initiative (OCI) image in a GitHub container registry.

note

The path to the OCI router chart is oci://ghcr.io/apollographql/helm-charts/router.

You customize a deployed router with the same command-line options and YAML configuration as a standalone router, but you supply them through Helm CLI options and nest them under the chart's YAML keys.
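
A minimal sketch of that mapping, assuming a hypothetical override file: settings that would sit at the top level of a standalone router.yaml go under router.configuration, and router command-line options go under router.args. Both keys appear in the chart's default values; the specific settings here are only illustrative.

```yaml
# Hypothetical override file passed to Helm with --values
router:
  # Same YAML structure as a standalone router.yaml, nested one level down
  configuration:
    supergraph:
      listen: 0.0.0.0:80
  # Command-line options for the router binary
  args:
    - --hot-reload
```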

Basic deployment

Follow this guide to deploy the router by using Helm to install the basic chart provided with each router release.

Each router chart has a values.yaml file with router and deployment settings. The released, unedited file has a few explicit settings, including:

- Default container ports for the router's HTTP server, health check endpoint, and metrics endpoint.
- A command-line argument to enable hot reloading of the router.
- A single replica.

The values of the Helm chart for Apollo Router Core v1.31.0 in the GitHub container repository, as output by the helm show command:

Bash

helm show values oci://ghcr.io/apollographql/helm-charts/router

YAML

Pulled: ghcr.io/apollographql/helm-charts/router:1.31.0
Digest: sha256:26fbe98268456935cac5b51d44257bf96c02ee919fde8d47a06602ce2cda66a3
# Default values for router.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

# -- See https://www.apollographql.com/docs/router/configuration/overview/#yaml-config-file for yaml structure
router:
  configuration:
    supergraph: # HTTP server
      listen: 0.0.0.0:80
    health_check:
      listen: 0.0.0.0:8088
    telemetry:
      metrics:
        prometheus:
          enabled: false
          listen: 0.0.0.0:9090
          path: "/metrics"

  args:
  - --hot-reload

managedFederation:
  # -- If using managed federation, the graph API key to identify router to Studio
  apiKey:
  # -- If using managed federation, use an existing secret which stores the graph API key instead of creating a new one.
  # If set along `managedFederation.apiKey`, a secret with the graph API key will be created using this parameter as name
  existingSecret:
  # -- If using managed federation, the variant of which graph to use
  graphRef: ""

# This should not be specified in values.yaml. It's much simpler to use --set-file from helm command line.
# e.g.: helm ... --set-file supergraphFile="location of your supergraph file"
supergraphFile:

# An array of extra environmental variables
# Example:
# extraEnvVars:
#   - name: APOLLO_ROUTER_SUPERGRAPH_PATH
#     value: /etc/apollo/supergraph.yaml
#   - name: APOLLO_ROUTER_LOG
#     value: debug
#
extraEnvVars: []
extraEnvVarsCM: ''
extraEnvVarsSecret: ''

# An array of extra VolumeMounts
# Example:
# extraVolumeMounts:
#   - name: rhai-volume
#     mountPath: /dist/rhai
#     readonly: true
extraVolumeMounts: []

# An array of extra Volumes
# Example:
# extraVolumes:
#   - name: rhai-volume
#     configMap:
#       name: rhai-config
#
extraVolumes: []

image:
  repository: ghcr.io/apollographql/router
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: ""

containerPorts:
  # -- If you override the port in `router.configuration.server.listen` then make sure to match the listen port here
  http: 80
  # -- For exposing the metrics port when running a serviceMonitor for example
  metrics: 9090
  # -- For exposing the health check endpoint
  health: 8088

# -- An array of extra containers to include in the router pod
# Example:
# extraContainers:
#   - name: coprocessor
#     image: acme/coprocessor:1.0
#     ports:
#       - containerPort: 4001
extraContainers: []

# -- An array of init containers to include in the router pod
# Example:
# initContainers:
#   - name: init-myservice
#     image: busybox:1.28
#     command: ["sh"]
initContainers: []

# -- A map of extra labels to apply to the resources created by this chart
# Example:
# extraLabels:
#   label_one_name: "label_one_value"
#   label_two_name: "label_two_value"
extraLabels: {}

lifecycle: {}
#  preStop:
#    exec:
#      command:
#        - /bin/bash
#        - -c
#        - sleep 10

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

podAnnotations: {}

podSecurityContext: {}
  # fsGroup: 2000

securityContext: {}
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

service:
  type: ClusterIP
  port: 80
  annotations: {}

serviceMonitor:
  enabled: false

ingress:
  enabled: false
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

# set to true to enable istio's virtualservice
virtualservice:
  enabled: false
  # namespace: ""
  # gatewayName: ""
  # http:
  #   main:
  #     # set enabled to true to add
  #     # the default matcher of `exact: "/" or prefix: "/graphql"`
  #     # with the <$fullName>.<.Release.Namespace>.svc.cluster.local destination
  #     enabled: true
  #   # use additionals to provide your custom virtualservice rules
  #   additionals: []
  #   - name: "default-nginx-routes"
  #       match:
  #         - uri:
  #             prefix: "/foo"
  #       rewrite:
  #         uri: /
  #       route:
  #         - destination:
  #             host: my.custom.backend.svc.cluster.local
  #             port:
  #               number: 80

# set to true and provide configuration details if you want to make external https calls through istio's virtualservice
serviceentry:
  enabled: false
  # hosts:
  # a list of external hosts you want to be able to make https calls to
  #   - api.example.com

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity: {}

# -- Sets the [pod disruption budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) for Deployment pods
podDisruptionBudget: {}

# -- Set to existing PriorityClass name to control pod preemption by the scheduler
priorityClassName: ""

# -- Sets the [termination grace period](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution) for Deployment pods
terminationGracePeriodSeconds: 30

probes:
  # -- Configure readiness probe
  readiness:
    initialDelaySeconds: 0
  # -- Configure liveness probe
  liveness:
    initialDelaySeconds: 0

Set up Helm

Install Helm v3.x, which the router's Helm chart requires.

note

Your Kubernetes version must be compatible with Helm v3. For details, see Helm Version Support Policy.

Verify you can pull from the registry by showing the latest router chart values with the helm show values command:

Bash

helm show values oci://ghcr.io/apollographql/helm-charts/router

Set up cluster

Install the tools and provision the infrastructure for your Kubernetes cluster.

For an example, see the Setup from Apollo's Reference Architecture. It provides steps you can reference for gathering accounts and credentials for your cloud platform (GCP or AWS), provisioning resources, and deploying your subgraphs.

tip

To manage the system resources you need to deploy the router on Kubernetes:

- Read Managing router resources in Kubernetes.
- Use the router resource estimator.

Set up graph

Set up your self-hosted graph and get its graph ref and API key.

If you need a guide to set up your graph, you can follow the self-hosted router quickstart and complete step 1 (Set up Apollo tools), step 4 (Obtain your subgraph schemas), and step 5 (Publish your subgraph schemas).

Deploy router

To deploy the router, run the helm install command with an argument for the OCI image in the container repository, an argument for the values.yaml configuration file, and additional arguments to override specific configuration values.

Bash

helm install <release-name> \
  --namespace <router-namespace> \
  --set managedFederation.apiKey="<graph-api-key>" \
  --set managedFederation.graphRef="<graph-ref>" \
  oci://ghcr.io/apollographql/helm-charts/router \
  --version <router-version> \
  --values router/values.yaml

Helm v3 requires a release name, shown here as the placeholder <release-name>. Alternatively, pass --generate-name to have Helm choose one.

The necessary arguments for specific configuration values:

- --set managedFederation.graphRef="<graph-ref>". The reference to your managed graph (id@variant), the same value as the APOLLO_GRAPH_REF environment variable.
- --set managedFederation.apiKey="<graph-api-key>". The API key to your managed graph, the same value as the APOLLO_KEY environment variable.

Some optional but recommended arguments:

- --namespace <router-namespace>. The namespace scope for this deployment.
- --version <router-version>. The version of the router to deploy. If you omit it, helm install installs the latest version.
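
If you'd rather not pass the API key on the command line, the chart's default values also list managedFederation.existingSecret. A minimal sketch, assuming a Kubernetes secret named router-api-key (a hypothetical name) already exists in the release namespace. The key name the chart expects inside that secret isn't shown in this guide, so check the chart's templates before relying on it.

```yaml
# Hypothetical values file fragment; replaces the two --set arguments
managedFederation:
  # Name of an existing secret that stores the graph API key (assumed name)
  existingSecret: router-api-key
  graphRef: "<graph-ref>"
```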

Verify deployment

Verify that your router is one of the deployed releases with the helm list command.

If you deployed with the --namespace <router-namespace> option, you can list only the releases within your namespace:

Bash

helm list --namespace <router-namespace>

Deploy with metrics endpoints

The router supports metrics endpoints for Prometheus and OpenTelemetry protocol (OTLP). A basic deployment doesn't enable metrics endpoints, because the router chart disables both Prometheus (explicitly) and OTLP (by omission).

To enable metrics endpoints in your deployed router through a YAML configuration file:

Create a YAML file, my_values.yaml, to contain additional values that override default values.

Edit my_values.yaml to enable metrics endpoints:

YAML

my_values.yaml

router:
  configuration:
    telemetry:
      metrics:
        prometheus:
          enabled: true
          listen: 0.0.0.0:9090
          path: "/metrics"
        otlp:
          temporality: delta
          endpoint: <otlp-endpoint-addr>

note

Although this example enables both Prometheus and OTLP, in practice it's common to enable only one endpoint.

- router.configuration.telemetry.metrics.prometheus was already configured but disabled (enabled: false) by default. This configuration sets enabled: true.
- router.configuration.telemetry.metrics.otlp is enabled by inclusion.
- router.configuration.telemetry.metrics.otlp.temporality defaults to temporality: cumulative, which is a good choice for most metrics consumers. For Datadog, use temporality: delta.

Deploy the router with the additional YAML configuration file. For example, starting with the helm install command from the basic deployment step, append --values my_values.yaml:

Bash

helm install <release-name> \
  --namespace <router-namespace> \
  --set managedFederation.apiKey="<graph-api-key>" \
  --set managedFederation.graphRef="<graph-ref>" \
  oci://ghcr.io/apollographql/helm-charts/router \
  --version <router-version> \
  --values router/values.yaml \
  --values my_values.yaml
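
For a Prometheus-only setup, the chart's default values also expose a serviceMonitor.enabled switch. The sketch below assumes the Prometheus Operator (which defines the ServiceMonitor resource) is already installed in the cluster; that assumption isn't part of this guide.

```yaml
router:
  configuration:
    telemetry:
      metrics:
        prometheus:
          enabled: true
          listen: 0.0.0.0:9090   # same port as containerPorts.metrics in the default values
          path: "/metrics"

# Assumes the Prometheus Operator is installed; otherwise leave this disabled
serviceMonitor:
  enabled: true
```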

Deploy with Rhai scripts

The router supports Rhai scripting to add custom functionality.

Enabling Rhai scripts in your deployed router requires mounting an extra volume for your Rhai scripts and getting your scripts onto the volume. You can do both by following the steps in a separate example for creating a custom in-house router chart. The example creates a new (in-house) chart that wraps and depends on the released router chart. The new chart's templates add the configuration needed to run Rhai scripts in a deployed router.
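
As a lighter-weight alternative to the wrapper chart, the commented examples in the chart's default values suggest mounting scripts from a ConfigMap. The following is only a sketch of that idea, not the in-house chart approach described above. It assumes a ConfigMap named rhai-config that you create separately and an entry-point script named main.rhai; both names are hypothetical.

```yaml
router:
  configuration:
    rhai:
      scripts: /dist/rhai   # directory where the scripts are mounted
      main: main.rhai       # entry-point script (assumed file name)

extraVolumes:
  - name: rhai-volume
    configMap:
      name: rhai-config     # ConfigMap holding the scripts, created outside this chart

extraVolumeMounts:
  - name: rhai-volume
    mountPath: /dist/rhai
    readOnly: true
```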

Deploy with a coprocessor

The router supports external coprocessing to run custom logic on requests throughout the router's request-handling lifecycle.

A deployed coprocessor has its own application image and container in the router pod.

To configure a coprocessor and its container for your deployed router through a YAML configuration file:

Create a YAML file, my_values.yaml, to contain additional values that override default values.
Edit my_values.yaml to configure a coprocessor for the router. For reference, follow the typical and minimal configuration examples, and apply them to router.configuration.coprocessor.

Example of typical configuration for a coprocessor

YAML

router.yaml

coprocessor:
  url: http://127.0.0.1:8081 # Required. Replace with the URL of your coprocessor's HTTP endpoint.
  timeout: 2s # The timeout for all coprocessor requests. Defaults to 1 second (1s)
  router: # This coprocessor hooks into the `RouterService`
    request: # By including this key, the `RouterService` sends a coprocessor request whenever it first receives a client request.
      headers: true # These boolean properties indicate which request data to include in the coprocessor request. All are optional and false by default.
      body: false
      context: false
      sdl: false
      path: false
      method: false
    response: # By including this key, the `RouterService` sends a coprocessor request whenever it's about to send response data to a client (including incremental data via @defer).
      headers: true
      body: false
      context: false
      sdl: false
      status_code: false
  supergraph: # This coprocessor hooks into the `SupergraphService`
    request: # By including this key, the `SupergraphService` sends a coprocessor request whenever it first receives a client request.
      headers: true # These boolean properties indicate which request data to include in the coprocessor request. All are optional and false by default.
      body: false
      context: false
      sdl: false
      method: false
    response: # By including this key, the `SupergraphService` sends a coprocessor request whenever it's about to send response data to a client (including incremental data via @defer).
      headers: true
      body: false
      context: false
      sdl: false
      status_code: false
  subgraph:
    all:
      request: # By including this key, the `SubgraphService` sends a coprocessor request whenever it is about to make a request to a subgraph.
        headers: true # These boolean properties indicate which request data to include in the coprocessor request. All are optional and false by default.
        body: false
        context: false
        uri: false
        method: false
        service_name: false
        subgraph_request_id: false
      response: # By including this key, the `SubgraphService` sends a coprocessor request whenever it receives a subgraph response.
        headers: true
        body: false
        context: false
        service_name: false
        status_code: false
        subgraph_request_id: false

Edit my_values.yaml to add a container for the coprocessor.

YAML

my_values.yaml

extraContainers:
  - name: <coprocessor-deployed-name> # name of deployed container
    image: <coprocessor-app-image> # name of application image
    ports:
      - containerPort: <coprocessor-container-port> # must match port of router.configuration.coprocessor.url
    env: [] # array of environment variables

Deploy the router with the additional YAML configuration file. For example, starting with the helm install command from the basic deployment step, append --values my_values.yaml:

Bash

helm install <release-name> \
  --namespace <router-namespace> \
  --set managedFederation.apiKey="<graph-api-key>" \
  --set managedFederation.graphRef="<graph-ref>" \
  oci://ghcr.io/apollographql/helm-charts/router \
  --version <router-version> \
  --values router/values.yaml \
  --values my_values.yaml
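
Putting the two edits together, my_values.yaml might look like the sketch below. The container name and image are placeholders borrowed from the commented example in the chart's default values, and port 8081 is only illustrative. Because the coprocessor runs in the same pod as the router, the router can reach it on 127.0.0.1.

```yaml
router:
  configuration:
    coprocessor:
      url: http://127.0.0.1:8081   # port must match containerPort below
      router:
        request:
          headers: true

extraContainers:
  - name: coprocessor              # placeholder container name
    image: acme/coprocessor:1.0    # placeholder image
    ports:
      - containerPort: 8081
    env: []
```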

Separate configurations per environment

To support different deployment configurations for different environments (development, staging, production, etc.), Apollo recommends splitting your configuration values into separate files:

- A common file, which contains values that apply across all environments.
- A unique environment file per environment, which overrides values from the common file and adds new environment-specific values.

The helm install command applies each --values <values-file> option in the order you set them within the command. Therefore, a common file must be set before an environment file so that the environment file's values are applied last and override the common file's values.

For example, this command deploys with a common_values.yaml file applied first and then a prod_values.yaml file:

Bash

helm install <release-name> \
  --namespace <router-namespace> \
  --set managedFederation.apiKey="<graph-api-key>" \
  --set managedFederation.graphRef="<graph-ref>" \
  oci://ghcr.io/apollographql/helm-charts/router \
  --version <router-version> \
  --values router/values.yaml \
  --values common_values.yaml \
  --values prod_values.yaml
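
To illustrate the override order, here is a sketch of what the two files might contain. The specific values are hypothetical and chosen only to show one key being overridden.

```yaml
# common_values.yaml: shared by every environment
replicaCount: 1
router:
  configuration:
    include_subgraph_errors:
      all: true
```

```yaml
# prod_values.yaml: listed last on the command line, so its values win
replicaCount: 3
router:
  configuration:
    include_subgraph_errors:
      all: false
```

With both files applied in that order, the release runs three replicas and doesn't propagate subgraph errors. Any key set only in the common file carries through unchanged, because Helm merges maps key by key.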

Deploying in Kubernetes with Istio

Istio is a service mesh for Kubernetes that is often installed on a cluster for its traffic-shaping abilities. While we do not specifically recommend or support Istio, nor do we provide specific instructions for installing the router in a cluster with Istio, there is one known consideration when configuring Istio.

The consideration arises from how Istio does its sidecar injection. Without additional configuration, Istio may attempt to reconfigure the network interface at the same time the router is starting, which causes the router to fail to start.

This is not specifically a router issue and Istio has instructions on how to manage the matter in a general sense in their own documentation. Their suggestion prevents the startup of all other containers in a pod until Istio itself is ready. We recommend this approach when using Istio.
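
One way to apply Istio's suggestion per pod is through the chart's podAnnotations value. This is a sketch only: it assumes your Istio version supports the holdApplicationUntilProxyStarts proxy setting through the proxy.istio.io/config annotation, so confirm that against the Istio documentation for your release.

```yaml
podAnnotations:
  # Asks Istio to hold the router container until the sidecar proxy is ready
  proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```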

Configure for migration from gateway

When migrating from @apollo/gateway to the router, consider the following tips to maximize the compatibility of your router deployment.

Increase maximum request bytes

By default the router sets its maximum supported request size at 2MB, while the gateway sets its maximum supported request size at 20MB. If your gateway accepts requests larger than 2MB, which it does by default, you can use the following configuration to ensure that the router is compatible with your gateway deployment.

YAML

values.yaml

router:
  configuration:
    limits:
      http_max_request_bytes: 20000000 # 20MB

Increase request timeout

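Increase the router's timeouts to accommodate subgraph operations with high latency. The following configuration sets the router-level timeout slightly higher than the timeout applied to all subgraph requests.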

YAML

values.yaml

router:
  configuration:
    traffic_shaping:
      router:
        timeout: 6min
      all:
        timeout: 5min

Propagate subgraph errors

The gateway propagates subgraph errors to clients, but the router doesn't by default, so it needs to be configured to propagate them.

YAML

values.yaml

router:
  configuration:
    include_subgraph_errors:
      all: true

Troubleshooting

Pods terminating due to memory pressure

If your router pods are terminating due to memory pressure, you can add router cache metrics to monitor and remediate your system:

Add and track the following metrics to your monitoring system:
- apollo.router.cache.storage.estimated_size
- apollo.router.cache.size
- ratio of apollo.router.cache.hit.time.count to apollo.router.cache.miss.time.count
Observe and monitor the metrics:
- Observe the apollo.router.cache.storage.estimated_size to see if it grows over time and correlates with pod memory usage.
- Observe the ratio of cache hits to misses to determine if the cache is being effective.
Based on your observations, try some remediating adjustments:
- Lower the cache size if the cache reaches near 100% hit-rate but the cache size is still growing.
- Increase the pod memory if the cache hit rate is low and the cache size is still growing.
- Lower the cache size if the latency of query planning cache misses is acceptable and memory availability is limited.
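
The two adjustments above could be expressed in a values file roughly as follows. Treat this as a sketch under assumptions: the query-planning cache key path shown here varies between router versions (some releases use a different key name under query_planning), and the numbers are illustrative rather than recommendations.

```yaml
router:
  configuration:
    supergraph:
      query_planning:
        cache:
          in_memory:
            limit: 256   # assumed key path; number of cached entries, illustrative

# Raise pod memory instead if the hit rate is low and the cache keeps growing
resources:
  requests:
    memory: 1Gi          # illustrative
  limits:
    memory: 1Gi          # illustrative
```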
