8. Configuring Prometheus
10m

Overview

Let's forget about combing through long terminal logs—lots of tools make exploring our metrics a lot cleaner.

In this lesson, we will:

  • Install and set up Prometheus
  • Expose the router's metrics to Prometheus
  • Explore data about how our router's running

Prometheus

Let's get into it! To make our router's metrics a lot easier to browse, we'll set up Prometheus.

Prometheus is a tool for monitoring system output and metrics, primarily intended for use in a containerized environment with many different services running simultaneously. Even though our router isn't running in a container, we'll still benefit from hooking Prometheus up to its metrics. Prometheus has numerous features that are worth exploring, but we'll keep our use case fairly straightforward for this course: we want to be able to visualize our router's metrics and inspect them in a cleaner, more organized format than verbose terminal logs.

Prometheus also works great with Grafana, though we won't cover their integration in this course.

To set up Prometheus, navigate to the official download page. Here you'll find a variety of download options depending on your operating system and architecture. Use the dropdowns to find the right match for your computer, then click download on the provided tar.gz file!

prometheus.io/download

The Prometheus download page, highlighting the dropdowns for operating system and architecture

When you expand the downloaded package, you'll find a bunch of different items contained in it. Among them are two files in particular that we'll concern ourselves with: the prometheus binary, and a configuration file called prometheus.yml.

Prometheus download
📦 prometheus
┣ 📂 console_libraries
┣ 📂 consoles
┣ 📂 data
┣ 📄 prometheus
┣ 📄 prometheus.yml
┗ 📄 promtool

We'll run the prometheus binary and use the prometheus.yml file to set our configuration, but first let's make sure our router's exposing its metrics.

Exporting metrics to Prometheus

Jump into router-config.yaml. We need to set some configuration that will allow the router to expose its metrics on a specific port. Then, we'll make sure that Prometheus scrapes the metrics from this port.

Locate the telemetry key.

router-config.yml
# ... supergraph configuration
include_subgraph_errors:
  all: true
telemetry:
  instrumentation: # ... other config

We'll define three additional keys, each nested beneath the key that precedes it: exporters, metrics, and prometheus. Here's what that looks like.

router-config.yml
telemetry:
  exporters:
    metrics:
      prometheus:
  instrumentation: # ... other config

The prometheus key is where we'll provide our actual configuration. We'll tell the router that yes, exporting metrics to Prometheus is enabled. We'll give it the host and port to listen on, followed by the path where metrics can be found on that port. We'll listen on host and port 127.0.0.1:9080, with a path of /metrics. (This is the default path Prometheus looks for on the given host!)

router-config.yml
telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
        listen: 127.0.0.1:9080
        path: /metrics

That's it for our configuration! Let's stop and restart our router now.


Now open up http://127.0.0.1:9080/metrics in the browser. If everything worked correctly, we'll see a bunch of output from the router.

http://127.0.0.1:9080/metrics

The metrics port opened, showing output from the router
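The output at this endpoint is plain text in the Prometheus exposition format: comment lines starting with #, followed by one sample per line. As a rough sketch of how that text can be read programmatically, the snippet below parses a small sample into (name, labels, value) tuples. The SAMPLE text and the parse_metrics helper are illustrative assumptions rather than real router output; the actual page is much longer and carries more labels.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical excerpt of the text served at /metrics (real output is longer).
SAMPLE = """\
# TYPE apollo_router_cache_hit_count_total counter
apollo_router_cache_hit_count_total{storage="memory"} 1
apollo_router_cache_miss_count_total{storage="memory"} 1
apollo_router_cache_miss_count_total{storage="redis"} 1
"""

# metric_name{label="value",...} value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse exposition-format text, skipping comments and blank lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot read
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

To try this against the running router, the text could be fetched from http://127.0.0.1:9080/metrics (for example with urllib.request.urlopen) and passed to parse_metrics in place of SAMPLE.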

Great, now let's boot up Prometheus.

Setting targets in Prometheus

We can use the Prometheus configuration file to do something similar: we'll give it a target to scrape data from, namely the port where our router is exporting its metrics. Open up the prometheus.yml file included in the Prometheus package we downloaded.

By default, this file is configured to scrape the running Prometheus instance's own metrics. If you scroll down, you'll find a key called scrape_configs.

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["127.0.0.1:9090"]

We're not interested in Prometheus' metrics, so let's first clear out the job_name and replace it with "router". Then under static_configs, in the targets list, we'll update the port Prometheus should scrape metrics from to be 9080.

scrape_configs:
  - job_name: "router"
    static_configs:
      - targets: ["127.0.0.1:9080"]

Save the file, and let's start up Prometheus. In a terminal opened to the downloaded Prometheus package, run the following command to start the binary.

./prometheus

As soon as the process is running, we can open the Prometheus web UI by navigating to http://localhost:9090.

http://localhost:9090

Prometheus opened on port 9090
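Before exploring metrics, it can help to confirm that Prometheus is actually scraping the router. Prometheus exposes an HTTP API, and its /api/v1/targets endpoint reports the health of each configured target. The sketch below summarizes such a response; SAMPLE_PAYLOAD is a trimmed, hypothetical response body, and summarize_targets and fetch_targets are illustrative helper names, not part of any library.

```python
import json
import urllib.request
from typing import Dict, List

# Trimmed, hypothetical response from GET /api/v1/targets.
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"instance": "127.0.0.1:9080", "job": "router"},
                "scrapeUrl": "http://127.0.0.1:9080/metrics",
                "health": "up",
            }
        ]
    },
}


def summarize_targets(payload: dict) -> List[Dict[str, str]]:
    """Reduce a targets response to job name, scrape URL, and health."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        {
            "job": target.get("labels", {}).get("job", ""),
            "scrape_url": target.get("scrapeUrl", ""),
            "health": target.get("health", "unknown"),
        }
        for target in targets
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the live targets list; assumes Prometheus is running locally."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


for row in summarize_targets(SAMPLE_PAYLOAD):
    print(row["job"], row["scrape_url"], row["health"])
```

With Prometheus running, summarize_targets(fetch_targets()) should list the router job; a health value other than "up" usually points to a mismatch between the listen address in the router configuration and the target in prometheus.yml.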

Exploring Prometheus

We can explore any of the router's exported metrics here in Prometheus. In the search bar, type out apollo_. See the long list of options that pop up? Let's keep typing until we find the option apollo_router_cache_hit_count_total. Once it's selected, click the blue Execute button.

Every time one of our requests utilizes a cache (either the router's in-memory cache, or Redis) we'll see a new entry in Prometheus. Before starting, let's make sure our Redis cache is clear and that our router has been booted up.


Jump into Explorer. We'll start with a query for a particular listing.

query GetListing($listingId: ID!) {
  listing(id: $listingId) {
    title
    numOfBeds
    reviews {
      rating
    }
  }
}

And in the Variables panel:

{
  "listingId": "listing-1"
}

This is the first time the rebooted router has received this request. This means that even if we press Execute in Prometheus for our apollo_router_cache_hit_count_total metric again, we won't see any results: the router won't have found the query plan in either cache (hence there are no "cache hits" in either case). Now run the query again in Explorer. This time we should see a different result in Prometheus.

http://localhost:9090

Prometheus updated to find a record for cache hit count

Note: You can press the Execute button to refresh the data on a specific metric panel. If this doesn't result in new data, try refreshing the page.

Here under the Table tab we'll see a new entry, and if we look at the end of the line, we'll see a specific indication of where the router discovered the cached query plan: storage="memory". On the far right we can also see the total number of cache hits for the in-memory cache; currently, it's just one.

Let's add a new panel. Click the Add panel button, then type apollo_router_cache_miss_count_total into the search bar. Click Execute.

apollo_router_cache_miss_count_total

This new panel collects data about each time the router fails to find a query plan in one of its caches. We should see two lines: one for storage="memory", another for storage="redis". In this case we can see that each cache has a miss count of 1; the router checked both caches the first time we ran our query, but neither had the query plan. The second time we ran the query, the router found the plan in its in-memory cache.

http://localhost:9090

Prometheus updated with a new table to keep track of cache misses
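The same per-storage numbers shown in the table can also be read through Prometheus's HTTP API, whose /api/v1/query endpoint evaluates an expression and returns one entry per labelled series. The sketch below builds the request URL and reduces a response to a count per storage label. SAMPLE_RESPONSE is a trimmed, hypothetical body mirroring the two miss entries described above (the timestamp is arbitrary); build_query_url and counts_by_storage are illustrative names.

```python
import urllib.parse
from typing import Dict

PROMETHEUS_URL = "http://localhost:9090"  # assumes the local instance from this lesson


def build_query_url(expression: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expression})


def counts_by_storage(payload: dict) -> Dict[str, float]:
    """Sum the returned sample values per `storage` label."""
    totals: Dict[str, float] = {}
    for series in payload["data"]["result"]:
        storage = series["metric"].get("storage", "unknown")
        _timestamp, value = series["value"]  # value arrives as a string
        totals[storage] = totals.get(storage, 0.0) + float(value)
    return totals


# Trimmed, hypothetical response for apollo_router_cache_miss_count_total.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"storage": "memory"}, "value": [1700000000.0, "1"]},
            {"metric": {"storage": "redis"}, "value": [1700000000.0, "1"]},
        ],
    },
}

print(build_query_url("apollo_router_cache_miss_count_total"))
print(counts_by_storage(SAMPLE_RESPONSE))
```

Opening the printed URL in a browser while Prometheus is running returns the live JSON, which can be passed to counts_by_storage in place of the sample.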

Let's pretend that a different router instance needs to execute the query. To do this, we'll stop our router and restart it. This clears its in-memory cache, simulating a fresh router coming on the scene without any past executed queries.


Back in Explorer, run that query one more time, then refresh the metrics in Prometheus.

Under apollo_router_cache_hit_count_total, we'll see that there was one successful hit when checking our Redis cache. We'll also see that there was a cache miss when our router tried to check its own in-memory cache for the plan. Not finding it, it checked the distributed cache, where any router executing a query can stash its query plan for the benefit of other router instances.

http://localhost:9090

Prometheus updated for the refreshed router instance, showing an in-memory cache miss and a Redis cache hit
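Hit and miss counters are often combined into a hit ratio per cache. As a small sketch of that arithmetic, the function below takes hit and miss counts keyed by storage and returns hits / (hits + misses) for each. The input values are illustrative, chosen to mirror the restarted-router scenario above (one Redis hit, one in-memory miss), and hit_ratios is a hypothetical helper name.

```python
from typing import Dict, Optional


def hit_ratios(
    hits: Dict[str, float], misses: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Return hits / (hits + misses) per storage, or None when there is no data."""
    ratios: Dict[str, Optional[float]] = {}
    for storage in sorted(set(hits) | set(misses)):
        hit_count = hits.get(storage, 0.0)
        miss_count = misses.get(storage, 0.0)
        total = hit_count + miss_count
        ratios[storage] = hit_count / total if total else None
    return ratios


# Illustrative counts for the freshly restarted router described above.
hits = {"redis": 1.0}
misses = {"memory": 1.0}
print(hit_ratios(hits, misses))
```

In a long-running deployment, a ratio like this would normally be computed over a time window rather than from raw totals, since counters reset whenever a router restarts.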

Take some time to explore the other metrics available in the dropdown: explore cache size over time with apollo_router_cache_size, or check out something entirely different like apollo_router_operations_total!

Practice

Using Prometheus
We can use Prometheus to collect and display the _____ from our router. To expose this data, we need to use the _____ and _____ keys in the router's configuration file. We configure Prometheus to scrape data from the router by setting up a new _____ and providing the port we provided in the router configuration.

Choose from these items to fill in the blanks above:

  • query results
  • exporters
  • target
  • resource
  • telemetry
  • metrics
  • redis
  • cache

Key takeaways

  • Prometheus can monitor and record our system output, allowing us to get a fine-grained look at particular metrics.
  • To enable the router to export its metrics in a format accessible to Prometheus, we can configure its telemetry key to enable a metrics endpoint.
  • We can use the prometheus.yml file to customize its targets: the various systems or services it scrapes data from.

Journey's end

You've done it! We've taken the full journey through the router's current caching capabilities, in-memory and distributed, ending with a dashboard in Prometheus that shows us the finer details of a particular metric. Well done!
