Aggregating Data Across Subgraphs

If product requirements don't align with a single domain, it may suggest the need for a new domain


A supergraph helps you orchestrate data fetches across multiple domains, but it doesn't automatically solve some commonly encountered problems in distributed architectures:

  • Searching across a variety of types stored in multiple data sources

  • Combining lists of multiple types from multiple sources into a single list, especially when paginated

  • Filtering a list based on attributes defined in various data sources

  • Deriving data from an aggregate of multiple data sources

Although it's sometimes possible to generate query plans to support these use cases using the @requires directive, it's almost always better to provide this functionality in a new system, such as a search index.

Given an operation to search across both books and movies, we want to return a polymorphic list of books and movies that match the search term.

GraphQL
query SearchEverything($query: String!) {
  search(query: $query) {
    ... on Book {
      title
      authors {
        name
      }
    }
    ... on Movie {
      title
      directors {
        name
      }
    }
  }
}

If different subgraphs define the Book type and the Movie type, the question is: which subgraph provides the Query.search root field?

For a given operation, the query planner resolves each field in a single subgraph. This invariant holds true even if multiple subgraphs define a particular field.

GraphQL
Books subgraph
type Query {
  search(query: String!): [Product] @shareable
}

interface Product {
  title: String
}

type Book implements Product {
  title: String
  authors: [Person]
}

GraphQL
Movies subgraph
type Query {
  search(query: String!): [Product] @shareable
}

interface Product {
  title: String
}

type Movie implements Product {
  title: String
  directors: [Person]
}

The query planner deterministically chooses one subgraph to resolve the Query.search field. (It calculates all valid query plans for the operation and chooses the "cheapest" one.)

If the query planner chooses Query.search within the Books subgraph, that subgraph can provide only books, not movies.

To resolve this, we could expand the Books subgraph schema to include the Movie definition and add the @key directive to create a join to the Movies subgraph, like so:

GraphQL
type Movie implements Product @key(fields: "id") {
  id: ID!
  title: String @external
}

Now the Books subgraph can return Movie instances, but that means it needs access to the data source for movie data. This breaks the separation of concerns we rely on to create dividing lines between domains and subgraphs.

Solution: Create a new subgraph

When product requirements don't fit cleanly into a single domain, it often indicates that we need a new domain. Let's design a system that includes a new search domain. This includes a search index and a Search subgraph that provides the Query.search root field.

This pattern works for all the use cases listed above:

  • Search: A search index (such as Elasticsearch) is the most efficient way to search through a variety of data types and return only the most relevant results.

  • Combining lists: A combined index is the most efficient way to list and paginate through a variety of data types. Fetching multiple lists and combining them on the fly usually means overfetching pages of data and throwing data away when it isn't part of the result.

  • Filtering: A data store can contain indices on various attributes of a variety of data types and efficiently filter results on those attributes.

  • Derived aggregates: Many data stores can efficiently compute a derived value such as AVG(products.rating), or we can write a precomputed derived value to a data store.
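
As an illustration of how the combined-list, filtering, and derived-aggregate cases could surface in a schema, here is a sketch of a paginated variant of Query.search backed by a single index. Apart from Product, every name in it (the filter, first, and after arguments, ProductFilter, ProductConnection, averageRating, and so on) is an assumption made for illustration; the rest of this article uses a simpler, unpaginated signature.

```graphql
# Search subgraph (sketch): a paginated, filterable variant of Query.search.
# All names except Product are hypothetical.
type Query {
  search(
    query: String!
    filter: ProductFilter
    first: Int = 20
    after: String
  ): ProductConnection
}

input ProductFilter {
  # An attribute copied into the index from whichever data source owns it
  minRating: Float
}

type ProductConnection {
  edges: [ProductEdge!]!
  pageInfo: PageInfo!
  # Derived aggregate computed by the index over the matching results
  averageRating: Float
}

type ProductEdge {
  cursor: String!
  node: Product!
}

type PageInfo {
  hasNextPage: Boolean!
  endCursor: String
}

interface Product {
  id: ID!
  title: String
}
```

Because one index orders, filters, and paginates the combined results, the cursor refers to a position in a single list rather than to positions in several per-source lists that would have to be merged on the fly.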

We can remove the Query.search root field from both the Books and Movies subgraphs and instead add it to our new Search subgraph:

GraphQL
Search subgraph
type Query {
  search(query: String!): [Product]
}

interface Product {
  id: ID!
  title: String
}

type Book implements Product @key(fields: "id") {
  id: ID!
  title: String @shareable
}

type Movie implements Product @key(fields: "id") {
  id: ID!
  title: String @shareable
}

The query plan first uses the Search subgraph to return a polymorphic list of Books and Movies with their id and title fields. Then in parallel, it joins with the Books subgraph to fetch Book.authors and joins with the Movies subgraph to fetch Movie.directors.

Query plan
Text
QueryPlan {
  Sequence {
    Fetch(service: "search") {
      {
        search(query: $query) {
          __typename
          ... on Book {
            __typename
            id
            title
          }
          ... on Movie {
            __typename
            id
            title
          }
        }
      }
    },
    Parallel {
      Flatten(path: "search.@") {
        Fetch(service: "books") {
          {
            ... on Book {
              __typename
              id
            }
          } =>
          {
            ... on Book {
              authors {
                name
              }
            }
          }
        },
      },
      Flatten(path: "search.@") {
        Fetch(service: "movies") {
          {
            ... on Movie {
              __typename
              id
            }
          } =>
          {
            ... on Movie {
              directors {
                name
              }
            }
          }
        },
      },
    },
  },
}

Note that the Search subgraph provides a minimal set of fields. We don't need to duplicate the entire Book and Movie types in our Search domain, just the fields we want to search on.
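
For the entity fetches in the query plan to work, the Books and Movies subgraphs need to expose Book and Movie as entities with the same key. A minimal sketch of how the Books subgraph might look after this change, assuming it keeps the authors field from earlier and leaves Person defined elsewhere as before:

```graphql
# Books subgraph (sketch): Query.search is gone, and Book is an entity
type Book @key(fields: "id") {
  id: ID!
  # The Search subgraph also resolves title, so it is marked @shareable here too
  title: String @shareable
  # Resolved only here, from the canonical books data source
  authors: [Person]
}
```

The Movies subgraph would mirror this with Movie and directors. When the router sends the __typename and id representations taken from the search results, each subgraph looks up the entity by id and resolves only the fields it owns.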

Tradeoffs

As you'd expect, adding an entirely new domain to your supergraph has its tradeoffs:

The question of ownership

In the Search subgraph example, we've introduced multiple new services that require ongoing development, maintenance, and support. It doesn't make sense for our existing Books and Movies teams to take on this extra burden. Usually, we want to spin up an entirely new team to hold the pager and deployment keys for our new subgraphs and data stores.

Eventual consistency

Replicating data from the canonical Books and Movies databases into the Search index inevitably involves replication lag, leading to eventually consistent results between our subgraphs.

How we handle that lag depends on our business requirements. If the results from Query.search must be "internally" consistent, we can denormalize data using @shareable and @provides. We demonstrated this by providing the title fields from the Search subgraph and its backing index.
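
The Search schema in this article uses @shareable, which fits when the subgraph can always resolve the field. As a rough sketch of the @provides alternative, suppose the Search subgraph can return a book's title only along one particular path. The bookOfTheDay field below is hypothetical and not part of the original example, and the sketch assumes Book.title is marked @shareable in the Books subgraph.

```graphql
# Search subgraph (sketch of the @provides alternative)
type Query {
  # Hypothetical root field: the index stores the title alongside this result,
  # so the router can skip a fetch to the Books subgraph for it.
  bookOfTheDay: Book @provides(fields: "title")
}

type Book @key(fields: "id") {
  id: ID!
  # @external: this subgraph resolves title only on paths that declare @provides
  title: String @external
}
```

On any other path, the query planner fetches Book.title from a subgraph that can always resolve it.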

If the results must instead reflect our canonical data sources, we can make sure the query planner fetches those fields from their respective subgraphs. The Book.authors and Movie.directors fields exemplify that pattern.
