Aggregating Data Across Subgraphs
If product requirements don't align with a single domain, it may suggest the need for a new domain.
A supergraph helps you orchestrate data fetches across multiple domains. However, it doesn't automatically solve some commonly encountered problems in distributed architectures:
Searching across a variety of types stored in multiple data sources
Combining lists of multiple types from multiple sources into a single list, especially when paginated
Filtering a list based on attributes defined in various data sources
Deriving data from an aggregate of multiple data sources
Although it's sometimes possible to generate query plans that support these use cases using the @requires directive, it's almost always better to provide this functionality in a new system, such as a search index.
Example: Search
Given an operation to search across both books and movies, we want to return a polymorphic list of books and movies that match the search term.
query SearchEverything($query: String!) {
  search(query: $query) {
    ... on Book {
      title
      authors {
        name
      }
    }
    ... on Movie {
      title
      directors {
        name
      }
    }
  }
}
If different subgraphs define the Book type and the Movie type, the question is: which subgraph provides the Query.search root field?
For a given operation, the query planner resolves each field in a single subgraph. This invariant holds true even if multiple subgraphs define a particular field.
Books subgraph:

type Query {
  search(query: String!): [Product] @shareable
}

interface Product {
  title: String
}

type Book implements Product {
  title: String
  authors: [Person]
}

Movies subgraph:

type Query {
  search(query: String!): [Product] @shareable
}

interface Product {
  title: String
}

type Movie implements Product {
  title: String
  directors: [Person]
}
The query planner deterministically chooses one subgraph to resolve the Query.search field. (It calculates all valid query plans for the operation and chooses the "cheapest" one.)
If the query planner chooses Query.search within the Books subgraph, that subgraph can provide only books, not movies.
To resolve this, we could expand the Books subgraph schema to include the Movie definition and add the @key directive to create a join to the Movies subgraph, like so:
type Movie implements Product @key(fields: "id") {
  id: ID!
  title: String @external
}
Now the Books subgraph can return Movie instances, but that means it needs access to the data source for movie data. This breaks the separation of concerns we rely on to create dividing lines between domains and subgraphs.
Solution: Create a new subgraph
When product requirements don't fit cleanly into a single domain, it often indicates that we need a new domain. Let's design a system that includes a new search domain. This includes a search index and a Search subgraph that provides the Query.search root field.
This pattern works for all the use cases listed above:
Search: A search index (such as Elasticsearch) is the most efficient way to search through a variety of data types and return only the most relevant results.
Combining lists: A combined index is the most efficient way to list and paginate through a variety of data types. Fetching multiple lists and combining them on the fly usually means overfetching pages of data and throwing data away when it isn't part of the result.
Filtering: A data store can contain indices on various attributes of a variety of data types and efficiently filter results on those attributes.
Derived aggregates: Many data stores can efficiently compute a derived value such as AVG(products.rating), or we can write a precomputed derived value to a data store.
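The overfetching described under "Combining lists" can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the Item shape, the score sort key, and the helper names fetchTop, mergedPage, and indexedPage are assumptions, and the in-memory arrays stand in for real data sources.

```typescript
// Hypothetical item shape; `score` stands in for whatever key the list is sorted by.
interface Item {
  type: "Book" | "Movie";
  id: string;
  score: number;
}

interface MergedPage {
  items: Item[];
  fetched: number;
  discarded: number;
}

// Simulates one data source that can only return its own items, best first.
function fetchTop(source: Item[], limit: number): Item[] {
  return [...source].sort((a, b) => b.score - a.score).slice(0, limit);
}

// Merging on the fly: to build page `page` (0-based) of `size` items, every
// source must return its top (page + 1) * size items, because we can't know
// in advance how the merged ordering interleaves them.
function mergedPage(sources: Item[][], page: number, size: number): MergedPage {
  const needed = (page + 1) * size;
  const fetched: Item[] = [];
  for (const source of sources) {
    fetched.push(...fetchTop(source, needed));
  }
  const items = [...fetched]
    .sort((a, b) => b.score - a.score)
    .slice(page * size, needed);
  return { items, fetched: fetched.length, discarded: fetched.length - items.length };
}

// A combined index already holds one sorted list, so a page is a single slice.
function indexedPage(index: Item[], page: number, size: number): Item[] {
  return index.slice(page * size, (page + 1) * size);
}
```

With two sources of ten items each and a page size of three, building the third page with mergedPage fetches 18 items and discards 15 of them, while indexedPage reads only the three items it returns.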
We can remove the Query.search root field from both the Books and Movies subgraphs and instead add it to our new Search subgraph:
type Query {
  search(query: String!): [Product]
}

interface Product {
  id: ID!
  title: String
}

type Book implements Product @key(fields: "id") {
  id: ID!
  title: String @shareable
}

type Movie implements Product @key(fields: "id") {
  id: ID!
  title: String @shareable
}
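As a rough sketch of how the Search subgraph might resolve this schema, the resolver map below reads from a single combined index and reports each result's concrete type. The IndexedProduct shape, the sample documents, and the in-memory searchIndex array are assumptions made for illustration; a real implementation would query an actual search index instead.

```typescript
// Hypothetical document shape stored in the search index: only the fields
// the Search subgraph exposes, plus the concrete type name.
interface IndexedProduct {
  __typename: "Book" | "Movie";
  id: string;
  title: string;
}

// Stand-in for a real search index client; the sample data is made up.
const searchIndex: IndexedProduct[] = [
  { __typename: "Book", id: "b1", title: "Dune" },
  { __typename: "Movie", id: "m1", title: "Dune: Part Two" },
  { __typename: "Book", id: "b2", title: "Emma" },
];

const searchResolvers = {
  Query: {
    // Returns matching books and movies from the one combined index.
    search(_parent: unknown, args: { query: string }): IndexedProduct[] {
      const needle = args.query.toLowerCase();
      return searchIndex.filter((doc) => doc.title.toLowerCase().includes(needle));
    },
  },
  Product: {
    // Tells the GraphQL server which concrete type each result is.
    __resolveType(doc: IndexedProduct): string {
      return doc.__typename;
    },
  },
};
```

Because each indexed document carries its own type name and id, the Search subgraph never needs to reach into the Books or Movies data sources.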
The query plan first uses the Search subgraph to return a polymorphic list of Book and Movie objects with their id and title fields. Then, in parallel, it joins with the Books subgraph to fetch Book.authors and joins with the Movies subgraph to fetch Movie.directors.
QueryPlan {
  Sequence {
    Fetch(service: "search") {
      {
        search(query: $query) {
          __typename
          ... on Book {
            __typename
            id
            title
          }
          ... on Movie {
            __typename
            id
            title
          }
        }
      }
    },
    Parallel {
      Flatten(path: "search.@") {
        Fetch(service: "books") {
          {
            ... on Book {
              __typename
              id
            }
          } =>
          {
            ... on Book {
              authors {
                name
              }
            }
          }
        },
      },
      Flatten(path: "search.@") {
        Fetch(service: "movies") {
          {
            ... on Movie {
              __typename
              id
            }
          } =>
          {
            ... on Movie {
              directors {
                name
              }
            }
          }
        },
      },
    },
  },
}
Note that the Search subgraph provides a minimal set of fields. We don't need to duplicate the entire Book and Movie types in our Search domain, just the fields we want to search on.
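The entity fetches in the query plan send the Books and Movies subgraphs representations that contain only __typename and id. A minimal sketch of how the Books subgraph might turn such a representation into a full Book follows, using the __resolveReference resolver convention from Apollo's subgraph libraries. The BookRecord and BookReference shapes, the sample record, and the in-memory booksById map are assumptions that stand in for the canonical Books data source.

```typescript
// Hypothetical shapes for the Books subgraph's own data.
interface Person {
  name: string;
}

interface BookRecord {
  id: string;
  title: string;
  authors: Person[];
}

// The representation sent for each search result: the @key field plus __typename.
interface BookReference {
  __typename: "Book";
  id: string;
}

// Stand-in for the canonical Books database; the sample data is made up.
const booksById = new Map<string, BookRecord>([
  ["b1", { id: "b1", title: "Dune", authors: [{ name: "Frank Herbert" }] }],
]);

const booksResolvers = {
  Book: {
    // Called once per Book representation produced by the search fetch.
    __resolveReference(ref: BookReference): BookRecord | null {
      return booksById.get(ref.id) ?? null;
    },
  },
};
```

The Movies subgraph would follow the same pattern for Movie.directors, keyed on the same id field.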
Tradeoffs
As you'd expect, adding an entirely new domain to your supergraph has its tradeoffs:
The question of ownership
In the Search subgraph example, we've introduced multiple new services that require ongoing development, maintenance, and support. It doesn't make sense for our existing Books and Movies teams to take on this extra burden. Usually, we want to spin up an entirely new team to hold the pager and deployment keys for our new subgraphs and data stores.
Eventual consistency
Replicating data from the canonical Books and Movies databases into the Search index inevitably involves replication lag, leading to eventually consistent results between our subgraphs.
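To illustrate where the lag comes from, here is a deliberately simplified sketch of a replica that only sees canonical changes when an indexer applies them. The SearchIndexReplica class, the TitleChange shape, and the enqueue and flush methods are all hypothetical; real replication pipelines vary widely.

```typescript
// Hypothetical change event emitted when a canonical title is updated.
interface TitleChange {
  type: "Book" | "Movie";
  id: string;
  title: string;
}

class SearchIndexReplica {
  private titles = new Map<string, string>();
  private pending: TitleChange[] = [];

  // Called when the canonical database commits a change.
  // The index does not reflect it yet.
  enqueue(change: TitleChange): void {
    this.pending.push(change);
  }

  // Called by the indexer some time later; the gap between enqueue and
  // flush is the replication lag.
  flush(): void {
    for (const change of this.pending) {
      this.titles.set(`${change.type}:${change.id}`, change.title);
    }
    this.pending = [];
  }

  // What the Search subgraph would return for this item right now.
  titleOf(type: "Book" | "Movie", id: string): string | undefined {
    return this.titles.get(`${type}:${id}`);
  }
}
```

Between enqueue and flush, a search served from the replica returns the old title (or nothing) while the canonical subgraph already returns the new one.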
How we handle that lag depends on our business requirements. If the results from Query.search must be "internally" consistent, we can denormalize data using @shareable and @provides. We demonstrated this by providing the title fields from the Search subgraph and its backing index.
If the results must be consistent with our canonical data sources, we can make sure the query planner fetches those fields from their respective subgraphs. The Book.authors and Movie.directors fields exemplify that pattern.