Handling the N+1 Problem
Enhance subgraph performance with the dataloader pattern
GraphQL developers often encounter the "N+1 query problem" with operations that return a list.
Consider this TopReviews query:
query TopReviews {
  topReviews(first: 10) {
    id
    rating
    product {
      name
      imageUrl
    }
  }
}
In a monolithic GraphQL server, the execution engine takes these steps:
1. Resolve the Query.topReviews field, which returns a list of Reviews.
2. For each Review, resolve the Review.product field.
If Query.topReviews returns ten reviews, then the executor resolves the Review.product field ten times.
If the Review.product field makes a database or REST query for a single Product, then there are ten separate calls to the data source.
This is suboptimal for the following reasons:
- Fetching all products in a single query is more efficient, for example: SELECT * FROM products WHERE id IN (<product ids>).
- If any reviews refer to the same product, then resources are wasted fetching data that was already retrieved.
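The difference between the two approaches can be sketched with an in-memory stand-in for the data source. Everything in this example is hypothetical and for illustration only: the `products` map, the `calls` counter, and the `fetchProductByID` and `fetchProductsByIDs` helpers are not part of any library.

```javascript
// Hypothetical in-memory data source. `calls` counts round trips to it.
const products = new Map([
  ["p1", { id: "p1", name: "Kettle" }],
  ["p2", { id: "p2", name: "Toaster" }],
]);
let calls = 0;

// One round trip per product, like SELECT ... WHERE id = ?
async function fetchProductByID(id) {
  calls += 1;
  return products.get(id) ?? null;
}

// One round trip for many products, like SELECT ... WHERE id IN (...)
async function fetchProductsByIDs(ids) {
  calls += 1;
  return ids.map((id) => products.get(id) ?? null);
}

// Three reviews, two of which refer to the same product.
const reviews = [
  { id: "r1", productId: "p1" },
  { id: "r2", productId: "p2" },
  { id: "r3", productId: "p1" },
];

// Naive approach: each review triggers its own data source call.
async function resolveNaively(reviewList) {
  return Promise.all(reviewList.map((r) => fetchProductByID(r.productId)));
}

// Batched approach: deduplicate IDs, fetch once, then map results back
// to the reviews in their original order.
async function resolveBatched(reviewList) {
  const uniqueIds = [...new Set(reviewList.map((r) => r.productId))];
  const fetched = await fetchProductsByIDs(uniqueIds);
  const byId = new Map(uniqueIds.map((id, i) => [id, fetched[i]]));
  return reviewList.map((r) => byId.get(r.productId));
}
```

With these three reviews, the naive version makes one call per review, while the batched version makes a single call and fetches the shared product only once.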
The N+1 problem in a federated graph
Consider the same TopReviews operation with the Product type as an entity defined in the Reviews and Products subgraphs.
Reviews subgraph:
type Query {
  topReviews(first: Int): [Review]
}

type Review {
  id: ID
  rating: Int
  product: Product
}

type Product @key(fields: "id") {
  id: ID
}
Products subgraph:
type Product @key(fields: "id") {
  id: ID!
  name: String
  imageUrl: String
}
Most subgraph implementations use reference resolvers to return the entity object corresponding to a key.
Although this pattern is straightforward, it can diminish performance when a client operation requests fields from many entities.
Recall the TopReviews query, now in the context of a federated graph:
query TopReviews {
  topReviews(first: 10) { # Defined in Reviews subgraph
    id
    rating
    product { # ⚠️ NOT defined in Reviews subgraph
      name
      imageUrl
    }
  }
}
The router executes two queries:
1. Fetch all fields except Product.name and Product.imageUrl from the Reviews subgraph.
2. Fetch each product's name and imageUrl from the Products subgraph.
In the Products subgraph, the reference resolver for Product doesn't take a list of keys but rather a single key.
Therefore, the subgraph library calls the reference resolver once for each key:
// Products subgraph
const resolvers = {
  Product: {
    __resolveReference(productRepresentation) {
      return fetchProductByID(productRepresentation.id);
    }
  },
  // ...other resolvers...
}
A basic fetchProductByID function might make a database call each time it's called.
If you need to resolve Product.name for N different products, this results in N database calls.
These calls are made in addition to the call made by the Reviews subgraph to fetch the initial list of reviews and the id of each product.
This behavior can degrade performance or even enable denial-of-service attacks.
Query planning to handle N+1 queries
By default, the router's query planner handles N+1 queries for entities like the Product type.
The query plan for the TopReviews operation works like this:
1. First, the router fetches the list of Reviews from the Reviews subgraph using the root field Query.topReviews. The router also asks for the id of each associated product.
2. Next, the router extracts the Product entity references and fetches them in a batch from the Products subgraph's Query._entities root field.
3. After the router gets back the Product entities, it merges them into the list of Reviews, indicated by the Flatten step.
The full query plan:
QueryPlan {
  Sequence {
    Fetch(service: "reviews") {
      {
        topReviews(first: 10) {
          id
          rating
          product {
            __typename
            id
          }
        }
      }
    },
    Flatten(path: "topReviews.@.product") {
      Fetch(service: "products") {
        {
          ... on Product {
            __typename
            id
          }
        } =>
        {
          ... on Product {
            name
            imageUrl
          }
        }
      },
    },
  },
}
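The three steps of the plan can be approximated in a few lines of JavaScript. This is only a conceptual sketch, not how the router is implemented: the sample response, the product names, and the helpers `extractRepresentations`, `buildEntitiesRequest`, and `mergeEntities` are all hypothetical. The shape of the `_entities` request follows the federation subgraph specification.

```javascript
// Hypothetical response from the Reviews subgraph (first Fetch in the plan).
const reviewsResponse = {
  data: {
    topReviews: [
      { id: "r1", rating: 5, product: { __typename: "Product", id: "p1" } },
      { id: "r2", rating: 4, product: { __typename: "Product", id: "p2" } },
    ],
  },
};

// Collect the entity references ("representations") in list order.
function extractRepresentations(response) {
  return response.data.topReviews.map((review) => review.product);
}

// Build the single batched request for the Products subgraph.
function buildEntitiesRequest(representations) {
  return {
    query: `query ($representations: [_Any!]!) {
      _entities(representations: $representations) {
        ... on Product { name imageUrl }
      }
    }`,
    variables: { representations },
  };
}

// Flatten step: merge each returned entity into the review at the same
// position. This only works if the subgraph preserves the request order.
function mergeEntities(response, entities) {
  return response.data.topReviews.map((review, i) => ({
    ...review,
    product: { ...review.product, ...entities[i] },
  }));
}

const representations = extractRepresentations(reviewsResponse);
const entitiesRequest = buildEntitiesRequest(representations);
```

However many reviews come back, the Products subgraph receives one request whose `representations` variable lists every product reference.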
Most subgraph implementations (including @apollo/subgraph) don't write the Query._entities resolver directly.
Instead, they use the reference resolver API for resolving an individual entity reference:
const resolvers = {
  Product: {
    __resolveReference(productRepresentation) {
      return fetchProductByID(productRepresentation.id);
    },
  },
};
The motivation for this API relates to a subtle, critical aspect of the subgraph specification: the order of resolved entities must match the order of the given entity references. If the entities are returned in the wrong order, those fields are merged with the wrong entities, leading to incorrect results. To avoid this issue, most subgraph libraries handle entity order for you.
Because order matters, libraries resolve each entity reference individually, which reintroduces the N+1 query problem: in the example above, fetchProductByID gets called once for each entity reference.
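To make the per-reference behavior concrete, here is a simplified sketch of how a subgraph library might implement Query._entities on top of reference resolvers. It is an illustration under assumptions, not the actual @apollo/subgraph source: `resolveEntities`, the `lookups` log, and the stubbed `fetchProductByID` are hypothetical.

```javascript
// Stub data source that records every lookup (hypothetical).
const lookups = [];
async function fetchProductByID(id) {
  lookups.push(id);
  return { id, name: `Product ${id}` };
}

const resolvers = {
  Product: {
    __resolveReference(productRepresentation) {
      return fetchProductByID(productRepresentation.id);
    },
  },
};

// Simplified stand-in for a library-provided Query._entities resolver.
// Mapping over the representations keeps the results in request order,
// but it also invokes the reference resolver once per representation.
async function resolveEntities(representations, context) {
  return Promise.all(
    representations.map((representation) => {
      const typeResolvers = resolvers[representation.__typename];
      return typeResolvers.__resolveReference(representation, context);
    })
  );
}
```

Calling `resolveEntities` with three Product references returns three entities in the same order, and `lookups` ends up with three entries, one per reference, even when two references share an id.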
The dataloader pattern solution
The solution for the N+1 problem, whether for federated or monolithic graphs, is the dataloader pattern. For example, in an Apollo Server implementation, using dataloaders could look like this:
const resolvers = {
  Product: {
    __resolveReference(product, context) {
      // Delegate to a loader on the context that batches lookups by ID.
      return context.dataloaders.products(product.id);
    },
  },
};
With dataloaders, when the router calls the Products subgraph with a batch of Product entity references, the subgraph makes a single batched request to the Products data source.
Nearly every GraphQL server library provides a dataloader implementation, and Apollo recommends using it in every resolver, even those that aren't for entities or don't return a list.
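For readers who want to see the mechanics, the following is a minimal hand-rolled sketch of the pattern. It is a simplified stand-in for libraries such as the `dataloader` npm package, not a replacement for them. The names `createBatchLoader`, `createContext`, `fetchProductsByIDs`, and `batchCalls` are hypothetical, and the example assumes the batch function returns values in the same order as the keys it receives.

```javascript
// Minimal batching loader. `batchFn` receives an array of unique keys and
// must return a promise for an array of values in the same order.
function createBatchLoader(batchFn) {
  const cache = new Map(); // key -> promise, so repeated keys are fetched once
  let pending = []; // keys queued in the current tick, with their settlers

  function dispatch() {
    const batch = pending;
    pending = [];
    Promise.resolve()
      .then(() => batchFn(batch.map((entry) => entry.key)))
      .then(
        (values) => batch.forEach((entry, i) => entry.resolve(values[i])),
        (error) => batch.forEach((entry) => entry.reject(error))
      );
  }

  return function load(key) {
    if (cache.has(key)) return cache.get(key);
    const promise = new Promise((resolve, reject) => {
      // Schedule one dispatch when the first key of a batch arrives.
      if (pending.length === 0) queueMicrotask(dispatch);
      pending.push({ key, resolve, reject });
    });
    cache.set(key, promise);
    return promise;
  };
}

// Hypothetical batched data source call that counts its invocations.
let batchCalls = 0;
async function fetchProductsByIDs(ids) {
  batchCalls += 1;
  return ids.map((id) => ({ id, name: `Product ${id}` }));
}

// Create loaders per request so cached values don't leak between requests.
function createContext() {
  return { dataloaders: { products: createBatchLoader(fetchProductsByIDs) } };
}

const resolvers = {
  Product: {
    __resolveReference(product, context) {
      return context.dataloaders.products(product.id);
    },
  },
};
```

Because every reference resolver call in a batch runs in the same tick, the loader collects all requested ids and issues one `fetchProductsByIDs` call. Each caller still receives the value for its own key, so entity order is preserved.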