Explaining GraphQL Connections
Caleb Meredith
I don’t read many books these days, but I do listen to books. Thanks to the incessant work of the Audible marketing team, Amazon has converted me into an audiobook consumer. I’m pretty happy with my newfound hobby, which allows me to take a book anywhere and “read” while doing tasks where my eyes would otherwise be occupied.
Yet there is one thing that I find difficult about the medium. Audiobooks don’t have pages. Pages exist in books to help people find things in a large collection of words, and they help create clear boundaries between ideas. An audiobook just flows continuously at one constant rate, which makes it difficult to jump back or skip parts.
It’s the same with GraphQL: if APIs returned one continuous list of data, our applications would suffer major performance penalties. Sometimes it’s not even possible to process the entire data set at once. By paginating our data set, we choose a strategy that provides data in separate, small “pages” that are digestible by both our users and our applications.
Introducing GraphQL connections
GraphQL connections are the most popular method for pagination in GraphQL. They are often associated with Relay, a JavaScript GraphQL client open sourced by Facebook. However, connections are not specific to Relay at all, and this post will hopefully help clear up this common misconception.
Connections can be hard to understand because they use opaque vocabulary like “cursor,” “connection,” and “edge.” So in this post I will explain the terminology and try to answer three important questions about GraphQL connections:
- Where did they come from?
- Why were they created?
- What should you keep in mind when using connections?
Note: This post assumes some level of knowledge about the mechanics of cursor-based pagination. To learn more about pagination, make sure to read this excellent post by Sashko Stubailo: “Understanding pagination: REST, GraphQL, and Relay.”
Where did connections come from?
Connections were designed at Facebook as part of their internal GraphQL server design. They were first introduced into the open source ecosystem by the “Relay Cursor Connections Specification.” The way GraphQL connections were announced made them seem like part of a specific data-fetching library, but they are something all GraphQL developers can get value from.
Facebook coined the term “connection” in the GraphQL context, but it’s really nothing more than a new name for the cursor-based pagination model that has been in use for a long time.
Most people who have been around GraphQL for a while understand what a connection is, but we want to take a more critical look. So let’s try to understand the how and why underlying the creation of GraphQL connections.
How were connections created?
While the connection specification was released into open source along with GraphQL, the design was born much earlier. When Facebook first started talking about GraphQL, they showed some examples of the GraphQL query language before it was open sourced. The most important thing to note about this proto-GraphQL is that it already appears to have connections.
Furthermore, we see the very same connection model, with before and after cursors, in Facebook’s public Graph API.
So, why would Facebook design connections like they did?
Why were connections created?
This is an important question that schema designers should be able to answer before adding connections to their own schemas. First of all, we must remember that it is incredibly important for Facebook to get their list design right. After all, Facebook’s most important user-facing product, the news feed, is just a fancy list.
In the Facebook Graph API documentation, Facebook makes the following claim:
Cursor-based pagination is the most efficient method of paging and should always be used where possible.
Those are strong words, but they make sense. The main alternative to cursor-based pagination is limit/offset pagination. Limit/offset pagination is the easiest pagination model to implement in SQL, which is likely the reason it is the most popular pagination model. However, limit/offset pagination has the fatal flaw that its pages can be unstable in common scenarios.
Let’s say we have a list of records and we are paginating with a limit of 10 like in the following diagram:
While the user is looking at the 10 records on the current page, 5 new records get added. Now the list looks as follows:
If the user was on page 1 when the records were added and now navigates to page 2, they will see records 11–15 again! An even worse scenario: if the user was on page 2 when the records were added and now navigates back to page 1, they will never see records 11–15!
When your core product revolves around a list, as Facebook’s does, it is imperative that you get pagination right. Showing a user some posts twice and hiding others from them entirely are both unacceptable. Since cursor-based pagination can help avoid this flaw, it makes sense that Facebook always uses cursor-based pagination.
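To make the duplicate-records scenario concrete, here is a minimal TypeScript sketch comparing the two models on a newest-first feed. The record ids, the page size of 10, and the helpers makeRecords, offsetPage, and cursorPage are illustrative assumptions, and the ids do not follow the numbering in the diagram. The cursor version is keyset-style (the cursor is the id of the last record the client saw), which is one common way to implement cursors but not the only one.

```typescript
// Hypothetical record shape: a larger id means a newer record.
type FeedRecord = { id: number };

// Builds records with ids from `newest` down to `oldest` (newest first).
function makeRecords(newest: number, oldest: number): FeedRecord[] {
  const records: FeedRecord[] = [];
  for (let id = newest; id >= oldest; id--) records.push({ id });
  return records;
}

// Limit/offset: return whatever currently sits at positions [offset, offset + limit).
function offsetPage(feed: FeedRecord[], limit: number, offset: number): FeedRecord[] {
  return feed.slice(offset, offset + limit);
}

// Cursor-based (keyset-style): return records older than the last one the client saw.
function cursorPage(feed: FeedRecord[], limit: number, afterId?: number): FeedRecord[] {
  if (afterId === undefined) return feed.slice(0, limit);
  const cutoff: number = afterId;
  return feed.filter((r) => r.id < cutoff).slice(0, limit);
}

const idsOf = (page: FeedRecord[]): number[] => page.map((r) => r.id);

let feed = makeRecords(20, 1); // ids 20..1
const page1 = offsetPage(feed, 10, 0); // ids 20..11 (cursorPage(feed, 10) gives the same page)

// While the user reads page 1, five newer records are added to the top of the list.
feed = [...makeRecords(25, 21), ...feed];

const page2ByOffset = offsetPage(feed, 10, 10);
const page2ByCursor = cursorPage(feed, 10, page1[page1.length - 1].id);

const seenOnPage1 = new Set(idsOf(page1));
const repeatedByOffset = idsOf(page2ByOffset).filter((id) => seenOnPage1.has(id));
const repeatedByCursor = idsOf(page2ByCursor).filter((id) => seenOnPage1.has(id));
// repeatedByOffset is [15, 14, 13, 12, 11]; repeatedByCursor is [].
```

The offset query blindly asks for positions 10 to 19, which now hold five records the user already saw. The cursor query asks for "records older than the last one I saw," so the new records at the top cannot shift it.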
It is worth pointing out, however, that cursor-based pagination is not always necessary, nor is it perfect. If items can be moved or added in the middle of the list, cursor-based pagination suffers from the same potential flaws as limit/offset pagination.
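That limitation can be sketched the same way. This hypothetical TypeScript example pages through a feed ranked by a mutable score, using the (score, id) of the last item received as the cursor. Between two requests, one item moves down past the cursor and a new item is inserted above it. The item ids, scores, and the rank, pageAfter, and cursorOf helpers are assumptions for illustration only.

```typescript
// Hypothetical feed item ranked by a score that can change over time (highest first).
type RankedItem = { id: string; score: number };
// A keyset cursor: the sort key of the last item the client received.
type Cursor = { score: number; id: string };

// Sort by score descending, breaking ties by id so the order is total.
function rank(items: RankedItem[]): RankedItem[] {
  return [...items].sort((x, y) => y.score - x.score || x.id.localeCompare(y.id));
}

// Return up to `first` items that sort strictly after the cursor.
function pageAfter(items: RankedItem[], first: number, after?: Cursor): RankedItem[] {
  const ordered = rank(items);
  if (after === undefined) return ordered.slice(0, first);
  const c: Cursor = after;
  return ordered
    .filter((it) => it.score < c.score || (it.score === c.score && it.id.localeCompare(c.id) > 0))
    .slice(0, first);
}

const cursorOf = (item: RankedItem): Cursor => ({ score: item.score, id: item.id });

let items: RankedItem[] = [
  { id: "a", score: 90 }, { id: "b", score: 80 }, { id: "c", score: 70 },
  { id: "d", score: 60 }, { id: "e", score: 50 }, { id: "f", score: 40 },
];

const firstPage = pageAfter(items, 3); // a, b, c

// Between requests, "a" drops below the cursor and a new item "g" lands above it.
items = items.map((it) => (it.id === "a" ? { ...it, score: 45 } : it));
items.push({ id: "g", score: 75 });

const secondPage = pageAfter(items, 3, cursorOf(firstPage[firstPage.length - 1])); // d, e, a
const thirdPage = pageAfter(items, 3, cursorOf(secondPage[secondPage.length - 1])); // f
```

Item "a" shows up on both the first and second pages, and "g" never appears on any page: the same see-twice and never-see problems described above, just triggered by movement in the middle of the list instead of additions at the top.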
Now we know why Facebook adopted cursors, but why did they choose names like “connection,” “edge,” and “node”? This is a very good question, and to answer it we will need to understand a bit of graph theory nomenclature.
Some graph theory nomenclature
The data from a social network website may be easily represented as a graph. See the image below for an example of a graph structure:
The circles in the graph are called “nodes” and the lines of the graph are called “edges.” An edge is a line that connects two nodes together, representing some kind of relationship between the two nodes.
Now, how do we represent a social network using this data structure?
Let’s add some labels:
Notice how our nodes now each have a label, like “User” or “Post.” The edges also have labels, like “Friend,” “Author,” or “Liked.”
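One way to picture this labeled graph in code is as two lists: one of labeled nodes and one of labeled edges. The TypeScript below is a hypothetical in-memory model, not how Facebook (or any GraphQL server) actually stores data. The node ids and the neighbors helper are made up, and edges are stored as directed from/to pairs for simplicity, even though a friendship is usually mutual.

```typescript
// Labels taken from the example graph; the ids below are hypothetical.
type NodeLabel = "User" | "Post";
type EdgeLabel = "Friend" | "Author" | "Liked";

interface GraphNode { id: string; label: NodeLabel }
interface GraphEdge { from: string; to: string; label: EdgeLabel }

const graphNodes: GraphNode[] = [
  { id: "alice", label: "User" },
  { id: "bob", label: "User" },
  { id: "carol", label: "User" },
  { id: "post1", label: "Post" },
];

const graphEdges: GraphEdge[] = [
  { from: "alice", to: "bob", label: "Friend" },
  { from: "alice", to: "carol", label: "Friend" },
  { from: "bob", to: "post1", label: "Author" },
  { from: "alice", to: "post1", label: "Liked" },
];

// Follow the edges with a given label out of one node and return the nodes at the other end.
function neighbors(nodeId: string, label: EdgeLabel): GraphNode[] {
  return graphEdges
    .filter((e) => e.from === nodeId && e.label === label)
    .map((e) => graphNodes.find((n) => n.id === e.to))
    .filter((n): n is GraphNode => n !== undefined);
}
```

Here neighbors("alice", "Friend") yields the bob and carol nodes, while neighbors("alice", "Liked") follows a different kind of edge out of the same node and yields post1.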
It’s safe to assume that Facebook likes to think about their data as a graph. It just makes sense to think about a social network’s data that way. You can see Facebook thinking and talking about their data as a graph in Facebook’s research, the Graph API and the Relay specifications. Even GraphQL itself has the name “graph” in it, even though GraphQL isn’t actually a graph query language!
This does not mean Facebook stores all of its data in a graph database. It just means that at a high level this is how they think about their data.
If you give it a couple of minutes, it should be easy to start seeing the data in apps you have worked on as a graph.
So what does this have to do with GraphQL connections?
Graph nomenclature and GraphQL connections
Let us zoom in and focus on a single node and its edges in the labeled data graph we see above. Let’s say we have a GraphQL query that fetches this node. Something like:
{
  user(id: "ZW5jaG9kZSBIZWxsb1dvcmxk") {
    id
    name
  }
}
Now we want to get all of the users this user is friends with, that is, all of the users connected to it by a “Friend” edge in our graph.
Our GraphQL query would then look like this:
{
  user(id: "ZW5jaG9kZSBIZWxsb1dvcmxk") {
    id
    name
    friendsConnection(first: 3) {
      edges {
        cursor
        node {
          id
          name
        }
      }
    }
  }
}
…and that’s where the name “edge” comes from!
Why name a list of edges a “connection,” though? Why not just call it a “list” and make our lives easier? A connection is a way to get all of the nodes that are connected to another node in a specific way. In this case, we want all of the nodes connected to our user by a friend edge. Another connection might link a user node to all of the posts that user liked.
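In code, a connection can be sketched as "follow one kind of edge out of a node, then wrap each node you reach in an edge object that carries a cursor." The TypeScript below is a hypothetical in-memory resolver for the friendsConnection(first: 3) query above. The sample users, the FriendLink shape, and the friend:&lt;id&gt; cursor format are assumptions; real servers usually make cursors opaque, for example by base64-encoding them.

```typescript
// Hypothetical user node and a directed "Friend" link between two users.
interface UserNode { id: string; name: string }
interface FriendLink { from: string; to: string }

// The shape requested by the query above: edges { cursor node { id name } }.
interface UserFriendsEdge { cursor: string; node: UserNode }
interface UserFriendsConnection { edges: UserFriendsEdge[] }

const users: UserNode[] = [
  { id: "alice", name: "Alice" },
  { id: "bob", name: "Bob" },
  { id: "carol", name: "Carol" },
  { id: "dave", name: "Dave" },
  { id: "erin", name: "Erin" },
];

const friendLinks: FriendLink[] = [
  { from: "alice", to: "bob" },
  { from: "alice", to: "carol" },
  { from: "alice", to: "dave" },
  { from: "alice", to: "erin" },
  { from: "bob", to: "carol" },
];

// Illustrative cursor format; real servers usually make cursors opaque.
const toCursor = (userId: string): string => `friend:${userId}`;

// Follow the friend links out of one user, keep at most `first` of them,
// and wrap each friend in an edge that carries a cursor.
function friendsConnection(userId: string, first: number): UserFriendsConnection {
  const edges = friendLinks
    .filter((link) => link.from === userId)
    .map((link) => users.find((u) => u.id === link.to))
    .filter((u): u is UserNode => u !== undefined)
    .slice(0, first)
    .map((node) => ({ cursor: toCursor(node.id), node }));
  return { edges };
}

// Mirrors friendsConnection(first: 3) for the user "alice".
const aliceFriends = friendsConnection("alice", 3);
```

Alice has four friend links, so the connection returns the first three edges (Bob, Carol, and Dave), each pairing a cursor with the friend node it points to.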
Now that we understand the what, how, and why of GraphQL connections, let’s look at some specific design recommendations for your API based on our deeper understanding of connections.
Schema design recommendations
Let’s go through the process of designing the GraphQL schema behind the query above. First, we need a User type.
type User implements Node {
  id: ID!
  name: String
}
This type will represent the user nodes in our graph.
Next, we will want to add the connection types.
type UserFriendsConnection {
  pageInfo: PageInfo!
  edges: [UserFriendsEdge]
}

type UserFriendsEdge {
  cursor: String!
  node: User
}
I recommend naming your edge type ${Origin Type}${Relationship Type}Edge, and likewise your connection type ${Origin Type}${Relationship Type}Connection: the origin type name first, then the relationship type name, and finally Edge or Connection.
We start with the origin type name (in this case User) so that, alphabetically, these types are grouped together with User in our printed schema and in GraphiQL. Starting with User also reduces the chance of a naming collision. Say that in the future you want to add a Dog type, and you want your dog to have friends: its types would be DogFriendsEdge and DogFriendsConnection, which cannot clash with the ones we defined for User.
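The naming pattern is mechanical enough to express as a tiny helper. This is a hypothetical TypeScript sketch (connectionTypeNames is not part of any library) showing that User and Dog friends get distinct type names, and that an alphabetical sort keeps each connection next to its origin type.

```typescript
// Hypothetical helper: applies the ${Origin Type}${Relationship Type} naming pattern.
interface ConnectionTypeNames { edge: string; connection: string }

function connectionTypeNames(originType: string, relationshipType: string): ConnectionTypeNames {
  const prefix = `${originType}${relationshipType}`;
  return { edge: `${prefix}Edge`, connection: `${prefix}Connection` };
}

const userFriends = connectionTypeNames("User", "Friends");
const dogFriends = connectionTypeNames("Dog", "Friends");

// Sorting the type names alphabetically keeps each connection next to its origin type.
const sortedTypeNames = [
  "User", userFriends.edge, userFriends.connection,
  "Dog", dogFriends.edge, dogFriends.connection,
].sort();
// sortedTypeNames is ["Dog", "DogFriendsConnection", "DogFriendsEdge",
//                     "User", "UserFriendsConnection", "UserFriendsEdge"]
```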
If you use this pattern, you’ll also be able to do the cool design pattern I’m about to show you next 😉
Data on the edges
Our UserFriendsConnection type represents an abstract concept, but our UserFriendsEdge type represents an actual entity in our graph: the line that connects two nodes!
In graph theory, an edge can have properties of its own, which effectively act as metadata. For example, if we have a “liked” edge between a user and a post, we might want to include the time at which the user liked that post. We might want the same thing for our UserFriendsEdge, so let’s add a friendedAt field.
type UserFriendsEdge {
  cursor: String!
  node: User
  friendedAt: DateTime
}
It seems like people resist adding fields to their edge type because most tools treat the edge type as boilerplate, including graphql-relay-js (https://github.com/graphql/graphql-relay-js), which was the first library to provide an implementation of GraphQL connections. However, I encourage you not to think of edges as mundane boilerplate. Instead, think of them as the relationship between two nodes, and add fields that are appropriate for that relationship (this is one of the reasons Apollo Client isn’t opinionated about how you do pagination).
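To see why friendedAt belongs on the edge rather than on User, consider a sketch in which the same user is a friend of two different people. The FriendshipRow storage shape, the sample data, and the friendEdges helper are hypothetical, and DateTime values are assumed to be serialized as ISO 8601 strings, which depends on how your schema defines that custom scalar.

```typescript
// Hypothetical storage: one row per friendship, recording when it started.
interface FriendshipRow { userId: string; friendId: string; friendedAt: string }
interface UserNode { id: string; name: string }

// The edge carries data about the relationship itself, next to the node it points to.
interface UserFriendsEdge { cursor: string; node: UserNode; friendedAt: string }

const usersById = new Map<string, UserNode>([
  ["alice", { id: "alice", name: "Alice" }],
  ["bob", { id: "bob", name: "Bob" }],
  ["carol", { id: "carol", name: "Carol" }],
]);

// DateTime values are assumed to be ISO 8601 strings.
const friendships: FriendshipRow[] = [
  { userId: "alice", friendId: "carol", friendedAt: "2017-03-01T12:00:00Z" },
  { userId: "bob", friendId: "carol", friendedAt: "2018-06-15T09:30:00Z" },
];

// Builds the friend edges for one user, copying friendedAt from the relationship row.
function friendEdges(userId: string): UserFriendsEdge[] {
  const edges: UserFriendsEdge[] = [];
  for (const row of friendships) {
    if (row.userId !== userId) continue;
    const node = usersById.get(row.friendId);
    if (node === undefined) continue; // skip links to unknown users
    edges.push({ cursor: `friend:${row.friendId}`, node, friendedAt: row.friendedAt });
  }
  return edges;
}

const aliceEdges = friendEdges("alice");
const bobEdges = friendEdges("bob");
// Both lists point at the same Carol node, but each edge has its own friendedAt.
```

Carol is the same User node in both lists, but each edge records a different friendedAt, so that value describes the friendship, not the user, and has no natural home on the User type.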
Finally, we need to add the connection back to our User type.
type User implements Node {
  id: ID!
  name: String
  friendsConnection(first: Int, after: String, last: Int, before: String): UserFriendsConnection
}
Note that we name the field ${Relationship Type}Connection (here, friendsConnection). This reserves the friends field name for a future pagination specification. After all, technology moves so fast that it is reasonable to expect a new pagination model in the future, and since our GraphQL APIs are not always versioned, it is also reasonable to plan for that future by reserving names in this way.
A good GraphQL schema designer is always planning for future naming collisions. It’s better to choose detailed, descriptive names that can be deprecated later than short, elegant names, which are harder to deprecate because they occupy the perfect concise name for the data you want to represent.
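The four arguments on friendsConnection correspond to forward (first/after) and backward (last/before) pagination. The following is a simplified, in-memory TypeScript sketch loosely modeled on the pagination algorithm in the Relay Cursor Connections Specification: apply the cursors first, then trim to first or last. The helper name paginate, the sample data, and the friend:&lt;id&gt; cursor format are assumptions. Unknown cursors are simply ignored here, whereas a production server would more likely reject them, and a real implementation would push this work into the database rather than slicing an in-memory array.

```typescript
// Hypothetical shapes mirroring User, UserFriendsEdge, PageInfo and UserFriendsConnection.
interface UserNode { id: string; name: string }
interface UserFriendsEdge { cursor: string; node: UserNode }
interface PageInfo {
  hasNextPage: boolean;
  hasPreviousPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}
interface UserFriendsConnection { pageInfo: PageInfo; edges: UserFriendsEdge[] }
interface ConnectionArgs { first?: number; after?: string; last?: number; before?: string }

function paginate(allEdges: UserFriendsEdge[], args: ConnectionArgs): UserFriendsConnection {
  // 1. Apply the cursors: keep edges strictly after `after` and strictly before `before`.
  let edges = allEdges;
  if (args.after !== undefined) {
    const i = edges.findIndex((e) => e.cursor === args.after);
    if (i !== -1) edges = edges.slice(i + 1);
  }
  if (args.before !== undefined) {
    const i = edges.findIndex((e) => e.cursor === args.before);
    if (i !== -1) edges = edges.slice(0, i);
  }

  // 2. Page flags are computed from the cursor-bounded window, before trimming.
  const hasNextPage = args.first !== undefined && edges.length > args.first;
  const hasPreviousPage = args.last !== undefined && edges.length > args.last;

  // 3. Trim to `first` edges from the front and/or `last` edges from the back.
  if (args.first !== undefined) {
    if (args.first < 0) throw new Error("first must not be negative");
    edges = edges.slice(0, args.first);
  }
  if (args.last !== undefined) {
    if (args.last < 0) throw new Error("last must not be negative");
    edges = edges.slice(Math.max(edges.length - args.last, 0));
  }

  return {
    pageInfo: {
      hasNextPage,
      hasPreviousPage,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
    edges,
  };
}

const allFriendEdges: UserFriendsEdge[] = ["u1", "u2", "u3", "u4", "u5"].map((id) => ({
  cursor: `friend:${id}`,
  node: { id, name: `Friend ${id}` },
}));

// Forward: friendsConnection(first: 2), then keep passing endCursor as `after`.
const page1 = paginate(allFriendEdges, { first: 2 });
const page2 = paginate(allFriendEdges, { first: 2, after: page1.pageInfo.endCursor ?? undefined });
const page3 = paginate(allFriendEdges, { first: 2, after: page2.pageInfo.endCursor ?? undefined });

// Backward: friendsConnection(last: 2, before: "friend:u5").
const previous = paginate(allFriendEdges, { last: 2, before: "friend:u5" });
```

Walking forward yields u1 and u2, then u3 and u4, then u5 with hasNextPage false; asking for the last two edges before u5 yields u3 and u4 with hasPreviousPage true.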
Now, hopefully, when we choose to use connections in our GraphQL APIs, it won’t be just because it’s a standard we heard about. If we choose connections, it should be because we agree that cursor-based pagination has tangible benefits for our application and we find that the graph concepts used by connections accurately reflect our data, whether we use them with Relay, Apollo, or any other client.
I’m going to go back and read/listen to my audiobooks because, frankly, I’m a little tired of pages now.
{ "pageInfo": { "hasNextPage": false } }