Since 1.48.0

Demand Control

Protect your graph from high-cost GraphQL operations


This feature is only available with a GraphOS Enterprise plan. You can test it out by signing up for a free GraphOS trial. To compare GraphOS feature support across all plan types, see the pricing page.

What is demand control?

Demand control provides a way to secure your supergraph from overly complex operations, based on the IBM GraphQL Cost Directive specification.

Application clients can send overly costly operations that overload your supergraph infrastructure. These operations may be costly due to their complexity and/or their need for expensive resolvers. In either case, demand control can help you protect your infrastructure from these expensive operations. When your router receives a request, it calculates a cost for that operation. If the cost is greater than your configured maximum, the operation is rejected.

Calculating cost

When calculating the cost of an operation, the router sums the costs of the sub-requests that it plans to send to your subgraphs.

  • For each operation, the cost is the sum of its base cost plus the costs of its fields.

  • For each field, the cost is defined recursively as its own base cost plus the cost of its selections. In the IBM specification, this is called field cost.

The cost of each operation type:

|      | Mutation | Query | Subscription |
|------|----------|-------|--------------|
| type | 10       | 0     | 0            |

The cost of each GraphQL element type, per operation type:

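|           | Mutation | Query | Subscription |
|-----------|----------|-------|--------------|
| Object    | 1        | 1     | 1            |
| Interface | 1        | 1     | 1            |
| Union     | 1        | 1     | 1            |
| Scalar    | 0        | 0     | 0            |
| Enum      | 0        | 0     | 0            |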

Using these defaults, the following operation would have a cost of 4.

GraphQL
query BookQuery {
  book(id: 1) {
    title
    author {
      name
    }
    publisher {
      name
      address {
        zipCode
      }
    }
  }
}
Example query's cost calculation
Text
1 Query (0) + 1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (1) = 4 total cost
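
The recursive definition above can be sketched in a few lines of Python. This is an illustrative model only, not the router's implementation: the `Field` class, the `kind` labels, and the `operation_cost` helper are hypothetical names, and the weights are the defaults from the tables above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Default weights from the tables above (identical for every operation type).
TYPE_COST = {"object": 1, "interface": 1, "union": 1, "scalar": 0, "enum": 0}
# Base cost of each operation type.
OPERATION_COST = {"mutation": 10, "query": 0, "subscription": 0}


@dataclass
class Field:
    """Hypothetical model of a selected field and the kind of type it returns."""
    name: str
    kind: str
    selections: list[Field] = field(default_factory=list)


def field_cost(f: Field) -> int:
    # A field costs its own base weight plus the cost of everything selected under it.
    return TYPE_COST[f.kind] + sum(field_cost(s) for s in f.selections)


def operation_cost(op_type: str, selections: list[Field]) -> int:
    # An operation costs its base cost plus the cost of its top-level fields.
    return OPERATION_COST[op_type] + sum(field_cost(s) for s in selections)


book_query = [
    Field("book", "object", [
        Field("title", "scalar"),
        Field("author", "object", [Field("name", "scalar")]),
        Field("publisher", "object", [
            Field("name", "scalar"),
            Field("address", "object", [Field("zipCode", "scalar")]),
        ]),
    ])
]

print(operation_cost("query", book_query))  # 4
```

Scalars contribute nothing under the defaults, so only the four object-typed fields count toward the total.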

Customizing cost

Since version 1.53.0, the router supports customizing the cost calculation with the @cost directive. The @cost directive has a single argument, weight, which overrides the default weights from the table above.

Note: The Apollo Federation @cost directive differs from the IBM specification in that the weight argument is of type Int! instead of String!.

Annotating your schema with the @cost directive customizes how the router scores operations. For example, suppose that resolving Address in the example query is particularly expensive. You can annotate the Address type with the @cost directive and a larger weight:

GraphQL
type Query {
  book(id: ID): Book
}

type Book {
  title: String
  author: Author
  publisher: Publisher
}

type Author {
  name: String
}

type Publisher {
  name: String
  address: Address
}

type Address
  @cost(weight: 5) {
  zipCode: Int!
}

This increases the cost of BookQuery from 4 to 8.

Example query's updated cost calculation
Text
1 Query (0) + 1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5) = 8 total cost
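
To see how a @cost weight changes the total, the sketch below models the override as a per-type lookup that falls back to the default object weight of 1. The tuple-based query shape, `CUSTOM_WEIGHTS`, and `cost` are hypothetical names used for illustration; scalar fields are left out because they cost 0.

```python
# Hypothetical per-type overrides, mirroring `type Address @cost(weight: 5)`.
CUSTOM_WEIGHTS = {"Address": 5}
DEFAULT_OBJECT_WEIGHT = 1

# Each node is (returned type name, child nodes); scalar leaves are omitted (cost 0).
BOOK_QUERY = ("Book", [
    ("Author", []),
    ("Publisher", [("Address", [])]),
])


def cost(node, weights):
    type_name, children = node
    # Use the overridden weight when one exists, otherwise the default.
    own = weights.get(type_name, DEFAULT_OBJECT_WEIGHT)
    return own + sum(cost(child, weights) for child in children)


print(cost(BOOK_QUERY, {}))              # 4 with default weights
print(cost(BOOK_QUERY, CUSTOM_WEIGHTS))  # 8 with the Address override
```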

Handling list fields

During the static analysis phase of demand control, the router doesn't know the size of the list fields in a given query. It must use estimates for list sizes. The closer the estimated list size is to the actual list size for a field, the closer the estimated cost will be to the actual cost.

Note: The difference between estimated and actual operation cost calculations is due only to the difference between assumed and actual sizes of list fields.

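There are two ways to indicate the expected list sizes to the router:

  • Set a global assumed list size in your router configuration (the static_estimated.list_size option described under Configuring demand control).

  • Use the @listSize directive on individual fields in your schema.

The @listSize directive supports field-level granularity in setting list size. By using its assumedSize argument, you can set a statically defined list size for a field. If the field takes paging parameters that control the size of the list, use the slicingArguments argument instead.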

Continuing with our example above, let's add two queryable fields. First, we will add a field which returns the top five best selling books:

GraphQL
type Query {
  book(id: ID): Book
  bestsellers: [Book] @listSize(assumedSize: 5)
}

With this schema, the following query has a cost of 40:

GraphQL
query BestsellersQuery {
  bestsellers {
    title
    author {
      name
    }
    publisher {
      name
      address {
        zipCode
      }
    }
  }
}
Cost of bestsellers query
Text
1 Query (0) + 5 book objects (5 * (1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5))) = 40 total cost

The second field we will add is a paginated resolver. It returns the latest additions to the inventory:

GraphQL
type Query {
  book(id: ID): Book
  bestsellers: [Book] @listSize(assumedSize: 5)
  newestAdditions(after: ID, limit: Int!): [Book]
    @listSize(slicingArguments: ["limit"])
}

The number of books returned by this resolver is determined by the limit argument.

GraphQL
query NewestAdditions {
  newestAdditions(limit: 3) {
    title
    author {
      name
    }
    publisher {
      name
      address {
        zipCode
      }
    }
  }
}

The router estimates the cost of this query as 24. If the limit were increased to 7, the cost would increase to 56.

Text
When requesting 3 books:
1 Query (0) + 3 book objects (3 * (1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5))) = 24 total cost

When requesting 7 books:
1 Query (0) + 7 book objects (7 * (1 book object (1) + 1 author object (1) + 1 publisher object (1) + 1 address object (5))) = 56 total cost
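
The list-size arithmetic above can be reproduced with a small sketch. It takes the cost of one Book selection as 8 (computed with the Address weight of 5) and models @listSize as either a fixed assumed size or a lookup of a slicing argument. The function and parameter names are hypothetical, and the precedence rules are assumptions made for illustration rather than documented router behavior.

```python
from typing import Optional

COST_PER_BOOK = 8  # book (1) + author (1) + publisher (1) + address (5)


def estimated_list_size(
    args: dict,
    assumed_size: Optional[int] = None,
    slicing_arguments: Optional[list] = None,
    default_list_size: int = 10,
) -> int:
    """Pick the list size used for static estimation (illustrative model only)."""
    # Assumption: when slicing arguments are declared and present in the query,
    # the largest supplied value determines the size.
    if slicing_arguments:
        sizes = [args[name] for name in slicing_arguments if name in args]
        if sizes:
            return max(sizes)
    if assumed_size is not None:
        return assumed_size
    # Fall back to a global default, such as static_estimated.list_size.
    return default_list_size


def list_field_cost(size: int, cost_per_item: int = COST_PER_BOOK) -> int:
    # Every item in the list is charged the full cost of its selections.
    return size * cost_per_item


print(list_field_cost(estimated_list_size({}, assumed_size=5)))  # 40
print(list_field_cost(estimated_list_size({"limit": 3}, slicing_arguments=["limit"])))  # 24
print(list_field_cost(estimated_list_size({"limit": 7}, slicing_arguments=["limit"])))  # 56
```

Because the estimate scales linearly with the list size, an inaccurate size assumption is multiplied by the cost of everything selected beneath the list field.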

Configuring demand control

To enable demand control in the router, configure the demand_control option in router.yaml:

YAML
router.yaml
demand_control:
  enabled: true
  mode: measure
  strategy:
    static_estimated:
      list_size: 10
      max: 1000

When demand_control is enabled, the router measures the cost of each operation and can enforce operation cost limits, based on additional configuration.

Customize demand_control with the following settings:

| Option | Valid values | Default value | Description |
|--------|--------------|---------------|-------------|
| enabled | boolean | false | Set true to measure operation costs or enforce operation cost limits. |
| mode | measure, enforce | -- | measure collects information about the cost of operations. enforce rejects operations exceeding configured cost limits. |
| strategy | static_estimated | -- | static_estimated estimates the cost of an operation before it is sent to a subgraph. |
| static_estimated.list_size | integer | -- | The assumed maximum size of a list for fields that return lists. |
| static_estimated.max | integer | -- | The maximum cost of an accepted operation. An operation with a higher cost than this is rejected. |

When enabling demand_control for the first time, set it to measure mode. This will allow you to observe the cost of your operations before setting your maximum cost.
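
The difference between the two modes can be sketched as follows. This is a simplified model, not router code: `evaluate` is a hypothetical helper, and it assumes that measure mode still records a too-expensive result without rejecting the operation.

```python
from __future__ import annotations


def evaluate(estimated_cost: float, max_cost: float, mode: str) -> tuple[str, bool]:
    """Return (cost result, rejected) for an operation. Illustrative only."""
    if mode not in ("measure", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    too_expensive = estimated_cost > max_cost
    result = "COST_ESTIMATED_TOO_EXPENSIVE" if too_expensive else "COST_OK"
    # Only enforce mode rejects; measure mode (by assumption) just records the result.
    rejected = too_expensive and mode == "enforce"
    return result, rejected


print(evaluate(1200, 1000, "measure"))  # ('COST_ESTIMATED_TOO_EXPENSIVE', False)
print(evaluate(1200, 1000, "enforce"))  # ('COST_ESTIMATED_TOO_EXPENSIVE', True)
print(evaluate(800, 1000, "enforce"))   # ('COST_OK', False)
```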

Telemetry for demand control

Tip: New to router telemetry? See Router Telemetry.

You can define router telemetry to gather cost information and gain insights into the cost of operations sent to your router:

  • Generate histograms of operation costs by operation name, where the estimated cost is greater than an arbitrary value.

  • Attach cost information to spans.

  • Generate log messages whenever the cost delta between estimated and actual is greater than an arbitrary value.

Instruments

| Instrument | Description |
|------------|-------------|
| cost.actual | The actual cost of an operation, measured after execution. |
| cost.estimated | The estimated cost of an operation before execution. |
| cost.delta | The difference between the actual and estimated cost. |

Attributes

Attributes for cost can be applied to instruments, spans, and events—anywhere supergraph attributes are used.

| Attribute | Value | Description |
|-----------|-------|-------------|
| cost.actual | boolean | The actual cost of an operation, measured after execution. |
| cost.estimated | boolean | The estimated cost of an operation before execution. |
| cost.delta | boolean | The difference between the actual and estimated cost. |
| cost.result | boolean | The return code of the cost calculation: COST_OK or an error code. |

Selectors

Selectors for cost can be applied to instruments, spans, and events—anywhere supergraph attributes are used.

| Key | Value | Default | Description |
|-----|-------|---------|-------------|
| cost | estimated, actual, delta, result |  | The estimated, actual, or delta cost values, or the result string. |

Examples

Example instrument

Enable a cost.estimated instrument with the cost.result attribute:

YAML
router.yaml
telemetry:
  instrumentation:
    instruments:
      supergraph:
        cost.estimated:
          attributes:
            cost.result: true
            graphql.operation.name: true

Example span

Enable the cost.estimated attribute on supergraph spans:

YAML
router.yaml
telemetry:
  instrumentation:
    spans:
      supergraph:
        attributes:
          cost.estimated: true

Example event

Log an error when cost.delta is greater than 1000:

YAML
router.yaml
telemetry:
  instrumentation:
    events:
      supergraph:
        COST_DELTA_TOO_HIGH:
          message: "cost delta high"
          on: event_response
          level: error
          condition:
            gt:
              - cost: delta
              - 1000
          attributes:
            graphql.operation.name: true
            cost.delta: true

Filtering by cost result

In router telemetry, you can customize instruments that filter their output based on cost results.

For example, you can record the estimated cost when cost.result is COST_ESTIMATED_TOO_EXPENSIVE:

YAML
router.yaml
telemetry:
  instrumentation:
    instruments:
      supergraph:
        # custom instrument
        cost.rejected.operations:
          type: histogram
          value:
            # Estimated cost is used to populate the histogram
            cost: estimated
          description: "Estimated cost per rejected operation."
          unit: delta
          condition:
            eq:
              # Only show rejected operations.
              - cost: result
              - "COST_ESTIMATED_TOO_EXPENSIVE"
          attributes:
            graphql.operation.name: true # GraphQL operation name is added as an attribute

Configuring instrument output

When analyzing the costs of operations, if your histograms are not granular enough or don't cover a sufficient range, you can modify the views in your telemetry configuration:

YAML
telemetry:
  exporters:
    metrics:
      common:
        views:
          # Define a custom view because cost is different than the default latency-oriented view of OpenTelemetry
          - name: cost.*
            aggregation:
              histogram:
                buckets:
                  - 0
                  - 10
                  - 100
                  - 1000
                  - 10000
                  - 100000
                  - 1000000
Example histogram of operation costs from a Prometheus endpoint
Text
# TYPE cost_actual histogram
cost_actual_bucket{otel_scope_name="apollo/router",le="0"} 0
cost_actual_bucket{otel_scope_name="apollo/router",le="10"} 3
cost_actual_bucket{otel_scope_name="apollo/router",le="100"} 5
cost_actual_bucket{otel_scope_name="apollo/router",le="1000"} 11
cost_actual_bucket{otel_scope_name="apollo/router",le="10000"} 19
cost_actual_bucket{otel_scope_name="apollo/router",le="100000"} 20
cost_actual_bucket{otel_scope_name="apollo/router",le="1000000"} 20
cost_actual_bucket{otel_scope_name="apollo/router",le="+Inf"} 20
cost_actual_sum{otel_scope_name="apollo/router"} 1097
cost_actual_count{otel_scope_name="apollo/router"} 20
# TYPE cost_delta histogram
cost_delta_bucket{otel_scope_name="apollo/router",le="0"} 0
cost_delta_bucket{otel_scope_name="apollo/router",le="10"} 2
cost_delta_bucket{otel_scope_name="apollo/router",le="100"} 9
cost_delta_bucket{otel_scope_name="apollo/router",le="1000"} 7
cost_delta_bucket{otel_scope_name="apollo/router",le="10000"} 19
cost_delta_bucket{otel_scope_name="apollo/router",le="100000"} 20
cost_delta_bucket{otel_scope_name="apollo/router",le="1000000"} 20
cost_delta_bucket{otel_scope_name="apollo/router",le="+Inf"} 20
cost_delta_sum{otel_scope_name="apollo/router"} 21934
cost_delta_count{otel_scope_name="apollo/router"} 1
# TYPE cost_estimated histogram
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="0"} 0
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="10"} 5
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="100"} 5
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="1000"} 9
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="10000"} 11
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="100000"} 20
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="1000000"} 20
cost_estimated_bucket{cost_result="COST_OK",otel_scope_name="apollo/router",le="+Inf"} 20
cost_estimated_sum{cost_result="COST_OK",otel_scope_name="apollo/router"}
cost_estimated_count{cost_result="COST_OK",otel_scope_name="apollo/router"} 20
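
When reading this output, keep in mind that Prometheus histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its bound, so counts never decrease from one bucket to the next. The sketch below shows how a set of operation costs maps onto the bucket boundaries configured above. The `cumulative_buckets` helper and the sample costs are hypothetical and serve only as illustration.

```python
from bisect import bisect_left

# Bucket boundaries from the view configuration above.
BUCKETS = [0, 10, 100, 1000, 10000, 100000, 1000000]


def cumulative_buckets(costs, bounds=BUCKETS):
    """Count observations <= each upper bound, plus a final +Inf bucket."""
    counts = [0] * (len(bounds) + 1)
    for c in costs:
        # bisect_left finds the first bound >= c, i.e. the smallest bucket containing c.
        counts[bisect_left(bounds, c)] += 1
    # Accumulate so that each bucket includes all smaller buckets.
    running, out = 0, {}
    for label, n in zip([*map(str, bounds), "+Inf"], counts):
        running += n
        out[label] = running
    return out


sample_costs = [4, 8, 24, 40, 56, 2500]  # hypothetical operation costs
print(cumulative_buckets(sample_costs))
# {'0': 0, '10': 2, '100': 5, '1000': 5, '10000': 6, '100000': 6, '1000000': 6, '+Inf': 6}
```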

You can chart a histogram of operation costs, and you can also chart the percentage of operations that would be allowed or rejected with the current configuration.
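
As a rough sketch of the allowed-versus-rejected calculation, the snippet below computes the share of operations whose estimated cost exceeds a candidate maximum. The `percent_rejected` helper and the sample costs are hypothetical; in practice the inputs would come from your exported cost metrics.

```python
def percent_rejected(estimated_costs, max_cost):
    """Percentage of operations whose estimated cost exceeds max_cost."""
    if not estimated_costs:
        return 0.0
    rejected = sum(1 for c in estimated_costs if c > max_cost)
    return 100.0 * rejected / len(estimated_costs)


costs = [4, 8, 24, 40, 56, 2500]  # hypothetical estimated costs
print(percent_rejected(costs, 30))  # 50.0
print(round(percent_rejected(costs, 1000), 1))  # 16.7
```

Running this against costs gathered in measure mode gives a sense of how many operations a given maximum would reject before you switch to enforce mode.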
