Pattern: Pagination

How can an API provider deliver large sequences of structured data without overwhelming clients?


The final version of this pattern is featured in our book Patterns for API Design: Simplifying Integration with Loosely Coupled Message Exchanges.


also known as: Query with Partial Result Sets, Response Sequence

Context

Clients often query APIs, fetching data item collections to be displayed to the user or to be processed in other applications. In many such queries, the API provider responds by sending a large number of items. The size of this response may be larger than what the client needs or is ready to consume.

The data set may consist of identically structured data elements (e.g., rows fetched from a relational database or line items in a batch job executed by an enterprise information system in the backend) or of heterogeneous data not adhering to a common schema (e.g., parts of a document from a document-oriented NoSQL database such as MongoDB).

The data set may be read-only, or it may change while being retrieved.

Problem

How can an API provider deliver large sequences of structured data without overwhelming clients?

Forces

Pagination balances the following forces:

  • Performance, scalability, and resource use
  • Information needs of individual clients
  • Loose coupling and interoperability
  • Developer convenience and experience
  • Security and data privacy
  • Test and maintenance effort
  • Session awareness and isolation
  • Data set size and data access profile

Pattern forces are explained in depth in the book.

Solution

Divide large response data sets into manageable and easy-to-transmit chunks (also known as pages). Send one chunk of partial results per response message, and inform the client about the total and/or remaining number of chunks. Provide optional filtering capabilities to allow clients to request a particular selection of results. For extra convenience, include a reference to the next chunk/page from the current one.
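The solution can be illustrated with a minimal in-memory sketch. It assumes a Python list standing in for the provider's data set and a zero-based page index. The function names `get_page` and `fetch_all` and the fields `remainingPages` and `next` are hypothetical and not taken from any particular API.

```python
from typing import Any, Optional


def get_page(data: list[Any], page_size: int, page: int = 0) -> dict[str, Any]:
    """Provider side: return one chunk of `data` plus pagination metadata."""
    if page_size <= 0 or page < 0:
        raise ValueError("page_size must be positive and page non-negative")
    total_pages = (len(data) + page_size - 1) // page_size  # ceiling division
    start = page * page_size
    return {
        "items": data[start:start + page_size],
        "page": page,
        "totalPages": total_pages,
        "remainingPages": max(total_pages - page - 1, 0),
        # Reference to the next chunk; None on (or beyond) the last page.
        "next": page + 1 if page + 1 < total_pages else None,
    }


def fetch_all(data: list[Any], page_size: int) -> list[Any]:
    """Client side: follow the `next` references until no next page is announced."""
    collected: list[Any] = []
    page: Optional[int] = 0
    while page is not None:
        response = get_page(data, page_size, page)
        collected.extend(response["items"])
        page = response["next"]
    return collected
```

For five elements and a page size of 2, `get_page(list(range(5)), 2, 0)` returns the items `[0, 1]`, a `totalPages` value of 3, and a `next` reference of 1. A real provider would, of course, send these values in a response message rather than return them from a local function.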

Sketch

A solution sketch for this pattern from pre-book times is:

Figure 1: Pagination: query and follow-on request messages, response messages with filtered, partial result sets (pages)

Variants

The pattern comes in several variants: Page-Based Pagination, Offset-Based Pagination, Cursor-Based Pagination (also known as Token-Based Pagination), and Time-Based Pagination.

Page-Based Pagination (a somewhat tautological name) and Offset-Based Pagination refer to the elements of the data set differently. The page-based variant divides the data set into same-sized pages; the client or the provider can specify the page size. Clients then request pages by their index (like page numbers in a book). With Offset-Based Pagination, a client selects an offset into the whole data set (i.e., how many single elements to skip) and the number of elements to return in the next chunk (often referred to as “limit”). The two approaches can be used interchangeably, because the offset can be calculated by multiplying the page size by the (zero-based) page number; they address the problem and resolve the forces in similar ways. Whether entries are requested with an offset and a limit or are divided into pages of a particular size and then requested by an index is a minor difference; either case requires two integer parameters.
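The conversion between the two variants can be sketched in a few lines. This assumes zero-based page indexes; the function names are hypothetical.

```python
def page_to_offset(page: int, page_size: int) -> tuple[int, int]:
    """Translate a zero-based page index into an (offset, limit) pair."""
    return page * page_size, page_size


def offset_to_page(offset: int, limit: int) -> int:
    """Translate (offset, limit) back into a zero-based page index.

    This is only exact if the offset falls on a page boundary."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if offset % limit != 0:
        raise ValueError("offset is not aligned to a page boundary")
    return offset // limit
```

For example, `page_to_offset(3, 10)` yields `(30, 10)`, and `offset_to_page(30, 10)` yields 3. The reverse direction fails for an offset such as 5 with a limit of 2, which hints at the one practical difference: offsets can address any element, whereas page indexes only address page boundaries.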

These variants are not well suited for data that changes between requests, because such changes invalidate the index or offset calculations. For example, assume a data set ordered by creation time from most recent to oldest, and a client that has retrieved the first page and now requests the second one. If the element at the front of the data set is removed in between these requests, an element moves from the second page to the first page, and the client never sees it.
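The anomaly described above can be reproduced with a small simulation. A Python list with hypothetical element names `e1` to `e5` stands in for the data set, and the page size is 2.

```python
def offset_chunk(data: list[str], offset: int, limit: int) -> list[str]:
    """Offset-based retrieval: skip `offset` elements, return up to `limit`."""
    return data[offset:offset + limit]


data = ["e5", "e4", "e3", "e2", "e1"]  # ordered from most recent to oldest
first = offset_chunk(data, 0, 2)        # ["e5", "e4"]
data.remove("e5")                       # front element removed between the two requests
second = offset_chunk(data, 2, 2)       # ["e2", "e1"]
# "e3" has moved from the second page to the first one and is never delivered.
missed = [e for e in ["e5", "e4", "e3", "e2", "e1"] if e not in first + second]  # ["e3"]
```

An insertion at the front has the mirror-image effect: an element slides from the first page onto the second and is delivered twice.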

The Cursor-Based Pagination variant solves this problem because it does not rely on the absolute position of an element in the data set. Instead, clients send the Id Element of a specific element (typically the last one received) along with the number of elements to retrieve. The resulting chunk does not change even if new elements have been added since the last request. The fourth variant, Time-Based Pagination, is similar to Cursor-Based Pagination but uses timestamps instead of element IDs. It is used less frequently in practice but could be applied to scroll through a time plot by gradually requesting older or newer data points.
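Replaying a similar scenario with a cursor shows the difference. This sketch assumes that each element carries a numeric `id` that grows with creation time, so that sorting by descending id means "most recent first"; the function `cursor_chunk` is hypothetical.

```python
from typing import Any, Optional


def cursor_chunk(
    data: list[dict[str, Any]], after_id: Optional[int], limit: int
) -> list[dict[str, Any]]:
    """Return up to `limit` elements that come after the element with id `after_id`.

    Assumes `data` is sorted by descending id, i.e., ids grow with creation time."""
    if after_id is None:  # first request: no cursor yet
        return data[:limit]
    return [e for e in data if e["id"] < after_id][:limit]


data = [{"id": i} for i in (5, 4, 3, 2, 1)]  # most recent first
first = cursor_chunk(data, None, 2)           # ids 5 and 4
cursor = first[-1]["id"]                      # the client remembers id 4
data.pop(0)                                   # id 5 is removed ...
data.insert(0, {"id": 6})                     # ... and a new element with id 6 arrives
second = cursor_chunk(data, cursor, 2)        # ids 3 and 2: nothing skipped or repeated
```

A Time-Based Pagination sketch would look the same, with a creation timestamp taking the place of `id` in the comparison.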

Example

The Lakeside Mutual customer care backend API illustrates the Offset-Based Pagination variant in its customer endpoint:

curl "http://localhost:8080/customers?limit=2&offset=0"

This call returns the first chunk of two entities and several control Metadata Elements. Besides the two HATEOAS-style link relations (Allamaraju 2010) that link to the current chunk and the next chunk, the response also contains the corresponding offset, limit, and total size values. Note that size is not required to implement Pagination on the provider side, but it allows API clients to show end users or other consumers how many more data elements (or pages) may be requested subsequently.

{
  "offset" : 0,
  "limit" : 2,
  "size" : 50,
  "customers" : [ 
    ...
  , 
    ...
  ],
  "_links" : {
    "self" : {
      "href" : "http://localhost:8080/customers?limit=2&offset=0"
    },
    "next" : {
      "href" : "http://localhost:8080/customers?limit=2&offset=2"
    }
  }
}
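A provider could assemble such a response roughly as follows. This is a sketch under assumptions, not the Lakeside Mutual implementation: the function name `customers_response` is hypothetical, the customers are plain list entries, and the `next` link is assumed to be omitted on the last chunk.

```python
from typing import Any
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/customers"  # endpoint from the example above


def customers_response(all_customers: list[Any], offset: int, limit: int) -> dict[str, Any]:
    """Build an offset-based chunk with a `self` link and, if applicable, a `next` link."""

    def link(off: int) -> dict[str, str]:
        return {"href": f"{BASE_URL}?{urlencode({'limit': limit, 'offset': off})}"}

    links = {"self": link(offset)}
    if offset + limit < len(all_customers):  # more elements remain after this chunk
        links["next"] = link(offset + limit)
    return {
        "offset": offset,
        "limit": limit,
        "size": len(all_customers),
        "customers": all_customers[offset:offset + limit],
        "_links": links,
    }
```

With 50 customers, `customers_response(customers, 0, 2)` produces the same offset, limit, size, and link values as the sample response above.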

The example shown above can easily be mapped to the corresponding SQL clause LIMIT 2 OFFSET 0. Instead of talking about offsets and limits, the API could also use the page metaphor in its message vocabulary, as shown here:

{
  "page" : 0,
  "pageSize" : 2,
  "totalPages" : 25,
  "customers" : [ 
    ...
  ]
}
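The mapping to SQL mentioned above can be tried out with Python's built-in `sqlite3` module. The `customers` table, its columns, and the 50 generated rows are assumptions made for this sketch; both the offset-based and the page-based vocabulary end up in the same LIMIT/OFFSET query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [(f"customer-{i}",) for i in range(1, 51)],
)


def fetch_chunk(limit: int, offset: int) -> list[tuple[int, str]]:
    # ORDER BY makes the chunk boundaries deterministic; SQL does not
    # guarantee any row order without it.
    return conn.execute(
        "SELECT id, name FROM customers ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def fetch_page(page: int, page_size: int) -> list[tuple[int, str]]:
    # Page-based vocabulary maps onto the same query: offset = page * pageSize.
    return fetch_chunk(page_size, page * page_size)
```

`fetch_chunk(2, 0)` corresponds to the LIMIT 2 OFFSET 0 query of the first example, and `fetch_page(1, 2)` returns the rows with ids 3 and 4.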

Using Cursor-Based Pagination, the client first requests the initial page of a desired size:

curl "http://localhost:8080/customers?page-size=2"

{
  "pageSize" : 2,
  "customers" : [ 
    ...
  , 
    ...
  ],
  "_links" : {
    "next" : {
      "href" : "http://localhost:8080/customers?page-size=2&cursor=mfn834fj"
    }
  }
}

The response contains a link to the next chunk of data, represented by the cursor value mfn834fj. The cursor could be as simple as the primary key of the last element in the current chunk, or it could contain more information, such as a query filter.
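One way to build a cursor that carries both a position and a query filter is to serialize them and encode the result as a URL-safe token. This is an assumed scheme for illustration only; it does not reproduce how the value mfn834fj in the example was generated, and the names `encode_cursor`, `decode_cursor`, `after`, and `filter` are hypothetical.

```python
import base64
import json
from typing import Any, Optional


def encode_cursor(last_id: int, query_filter: Optional[str] = None) -> str:
    """Pack the position (and optionally the filter) into an opaque, URL-safe token."""
    payload = json.dumps({"after": last_id, "filter": query_filter}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> dict[str, Any]:
    """Reverse encode_cursor; restores the stripped base64 padding first."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Base64 only makes the cursor opaque to casual inspection; it is not encryption, so clients can still decode and alter it. A provider that must prevent tampering would additionally sign the token (for instance with an HMAC) or keep the cursor state on the server side.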

Are you missing implementation hints? Our papers and publications provide them (for selected patterns).

Consequences

The resolution of pattern forces and other consequences are discussed in our book.

Known Uses

The JSON API specification covers Pagination. It does not mention Time-Based Pagination, but it distinguishes between page-based, offset-based, and cursor-based strategies. Hence, JSON API-based implementations provide further examples for studying the pattern realization.

The roots of the pattern and its name go back to plain Web page design, for example, when search or other query results are displayed on a series of linked Web pages. An early SOA and Web services production reference that uses Pagination is Brandner et al. (2004). Although it is not message-based, remote JDBC applies sophisticated Pagination concepts via its Result Set abstraction.

Many public Web APIs use Pagination; the Page-Based, Offset-Based, and Cursor-Based variants are typical, while the Time-Based Pagination variant is less common. For example, Google’s search results are paginated, as are GitHub’s Query API and the Slack APIs. Atlassian also features Pagination explicitly and prominently in its JIRA Cloud REST APIs. Regarding data that changes between requests, the Twitter REST API is an interesting example: the timeline changes often, so simple Page-Based and Offset-Based Pagination do not work well. Instead, a since_id=12345 Atomic Parameter can be used to retrieve only tweets that are more recent than the specified Id Element.
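The effect of a since_id parameter can be sketched as a simple filter. This assumes that ids grow monotonically over time, which is what makes "larger id" equivalent to "more recent"; the function name and the entry structure are hypothetical and do not model the actual Twitter API.

```python
from typing import Any


def since(timeline: list[dict[str, Any]], since_id: int) -> list[dict[str, Any]]:
    """Return only the entries that are newer than the one with id `since_id`.

    Assumes ids grow monotonically over time, so "newer" means "larger id"."""
    return [entry for entry in timeline if entry["id"] > since_id]
```

Because the filter is anchored on an Id Element rather than on a position, entries that appear or disappear elsewhere in the timeline do not shift the result, much like in Cursor-Based Pagination.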

A Swiss software vendor specializing in the insurance industry describes page- and offset-based Pagination in its internal REST API Design Guidelines. Sorting and filtering of collection records are supported via operators that travel as HTTP parameters containing control metadata.

An online API Stylebook website lists a number of Web APIs, API design guideline books, and websites that discuss Pagination.

The Pattern Adoption Story: Dutch Government & Energy Sector, contributed by a reader, provides more known uses and further discussion of this pattern.

More Information

Related Patterns

Pagination can be seen as the opposite of Request Bundle: While Pagination is concerned with reducing the individual message size by splitting one large message into many smaller pages, Request Bundle combines several messages into a single large one.

A paginated query typically defines an Atomic Parameter List for its input parameters containing the query parameters and a Parameter Tree for its output parameters (i.e., the pages). Pagination is an alternative to sending one big Parameter Tree if a large number of repetitive data records has to be transmitted and the client requires only a small fraction of them to proceed.

A request-response(s) correlation scheme might be required so that the client can distinguish the partial results of multiple queries in arriving response messages; the pattern Correlation Identifier (Hohpe and Woolf 2003) might be suitable in such cases.
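On the client side, such a correlation scheme could be used to reassemble pages as sketched below. The sketch assumes that every response echoes a `correlationId` from its request and carries a zero-based `page` index and an `items` list; all three field names are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_by_correlation(responses: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Reassemble the partial results of several concurrent queries.

    Pages may arrive out of order and interleaved across queries."""
    pages: dict[str, list[tuple[int, list[Any]]]] = defaultdict(list)
    for response in responses:
        pages[response["correlationId"]].append((response["page"], response["items"]))
    return {
        cid: [item for _, items in sorted(chunks, key=lambda c: c[0]) for item in items]
        for cid, chunks in pages.items()
    }
```

For example, two pages of query "q1" arriving in reverse order, interleaved with a page of query "q2", are merged into one ordered list per query.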

A Message Sequence (Hohpe and Woolf 2003) can also be used when a single large data element has to be split up.

Other Sources

Chapter 10 of Sturgeon (2016) covers Pagination types, discusses implementation approaches, and presents examples in PHP; Chapter 8 in the RESTful Web Services Cookbook by Allamaraju (2010) deals with queries in a RESTful HTTP context.

In a broader context, the User Interface (UI) and Web design community has captured Pagination patterns for interaction design and information visualization rather than for API design and management. See, for example, the coverage of the topic at the Interaction Design Foundation and on a UI Patterns website.

“Web API Design: The Missing Link”, an eBook by apigee, covers Pagination under “More on Representation Design”.

Chapter 8 of Vernon (2013) features stepwise retrieval of a notification log/archive, which can be seen as Offset-Based Pagination. RFC 5005 covers “Feed Paging and Archiving” for Atom (Nottingham 2007).

References

Allamaraju, Subbu. 2010. RESTful Web Services Cookbook. O’Reilly.
Brandner, Michael, Michael Craes, Frank Oellermann, and Olaf Zimmermann. 2004. “Web Services-Oriented Architecture in Production in the Finance Industry.” Informatik-Spektrum 27 (2): 136–45. https://doi.org/10.1007/s00287-004-0380-2.
Fowler, Martin. 2002. Patterns of Enterprise Application Architecture. Addison-Wesley.
Hohpe, Gregor, and Bobby Woolf. 2003. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley.
Nottingham, Mark. 2007. “Feed Paging and Archiving.” Request for Comments. RFC 5005; RFC Editor. https://doi.org/10.17487/RFC5005.
Sturgeon, Phil. 2016. Build APIs You Won’t Hate. LeanPub. https://leanpub.com/build-apis-you-wont-hate.
Vernon, Vaughn. 2013. Implementing Domain-Driven Design. Addison-Wesley.