Why do you need instrumental support for pagination on keys?

Hi all! I'm a backend developer writing microservices in Java + Spring. I work in one of the internal product development teams at Tinkoff.

Why do you need instrumental support for pagination on keys?

In our team, the question of optimizing queries in a DBMS often arises. You always want to be a little faster, but you can’t always get by with thoughtfully constructed indexesβ€”you have to look for some workarounds. During one of these wanderings around the web in search of reasonable optimizations when working with databases, I found Marcus Wynand's endlessly helpful blog, author of SQL Performance Explained. This is that rare type of blog in which you can read all the articles in a row.

I would like to translate a short article by Marcus for you. It can be called to some extent a manifesto that seeks to draw attention to the old, but still relevant problem of the performance of the offset operation according to the SQL standard.

In some places I will supplement the author with explanations and comments. I will refer to all such places as β€œapprox.” for more clarity

A small introduction

I think many people know how problematic and slow working with page selects via offset is. Did you know that it can be quite easily replaced with a more efficient design?

So, the offset keyword tells the database to skip the first n records in the request. However, the database still needs to read these first n records from disk, in the given order (note: apply sorting if it is specified), and only then will it be possible to return records from n+1 onwards. The most interesting thing is that the problem is not in the specific implementation in the DBMS, but in the original definition according to the standard:

…the rows are first sorted according to the and then limited by dropping the number of rows specified in the from the beginning...
-SQL:2016, Part 2, 4.15.3 Derived tables (note: currently the most used standard)

The key point here is that offset takes a single parameter - the number of records to skip, and that's it. Following this definition, the DBMS can only retrieve all the records and then discard the unnecessary ones. Obviously, this definition of offset forces us to do extra work. And it doesn’t even matter whether it’s SQL or NoSQL.

Just a little more pain

The problems with offset don't end there, and here's why. If, between reading two pages of data from disk, another operation inserts a new record, what will happen in this case?

Why do you need instrumental support for pagination on keys?

When offset is used to skip records from previous pages, in the situation of adding a new record between reads of different pages, you will most likely get duplicates (note: this is possible when we read page by page using the order by construct, then in the middle of our output it may get a new entry).

The figure clearly depicts this situation. The base reads the first 10 records, after which a new record is inserted, which offsets all read records by 1. Then the base takes a new page from the next 10 records and starts not from the 11th, as it should, but from the 10th, duplicating this record. There are other anomalies associated with the use of this expression, but this is the most common.

As we have already found out, these are not problems of a specific DBMS or their implementations. The problem is in defining pagination according to the SQL standard. We tell the DBMS which page to fetch or how many records to skip. The database is simply not able to optimize such a request, since there is too little information for this.

It's also worth clarifying that this is not a problem with a specific keyword, but rather with the semantics of the query. There are several more syntaxes that are identical in their problematic nature:

  • The offset keyword is as mentioned earlier.
  • A construction of two keywords limit [offset] (although limit itself is not so bad).
  • Filtering by lower bounds, based on row numbering (for example, row_number(), rownum, etc.).

All of these expressions simply tell you how many lines to skip, no additional information or context.

Later in this article, the offset keyword is used as a summary of all these options.

Life without OFFSET

Now let’s imagine what our world would be like without all these problems. It turns out that life without offset is not so difficult: with a select, you can select only those rows that we have not yet seen (note: that is, those that were not on the previous page), using a condition in where.

In this case, we start from the fact that selects are executed on an ordered set (good old order by). Since we have an ordered set, we can use a fairly simple filter to get only the data that is behind the last record of the previous page:

    SELECT ...
    FROM ...
    WHERE ...
    AND id < ?last_seen_id
    ORDER BY id DESC
    FETCH FIRST 10 ROWS ONLY

That's the whole principle of this approach. Of course, things get more fun when sorting by many columns, but the idea is still the same. It is important to note that this design is applicable to many NoSQL-decisions.

This approach is called seek method or keyset pagination. It solves the floating result problem (note: the situation with writing between page reads described earlier) and, of course, what we all love, it works faster and more stable than the classic offset. Stability lies in the fact that the request processing time does not increase in proportion to the number of the requested table (note: if you want to learn more about the work of different approaches to pagination, you can look through the author's presentation. You can also find comparative benchmarks for different methods there).

One of the slides talks about thatthat pagination by keys, of course, is not omnipotent - it has its limitations. The most significant is that she does not have the ability to read random pages (note: inconsistently). However, in the era of endless scrolling (note: on the front end), this is not such a problem. Specifying a page number for clicking is a bad decision in UI design anyway (note: opinion of the author of the article).

What about the tools?

Pagination on keys is often not suitable due to the lack of instrumental support for this method. Most development tools, including various frameworks, do not allow you to choose exactly how pagination will be performed.

The situation is aggravated by the fact that the described method requires end-to-end support in the technologies used - from the DBMS to the execution of an AJAX request in the browser with endless scrolling. Instead of specifying just the page number, you now have to specify a set of keys for all pages at once.

However, the number of frameworks that support pagination on keys is gradually growing. Here's what we have at the moment:

(Note: some links were removed due to the fact that at the time of translation some libraries had not been updated since 2017-2018. If you are interested, you can look at the original source.)

It is at this moment that your help is needed. If you develop or support a framework that makes any use of pagination, then I ask, I urge, I implore you to provide native support for pagination on keys. If you have questions or need help, I will be happy to help (forum, Twitter, contact form) (note: from my experience with Marcus, I can say that he is really enthusiastic about spreading this topic).

If you use ready-made solutions that you think are worthy of having support for pagination by keys, create a request or even offer a ready-made solution, if possible. You can also link to this article.

Conclusion

The reason why such a simple and useful approach as pagination by keys is not widespread is not that it is difficult to implement technically or requires any great effort. The main reason is that many are accustomed to seeing and working with offset - this approach is dictated by the standard itself.

As a result, few people think about changing the approach to pagination, and because of this, instrumental support from frameworks and libraries is developing poorly. Therefore, if the idea and goal of offset-free pagination is close to you, help spread it!

Source: https://use-the-index-luke.com/no-offset
Author: Markus Winand

Source: habr.com

Add a comment