Today there will be no complex cases or intricate algorithms in SQL. Everything will be very simple, at Captain Obvious level: viewing an event log sorted by time.
That is, there is a table events in the database, and it has a ts field - exactly the time by which we want to display these records in order:
CREATE TABLE events(
  id serial PRIMARY KEY
, ts timestamp
, data json
);
CREATE INDEX ON events(ts DESC);
Clearly, we will have more than a couple dozen records in it, so we will need pagination in some form.
#0. "Mom's little programmer"
cur.execute("SELECT * FROM events;")
rows = cur.fetchall()
rows.sort(key=lambda row: row.ts, reverse=True)
limit = 26
offset = 0  # start of the requested page
print(rows[offset:offset + limit])
Not even a joke - rare, but it does occur in the wild: the entire table is pulled to the client and paginated in memory. Sometimes, after working with an ORM, it can be difficult to switch to "direct" work with SQL.
But let's move on to more common and less obvious problems.
#1. OFFSET
SELECT
  ...
FROM
  events
ORDER BY
  ts DESC
LIMIT 26 OFFSET $1; -- 26 records per page, $1 - start of the page
Where did the number 26 come from? It is the approximate number of records needed to fill one screen. More precisely, 25 displayed records plus 1 more, signaling that there is at least something further in the selection and that it makes sense to keep going.
Of course, this value could be passed as a bind parameter instead of being "baked" into the query text. But in that case the PostgreSQL planner would not be able to rely on the knowledge that relatively few records are expected, and could easily choose an inefficient plan.
And while viewing the registry in the application interface is implemented as switching between visual "pages", nobody notices anything suspicious for a long time. Right up until the moment when, in the fight for convenience, UI/UX decides to rework the interface into "infinite scroll" - that is, all registry entries are drawn in one single list that the user can scroll up and down.
And then, during the next round of testing, you get caught duplicating records in the registry. Why, when the table has a perfectly normal index on (ts) that your query relies on?
Precisely because you did not take into account that ts is not a unique key in this table. Its values are not actually unique - like any "timestamp" in real conditions - so the same record easily "jumps" from page to page between two adjacent queries, due to a different final order of rows within the same sort-key value.
In fact, a second problem is hidden here as well, one much harder to notice: some records will not be shown at all! After all, the "duplicated" records took someone else's place. A detailed explanation, with nice pictures, is linked in the original.
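The effect is easy to reproduce. Below is a minimal sketch, using Python's sqlite3 in-memory database as a stand-in for PostgreSQL and hypothetical data: since "ORDER BY ts DESC" leaves the order of equal ts values undefined, two different but equally valid tie-break orders play the role of "the plan changed between two requests".

```python
import sqlite3

# Hypothetical data: ten events sharing one and the same ts
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i, "2020-01-01") for i in range(1, 11)])

def page(tie_break, offset, limit=4):
    # Both orderings are equally valid answers to "ORDER BY ts DESC":
    # SQL does not define the relative order of rows with equal ts.
    q = (f"SELECT id FROM events "
         f"ORDER BY ts DESC, id {tie_break} LIMIT ? OFFSET ?")
    return [r[0] for r in con.execute(q, (limit, offset))]

p1 = page("ASC", 0)    # page 1 while ties came back one way
p2 = page("DESC", 4)   # page 2 after the order of ties changed
print(p1, p2)          # [1, 2, 3, 4] [6, 5, 4, 3]
# ids 3 and 4 are shown twice; ids 7..10 are never shown at all
```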
Expanding the index
A cunning developer understands that the index key needs to be made unique, and the easiest way is to extend it with a known-unique field, for which the PK is a perfect fit:
CREATE UNIQUE INDEX ON events(ts DESC, id DESC);
And the query mutates into:
SELECT
  ...
FROM
  events
ORDER BY
  ts DESC, id DESC
LIMIT 26 OFFSET $1;
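A quick sketch of why this helps (sqlite3 in-memory as a stand-in for PostgreSQL, hypothetical data): with the unique (ts, id) sort key, equal ts values can no longer shuffle, so OFFSET pages partition the rows cleanly.

```python
import sqlite3

# Hypothetical data: ten events with identical ts values
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i, "2020-01-01") for i in range(1, 11)])

def page(offset, limit=4):
    # (ts DESC, id DESC) is a total order: the unique id breaks every
    # tie, so adjacent pages can neither overlap nor lose rows.
    return [r[0] for r in con.execute(
        "SELECT id FROM events ORDER BY ts DESC, id DESC LIMIT ? OFFSET ?",
        (limit, offset))]

pages = [page(0), page(4), page(8)]
print(pages)   # [[10, 9, 8, 7], [6, 5, 4, 3], [2, 1]]
```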
#2. Switching to "cursors"
Some time later, the DBA comes over and "delights" you with the news that your queries with their huge OFFSETs are putting a hefty load on the server (the deeper the page, the more rows have to be scanned and thrown away), and that it is time to switch to "cursor"-style navigation:
SELECT
  ...
FROM
  events
WHERE
  (ts, id) < ($1, $2) -- the last values received on the previous step
ORDER BY
  ts DESC, id DESC
LIMIT 26;
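A runnable sketch of this keyset navigation (sqlite3 in-memory as a stand-in for PostgreSQL; the row-value comparison `(ts, id) < (?, ?)` exists in PostgreSQL and in SQLite 3.15+; the table, data, and sentinel values are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts TEXT)")
# Hypothetical data: ts values collide in groups of four
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i, f"2020-01-0{1 + (i - 1) // 4}") for i in range(1, 11)])

def next_block(last_ts, last_id, limit=4):
    # Keyset pagination: resume strictly after the last (ts, id) pair
    # seen, via a row-value comparison (PostgreSQL; SQLite since 3.15)
    return con.execute(
        "SELECT ts, id FROM events WHERE (ts, id) < (?, ?) "
        "ORDER BY ts DESC, id DESC LIMIT ?",
        (last_ts, last_id, limit)).fetchall()

# "9999-12-31"/0 is a hypothetical sentinel meaning "start from the top"
block = next_block("9999-12-31", 0)
while block:
    print([r[1] for r in block])   # [10, 9, 8, 7] / [6, 5, 4, 3] / [2, 1]
    block = next_block(*block[-1])
```

Note that each block costs a single index range scan regardless of how deep into the registry the user has scrolled, which is exactly what OFFSET could not provide.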
You breathed a sigh of relief... but only for a while.
#3. Cleaning up indexes
Because one day your DBA took a look at the indexes and noticed that the standalone index on (ts DESC) is redundant next to the unique index on (ts DESC, id DESC), and that two nearly identical indexes is one too many.
But what about the original problem of records "jumping" between pages? It's simple: you need to select blocks with a non-fixed number of records!
Indeed, who says we must read "exactly 26" rather than "at least 26"? We can, for example, make each block end on a change in the ts value - then there will be no problem with records "jumping" between blocks at all!
Here's how to achieve this:
SELECT
  ...
FROM
  events
WHERE
  ts < $1 AND
  ts >= coalesce((
    SELECT
      ts
    FROM
      events
    WHERE
      ts < $1
    ORDER BY
      ts DESC
    LIMIT 1 OFFSET 25
  ), '-infinity')
ORDER BY
  ts DESC;
What's going on here anyway?
- We step 25 records "down" and get the "boundary" value of ts.
- If there is nothing there anymore, we replace the NULL with -infinity.
- We read out the entire segment of values between the obtained ts and the parameter $1 passed in from the interface (the previous "last" value rendered).
- If the block came back with fewer than 26 records, it is the last one.
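The steps above can be sketched as follows (sqlite3 in-memory as a stand-in for PostgreSQL; SQLite has no '-infinity' timestamp, so an empty string, which sorts before any date string, plays that role here; the table and data are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, ts TEXT)")
# Hypothetical data: every ts value is shared by exactly two events
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i, f"2020-01-0{(i + 1) // 2}") for i in range(1, 11)])

def next_block(last_ts, want=4):
    # "At least `want` rows": find the ts of the row `want` positions
    # down, then take everything from that ts upward - a ts value is
    # never split across two blocks.  '' stands in for '-infinity';
    # the extra "id DESC" only makes the demo output deterministic.
    return con.execute("""
        SELECT ts, id FROM events
        WHERE ts < :last
          AND ts >= coalesce((SELECT ts FROM events WHERE ts < :last
                              ORDER BY ts DESC LIMIT 1 OFFSET :off), '')
        ORDER BY ts DESC, id DESC""",
        {"last": last_ts, "off": want - 1}).fetchall()

block = next_block("9999-12-31")   # hypothetical "from the top" sentinel
while block:
    print([r[1] for r in block])   # [10, 9, 8, 7] / [6, 5, 4, 3] / [2, 1]
    block = next_block(block[-1][0])
```

Since each block boundary falls exactly on a change of ts, re-running a query with the same $1 always yields the same block, no matter how the engine orders ties internally.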
Since the selection now has no fixed "beginning", nothing prevents us from "flipping" this query in the opposite direction and implementing dynamic loading of data blocks from a "reference point" in both directions, both up and down.
Comment
- Yes, in this case we hit the index twice, but everything is "purely by index". So the subquery costs us only one additional Index Only Scan.
- Quite obviously, this technique is applicable only when your ts values collide rarely and by chance. If your typical case is "a million records at 00:00:00.000", you should not do this. In a sense, such a case should not be allowed at all. But if it does happen, use the extended-index variant.
Source: habr.com