Messenger database (Part 2): partitioning "for profit"

We successfully designed the structure of our PostgreSQL database for storing correspondence; a year has passed, users have been actively filling it, it now holds millions of records, and... everything has started to slow down.

The fact is that as the table grows, the "depth" of its indexes grows too, albeit logarithmically. But over time this forces the server to process many times more data pages for the same read/write tasks than it did at the beginning.

This is where partitioning comes to the rescue.

Note that this is not about sharding, i.e. distributing data across different databases or servers. Even after spreading the data across several servers, you will not get rid of the problem of indexes "swelling" over time. Of course, if you can afford to bring a new server into service every day, then your problems no longer lie in the plane of a particular database.

We will consider not specific scripts for implementing partitioning in practice, but the approach itself: what should be "cut into slices" and how, and where such a desire leads.

Concept

Once again, let us define our goal: we want the amount of data PostgreSQL reads for any read/write operation to remain roughly the same today, tomorrow, and a year from now.

For any chronologically accumulated data (messages, documents, logs, archives, ...) the natural choice of partition key is the date/time of the event. In our case, this event is the moment the message was sent.

Note that users almost always work only with the most recent such data: they read the latest messages, analyze the latest logs, ... Of course, they can scroll further back in time, but they do so very rarely.

From these constraints it becomes obvious that the optimal solution for messages is "daily" partitions: almost always our user reads what arrived "today" or "yesterday".
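For comparison, daily partitions can also be declared with PostgreSQL's native range partitioning (available since version 10). The sketch below uses illustrative English table and column names, not the schema from this article, which implements partitioning manually with one physical table per day:

```sql
-- A sketch of native declarative range partitioning;
-- names here are illustrative, not the article's schema.
CREATE TABLE message(
  message_id uuid NOT NULL
, sent_at    timestamp NOT NULL
, body       text
) PARTITION BY RANGE (sent_at);

-- one partition per day; inserts are routed automatically
CREATE TABLE message_20240101 PARTITION OF message
  FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');
CREATE TABLE message_20240102 PARTITION OF message
  FOR VALUES FROM ('2024-01-02') TO ('2024-01-03');
```

With this approach the planner prunes partitions itself, at the cost of some restrictions (e.g. a primary key must include the partition key); the manual approach described below trades that convenience for full control at the application level.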

If during any given day we write and read practically only one partition, this also gives us more efficient memory and disk usage, since all of that partition's indexes easily fit into RAM, unlike the "big and fat" indexes over the whole table.

Step by step

In general, all of the above sounds like one big profit. And it is achievable, but we will have to work hard for it, because the decision to partition one entity forces us to "saw up" the related ones as well.

Message, its properties and projections

Since we decided to partition messages by date, it is reasonable to split the entity-properties that depend on them (attached files, the list of recipients) by message date as well.

Since one of our typical tasks is viewing message registries (unread, incoming, all), it is also logical to pull them into partitioning by message date.


We add the partition key (the message date) to all tables: recipients, files, registries. We don't have to add it to the message itself: there we can use the existing DateTime.

Topics

Since a topic is shared by several messages, it cannot be "cut" by the same key; we need to rely on something else. The date of the first message in the correspondence, i.e. effectively the moment the topic was created, fits our case perfectly.


We add the partition key (the topic date) to all tables: topic, participant.

But now we have two problems at once:

  • in which partition should we look for the messages of a topic?
  • in which partition should we look for the topic of a message?

We could, of course, keep searching across all partitions, but that would be very sad and would negate all our winnings. Therefore, to know exactly where to look, we will create logical links/pointers between partitions:

  • add a topic date field to the message
  • add to the topic the set of message dates of this correspondence (either as a separate table or as an array of dates)


Since the list of message dates of any individual correspondence will rarely be modified (almost all of its messages fall on 1-2 adjacent days), I will focus on the array option.
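Maintaining that date set is a single conditional update per message. A sketch, assuming the array-of-dates variant with a "Π”Π°Ρ‚Ρ‹" date[] column; the :topic_id/:msg_date placeholders stand for values the application binds in:

```sql
-- Append the message date to the topic's date array,
-- but only if it is not already there (@> is array containment).
UPDATE "Π”Π°Ρ‚Ρ‹Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠΉΠ’Π΅ΠΌΡ‹_20240101"  -- suffix = topic date, chosen by the app
SET "Π”Π°Ρ‚Ρ‹" = "Π”Π°Ρ‚Ρ‹" || :msg_date
WHERE "Π’Π΅ΠΌΠ°" = :topic_id
  AND NOT ("Π”Π°Ρ‚Ρ‹" @> ARRAY[:msg_date]);
```

Since almost all messages land on a date already present in the array, this UPDATE is usually a no-op, which is exactly why the array variant is cheap here.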

In total, taking partitioning into account, our database structure took the following form:


-- partitions by message date
CREATE TABLE "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅_YYYYMMDD"(
  "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
      PRIMARY KEY
, "Π’Π΅ΠΌΠ°"
    uuid
, "Π”Π°Ρ‚Π°Π’Π΅ΠΌΡ‹"
    date
, "Автор"
    uuid
, "ДатаВрСмя" -- used as the date
    timestamp
, "ВСкст"
    text
);

CREATE TABLE "АдрСсат_YYYYMMDD"(
  "ДатаБообщСния"
    date
, "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
, "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°"
    uuid
, PRIMARY KEY("Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅", "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°")
);

CREATE TABLE "Π€Π°ΠΉΠ»_YYYYMMDD"(
  "ДатаБообщСния"
    date
, "Π€Π°ΠΉΠ»"
    uuid
      PRIMARY KEY
, "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
, "BLOB"
    uuid
, "Имя"
    text
);

CREATE TABLE "РССстрБообщСний_YYYYMMDD"(
  "ДатаБообщСния"
    date
, "Π’Π»Π°Π΄Π΅Π»Π΅Ρ†"
    uuid
, "ВипРССстра"
    smallint
, "ДатаВрСмя"
    timestamp
, "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
, PRIMARY KEY("Π’Π»Π°Π΄Π΅Π»Π΅Ρ†", "ВипРССстра", "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅")
);
CREATE INDEX ON "РССстрБообщСний_YYYYMMDD"("Π’Π»Π°Π΄Π΅Π»Π΅Ρ†", "ВипРССстра", "ДатаВрСмя" DESC);

-- partitions by topic date
CREATE TABLE "Π’Π΅ΠΌΠ°_YYYYMMDD"(
  "Π”Π°Ρ‚Π°Π’Π΅ΠΌΡ‹"
    date
, "Π’Π΅ΠΌΠ°"
    uuid
      PRIMARY KEY
, "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚"
    uuid
, "НазваниС"
    text
);

CREATE TABLE "УчастникВСмы_YYYYMMDD"(
  "Π”Π°Ρ‚Π°Π’Π΅ΠΌΡ‹"
    date
, "Π’Π΅ΠΌΠ°"
    uuid
, "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°"
    uuid
, PRIMARY KEY("Π’Π΅ΠΌΠ°", "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°")
);

CREATE TABLE "Π”Π°Ρ‚Ρ‹Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠΉΠ’Π΅ΠΌΡ‹_YYYYMMDD"(
  "Π”Π°Ρ‚Π°Π’Π΅ΠΌΡ‹"
    date
, "Π’Π΅ΠΌΠ°"
    uuid
      PRIMARY KEY
, "Π”Π°Ρ‚Ρ‹" -- array of message dates, one row per topic
    date[]
);
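The pointer fields turn cross-partition lookups into two point queries. A sketch of finding the topic of a message; the concrete partition suffixes (20240102, 20240101) stand in for YYYYMMDD and are hypothetical, as are the :msg_id/:topic_id placeholders the application binds in:

```sql
-- Step 1: read the message and its topic-date pointer
SELECT "Π’Π΅ΠΌΠ°", "Π”Π°Ρ‚Π°Π’Π΅ΠΌΡ‹"
FROM "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅_20240102"
WHERE "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅" = :msg_id;

-- Step 2: the application substitutes the returned "Π”Π°Ρ‚Π°Π’Π΅ΠΌΡ‹"
-- into the table name and hits exactly one topic partition
SELECT *
FROM "Π’Π΅ΠΌΠ°_20240101"
WHERE "Π’Π΅ΠΌΠ°" = :topic_id;
```

Both queries are primary-key lookups within a single small daily partition, which is precisely the access pattern we set out to achieve.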

Save a pretty penny

Note that since we are partitioning not with the classic mechanisms based on distributing field values (triggers and inheritance, or PARTITION BY), but "manually" at the application level, the value of the partition key is already stored in the name of the table itself.

So if you are very worried about the amount of stored data, you can get rid of these "extra" fields and address the specific tables directly. True, all selections spanning several partitions will then have to be assembled on the application side.
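Such an application-side selection can be sketched as follows: the application expands the topic's date list (here hypothetically 2024-01-01 and 2024-01-02) into a UNION ALL over the corresponding daily tables:

```sql
-- The app composes the table names from the topic's date list
-- and merges the per-day results; :topic_id is a bound parameter.
SELECT * FROM "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅_20240101" WHERE "Π’Π΅ΠΌΠ°" = :topic_id
UNION ALL
SELECT * FROM "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅_20240102" WHERE "Π’Π΅ΠΌΠ°" = :topic_id
ORDER BY "ДатаВрСмя";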

Source: habr.com
