Messenger database (part 1): designing the base frame

How can business requirements be translated into specific data structures using the example of designing a base for a messenger from scratch.

Messenger database (part 1): designing the base frame
Our database will not be as large and distributed, like on VKontakte or Badoo, and β€œto be”, but it was good - functionally, quickly and fit on one server PostgreSQL - so that you can deploy a separate instance of the service somewhere on the side, for example.

Therefore, we will not touch on the issues of sharding, replication and geo-distributed systems, but will focus on circuit solutions inside the database.

Step 1: Some business specifics

We will not design our messaging in the abstract, but embed it in the environment corporate social network. That is, our people do not β€œjust correspond”, but communicate with each other in the context of solving certain business problems.

And what are the tasks of a business?.. Let's look at the example of Vasily, the head of the development department.

  • β€œNikolay, we need a patch for this problem today!”
    This means that correspondence can be conducted in the context of some document.
  • β€œKolya, why in the evening in DotA?”
    That is, even one pair of interlocutors can communicate at the same time on various topics.
  • "Peter, Nikolai, look in the attachment for the price list for the new server."
    So, one message can have multiple recipients. In this case, the message may contain Attached files.
  • "Semyon, look at you too."
    And there should be an opportunity in an existing correspondence invite a new member.

Let's dwell on this list of "obvious" needs for now.

Without understanding the application specifics of the task and the restrictions set for it, design effective DB schema to solve it is almost impossible.

Step 2: Minimal Logic

Schematically, so far everything turns out very similar to email-correspondence - a traditional business tool. So yes, β€œalgorithmically” many business tasks are similar to each other, therefore, the tools for solving them will be structurally similar.

Let's fix the already obtained logical scheme of entity relationships. For ease of understanding of our model, we will use the most primitive display option ER models without the complications of UML or IDEF notations:

Messenger database (part 1): designing the base frame

In our example, the person, the document, and the binary β€œbody” of the file are β€œexternal” entities that exist on their own even without our service. Therefore, we will simply perceive them in the future as some links "somewhere" by UUID.

Draw diagrams as easy as possible β€” most of the people you will be showing them to are not experts in reading UML/IDEF. But be sure to draw.

Step 3: Sketching out the table structure

About table and field namesThe "Russian" names of fields and tables can be treated differently, but this is a matter of taste. Because the in our Tensor there are no foreign developers, and PostgreSQL allows us to give names even in hieroglyphs, if they enclosed in quotation marks, then we prefer to name objects unambiguously, so that there are no discrepancies.
Since many people write messages at once, some of them can do it at all offline, then the simplest option is use UUIDs as identifiers not only for external entities, but for all objects inside our service. Moreover, they can be generated even on the client side - this will help us support sending messages during a short-term inaccessibility of the database, and the probability of a collision is extremely small.

The draft table structure in our database will look like this:
Tables : RU

CREATE TABLE "Π’Π΅ΠΌΠ°"(
  "Π’Π΅ΠΌΠ°"
    uuid
      PRIMARY KEY
, "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚"
    uuid
, "НазваниС"
    text
);

CREATE TABLE "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"(
  "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
      PRIMARY KEY
, "Π’Π΅ΠΌΠ°"
    uuid
, "Автор"
    uuid
, "ДатаВрСмя"
    timestamp
, "ВСкст"
    text
);

CREATE TABLE "АдрСсат"(
  "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
, "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°"
    uuid
, PRIMARY KEY("Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅", "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°")
);

CREATE TABLE "Π€Π°ΠΉΠ»"(
  "Π€Π°ΠΉΠ»"
    uuid
      PRIMARY KEY
, "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
, "BLOB"
    uuid
, "Имя"
    text
);

Tables : EN

CREATE TABLE theme(
  theme
    uuid
      PRIMARY KEY
, document
    uuid
, title
    text
);

CREATE TABLE message(
  message
    uuid
      PRIMARY KEY
, theme
    uuid
, author
    uuid
, dt
    timestamp
, body
    text
);

CREATE TABLE message_addressee(
  message
    uuid
, person
    uuid
, PRIMARY KEY(message, person)
);

CREATE TABLE message_file(
  file
    uuid
      PRIMARY KEY
, message
    uuid
, content
    uuid
, filename
    text
);

The simplest thing when describing the format is to start β€œuntwisting” the connection graph from tables that are not referenced themselves to no one.

Step 4: Find out non-obvious needs

That's it, we have designed a base in which you can write perfectly and somehow to read.

Let's put ourselves in the place of the user of our service - what do we want to do with it?

  • Recent Posts
    It is a chronologically sorted on various grounds, a register of "my" messages. Where I am one of the addressees, where I am the author, where they wrote to me, but I did not answer, where they did not answer me, ...
  • Correspondence members
    Who even participates in this long, long chat?

Our structure allows us to solve both of these problems β€œin general”, but not quickly. The problem is that for sorting within the first task unable to create index, suitable for each of the participants (and you have to extract all the records), and to solve the second one, you need extract all messages on this topic.

Unforeseen user tasks can put bold cross on performance.

Step 5: Smart Denormalization

Both of our problems will help solve additional tables in which we will duplicate part of the dataneeded to form indexes suitable for our tasks on them.
Messenger database (part 1): designing the base frame

Tables : RU

CREATE TABLE "РССстрБообщСний"(
  "Π’Π»Π°Π΄Π΅Π»Π΅Ρ†"
    uuid
, "ВипРССстра"
    smallint
, "ДатаВрСмя"
    timestamp
, "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅"
    uuid
, PRIMARY KEY("Π’Π»Π°Π΄Π΅Π»Π΅Ρ†", "ВипРССстра", "Π‘ΠΎΠΎΠ±Ρ‰Π΅Π½ΠΈΠ΅")
);
CREATE INDEX ON "РССстрБообщСний"("Π’Π»Π°Π΄Π΅Π»Π΅Ρ†", "ВипРССстра", "ДатаВрСмя" DESC);

CREATE TABLE "УчастникВСмы"(
  "Π’Π΅ΠΌΠ°"
    uuid
, "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°"
    uuid
, PRIMARY KEY("Π’Π΅ΠΌΠ°", "ΠŸΠ΅Ρ€ΡΠΎΠ½Π°")
);

Tables : EN

CREATE TABLE message_registry(
  owner
    uuid
, registry
    smallint
, dt
    timestamp
, message
    uuid
, PRIMARY KEY(owner, registry, message)
);
CREATE INDEX ON message_registry(owner, registry, dt DESC);

CREATE TABLE theme_participant(
  theme
    uuid
, person
    uuid
, PRIMARY KEY(theme, person)
);

Here we have applied two typical approaches used when creating auxiliary tables:

  • Record multiplication
    We form several records-consequences for one initial message record at once in different types of registries for different owners - both for the sender and for the recipient. But each of the registries now falls on the index - after all, in the typical case, we want to see only the first page.
  • Uniqueization of records
    Each time you send a message within a specific topic, it is enough to check if such an entry already exists. If not, we add it to our "dictionary".

In the next part of the article, we will talk about implementation of partitioning into our database structure.

Source: habr.com

Add a comment