Denormalization of ERP databases and its impact on software development: opening a tavern in Tortuga

Hello! My name is Andrey Semenov, I am a senior analyst at Sportmaster. In this post, I want to raise the issue of denormalizing ERP databases. We will look at the general conditions, as well as a specific example - let's say it will be a great tavern-monopoly for pirates and sailors. In which pirates and sailors must be served differently, because the ideas of beauty and consumer patterns of these good gentlemen differ significantly.

How to make everyone happy? How not to go crazy designing and maintaining such a system? What to do if not only the usual pirates and sailors begin to come to the tavern?

Denormalization of ERP databases and its impact on software development: opening a tavern in Tortuga

Everything is under the cut. But let's go in order.

1. Limitations and assumptions

All of the above applies only to relational databases. Well-publicized, including on the Internet, the consequences of denormalization in the form of anomalies of modification, deletion and insertion are not considered. Outside the scope of the publication, there are cases where denormalization is a common place, with classic examples: series and passport number, date and time, and so on.

The post uses intuitive and practically applicable definitions of normal forms, without reference to mathematical terms. In the form in which they can be applied to the survey of real business processes (BP) and the design of industrial software.

It has been argued that the design of data warehouses, reporting tools, and integration agreements (which use tabular representation of information) differs from the design of databases in ERP systems in that ease of consumption and the use of deliberate denormalization to achieve it can take precedence over integrity protection. data. I share this opinion, and the following applies solely to the master data and transactional data models of ERP systems.

The explanation of normal forms is given on an example that is understandable at the everyday level to most readers. However, as a visual illustration in paragraphs 4-5, an emphatically β€œfictitious” task was deliberately used. If you do not do this and take some textbook example, for example, the same order storage model from paragraph 2, you may find yourself in a situation where the reader’s attention will be shifted from the proposed expansion of the process into a model, to personal experience and perception of how processes and models of data storage in IS should be built. In other words, take two qualified IT analysts, let one provide service to the logisticians transporting passengers, the other to the logisticians transporting machines for the production of microchips. Ask them, without discussing in advance automated BPs, to create a data model for storing information about a railway trip.

There is a non-zero chance that in the proposed models you will find not only a markedly different set of attributes, but also mismatched sets of entities, because each analyst will rely on processes and tasks familiar to him. And it is impossible to say in such a situation which model is β€œcorrect”, because there is no evaluation criterion.

2. Normal forms

Denormalization of ERP databases and its impact on software development: opening a tavern in Tortuga

First normal form database requires atomicity of all attributes.
In particular, if object A has non-key attributes a and b, such that c=f(a,b) and you store the value of attribute c in the table describing object A, then the first normal form is violated in the database. For example, if an order specification specifies a quantity, the units of which depend on the type of goods: in one case, it can be pieces, in another, liters, in the third case, packages consisting of pieces (in the model above Good_count_WR), then the atomicity of attributes is violated in the database. In this case, in order to say what a cluster of tables should be for an order specification, you need a target description of the process of working in IS, and since the processes can be different, there can be many β€œcorrect” versions.

Second normal form database requires compliance with the first form and its own table for each entity related to the work process in IS. If in one table there are dependencies c=f1(a) and d=f2(b) and there is no dependency c=f3(b), then the second normal form is violated in the table. In the example above, in the Order table, there is no relationship between order and address. Change the name of a street or city and you won't have any effect on the essential attributes of the order.

third normal form database requires compliance with the second normal form and the absence of functional dependencies between attributes of different entities. This rule can be formulated as follows: "everything that can be calculated must be calculated." In other words, if there are two objects A and B. In the table storing the attributes of object A, the attribute C is manifested, the object B has an attribute b, such that c=f4(b) exists, then the third normal form is violated. In the example below, the attribute "Number of pieces" (Total_count_WR) in the order record clearly claims to violate the third normal form

3. My approach to applying normalization

1. Only a target automated business process can provide an analyst with criteria for identifying entities and attributes when creating a data storage model. Creating a process model is a prerequisite for creating a normal data model.

2. Achieving the third normal form in the strict sense may not be practical in the actual practice of creating ERP systems if some or all of the following conditions are met:

  • automated processes are rarely subject to change,
  • deadlines for research and development are short,
  • data integrity requirements are conditionally low (potential errors in industrial software do not lead to the loss of money or customers by the software customer)
  • etc.

Under the described conditions, the costs of identification, description of the life cycle of some objects and their attributes may not be justified from the point of view of economic efficiency.

3. Any consequences of denormalization of the data model in an already created IS can be stopped by a thorough preliminary study of the code and testing.

4. Denormalization is a way to transfer labor costs from the stage of researching data sources and designing a business process to the development stage, from the implementation period to the system development period.

5. It is advisable to strive for the third normal form of the database if:

  • The direction of change in automated business processes is difficult to predict
  • Within the implementation and / or development team there is a poorly permeable division of labor
  • The systems included in the integration circuit develop according to their own plans
  • Data inconsistency can lead to loss of customers or money by the company

6. The design of a data model should be carried out by an analyst only in connection with the models of the target business process and the process in IS. If a developer is designing a data model, he will have to immerse himself in the subject area to such an extent that, in particular, he understands the difference between attribute values ​​- a necessary condition for extracting atomic attributes. Thus, taking on unusual functions.

4 Task for illustration

Let's say you have a small robotic tavern in the port. Your market segment: sailors and pirates who call at the port and need a rest. To sailors you sell tea with thyme, and to pirates you sell rum and bone combs for combing your beard. The service in the tavern itself turns out to be a robot hostess and a robot bartender. Thanks to high quality and low prices, you have driven everyone out of the competition, so that everyone who leaves the ship comes to your tavern, which is the only one in the port.

The complex of information systems of the tavern consists of the following software:

  • Early warning system about the client, recognizing his category by characteristic features
  • Control system for hostess robots and bartender robots
  • Warehouse management system and delivery to the point of sale
  • Supplier Relationship Management System (SRM)

Process:

The early warning system recognizes people descending from the ship. If a person is clean-shaven, she defines him as a sailor, if a person has a beard, then he is defined as a pirate.

Entering the tavern, the guest hears a greeting from the robot-hostess in accordance with his category, for example: β€œHo-ho-ho, dear pirate, go to table number…”

The guest goes to the specified table, where the robot bartender has already prepared goods for him in accordance with the category. The bartender robot sends information to the warehouse system that the next portion of the delivery should be increased, the warehouse IS, based on the balance in storage, forms a purchase request in the SUOP.

Let the early warning system be developed by your internal IT, the bar robot management program was created by an external contractor specifically for your business. And the systems for warehouse and supplier relationship management are customized out-of-the-box solutions from the market.

5. Examples of denormalization and its impact on software development

When designing a business process, interviewed subject matter experts unanimously stated that all over the world, pirates drink rum and comb their beards with bone combs, and sailors drink thyme tea and are always clean-shaven.

A directory of types of clients appears from two values: 1 - pirates, 2 - sailors, common to the entire information loop of the company.

The client notification system immediately saves the result of image processing as an identifier (ID) of the recognized client and its type: sailor or pirate.

Recognized Object ID
Client category

100500
Pirate

100501
Pirate

100502
Sailor

Let us once again note that

1. Our sailors are actually clean-shaven people.
2. Our pirates are actually bearded people

What problems need to be eliminated in this case so that our structure tends to the third normal form:

  • Attribute Atomicity Violation - Customer Category
  • mixing the analyzed fact and conclusion in one table
  • fixed functional dependency between attributes of different entities.

In a normalized form, we would get two tables:

  • recognition result in the form of a set of established features,

Recognized Object ID
Hairline on the face

100500
Yes

100501
Yes

100502
No

  • the result of determining the type of client as an application of the logic embedded in the IS for interpreting the established signs

Recognized object ID
identification ID
Client category

100500
100001
Pirate

100501
100002
Pirate

100502
100003
Sailor

How can a normalized organization of data storage facilitate the development of the IS complex? Let's say you suddenly have new customers. Let them be Japanese pirates who may not have a beard, but they walk with a parrot on their shoulder, and environmental pirates, you can easily recognize them by Greta's blue profile on the left chest.

Environmental pirates, of course, cannot use bone combs and require an analogue made from recycled marine plastic.

You need to rework the algorithms of the programs in accordance with the new inputs. If the normalization rules were followed, then you would only have to add inputs for some process branches in part of the systems and create new branches only for those cases and in those ISs where facial hair matters. But, since the rules have not been met, you will have to analyze the entire code, in the entire circuit, where the values ​​of the reference book of client types are used and unambiguously establish that in one case the algorithm should take into account the professional activity of the client, and in the other physical features.

In the form that seeks to normalized, we would get two tables with operational data and two directories:

Denormalization of ERP databases and its impact on software development: opening a tavern in Tortuga

  • recognition result in the form of a set of established features,

Recognized object ID
Greta on the left chest
Bird on the shoulder
Hairline on the face

100510
1
1
1

100511
0
0
1

100512

1
0

  • the result of determining the type of client (let it be a custom view in which descriptions from reference books are displayed)

Does the detected denormalization mean that the systems cannot be modified to fit the new conditions? Of course not. If we imagine that all ISs were created by one team with zero staff turnover, the developments are well documented, and the information in the team is transferred without loss, then the required changes can be made with negligible effort. But if we return to the initial conditions of the task, only 1,5 keyboards will be erased for printing the protocols of joint discussions and another 0,5 for registration of procurement procedures.

In the example above, all three normal forms are violated, let's try to violate them one by one.

Violation of the first normal form:

Let's say goods are delivered to your warehouse from suppliers' warehouses by self-delivery using one 1.5-ton gazelle that belongs to your tavern. The size of your orders is so small in relation to the turnover of suppliers that they are always carried out one to one without waiting for production. Do you need separate tables for such a BP: vehicles, types of vehicles, do you need to separate the plan and the fact in your orders for departed suppliers?

Just imagine how many β€œextra” connections your programmers will have to write if you use the model below to develop a program.

Denormalization of ERP databases and its impact on software development: opening a tavern in Tortuga

Suppose we have decided that the proposed structure is unnecessarily complex, for our case, separating the plan and the fact in the order record is redundant information, and the generated order specification is overwritten based on the results of the acceptance of the arrived goods, a rare re-sorting and the arrival of goods of inadequate quality are settled outside the IS.
And then one day you see how the entire hall of the tavern is filled with indignant and unkempt pirates. What happened?

It turned out that as your business grew, so did your consumption. Once upon a time, a management decision was made that if the gazelle was overloaded in volume and / or weight, which was extremely rare, the supplier would prioritize the load in favor of drinks.

Undelivered goods ended up in the next order and left on a new flight, the presence of an irreducible balance in the warehouse at the tavern made it possible not to notice the punctured cases.

The last competitor was closed in the port, and the gouged case of overloading the gazelle, bypassed by prioritization based on the assumption of the sufficiency of the minimum balance and periodic underloading of the vehicle, became a common practice. The created system will ideally work out in accordance with the algorithms embedded in it and will be deprived of any possibility to trace the systematic non-fulfillment of planned orders. Only a damaged reputation and dissatisfied customers will be able to detect the problem.

The attentive reader must have noticed that the ordered quantity in the order specification (T_ORDER_SPEC) in section 2 and in section 5 may or may not meet the requirements of the first normal form. It all depends on whether, with the selected range of goods, essentially different units of measurement can fall into the same field.

Violation of the second normal form:

As your needs grow, you get a couple more vehicles of different sizes. In the above context, the creation of a vehicle directory was considered redundant, as a result, all algorithms for working with data that serve the needs of delivery and warehouse perceive the movement of cargo from the supplier to the warehouse as a flight of only a 1,5-ton gazelle. So, along with the purchase of new vehicles, you still create a vehicle directory, but when finalizing, you will have to analyze all the code that refers to the movement of cargo to find out if in each specific place references to the characteristics of the very vehicle from which business started.

Violation of the third normal form:

At some point, you start creating a loyalty program, a regular customer record appears. Why, for example, spend time creating material views that store aggregated data on sales to an individual client for use in reporting and transfer to analytical systems, if at the start of the loyalty program everything that the customer is interested in can be placed on the record of the client himself? And, indeed, it makes no sense at first glance. But every time your business connects, for example, new sales channels, there should be someone among your analysts who will remember that such an aggregation attribute exists.

When designing each new process, for example, sales on the Internet, sales through distributors connected to a common loyalty system, someone must keep in mind that all new processes must ensure data integrity at the code level. For an industrial database with a thousand tables, this seems like an insurmountable task.

An experienced developer, of course, knows how to stop all the problems mentioned above, but, in my opinion, the task of an experienced analyst is not to bring them to the point.

I would like to express my gratitude for the valuable feedback during the preparation of the publication to the lead developer Evgeny Yarukhin.

Literature

https://habr.com/en/post/254773/
Connolly Thomas, Begg Carolyn. Database. Design, implementation and support. Theory and practice

Source: habr.com

Add a comment