Data marts DATA VAULT

In previous articles, we have seen the basics of DATA VAULT, extending DATA VAULT to a more parsable state, and creating a BUSINESS DATA VAULT. It's time to end the series with the third article.

As I announced in the previous ARTICLES, this article will focus on the topic of BI, or rather the preparation of DATA VAULT as a data source for BI. Let's look at how to create fact and dimension tables and thus create a star schema.

When I started studying English-language materials on the topic of creating data marts over DATA VAULT, I had a feeling that the process was rather complicated. Since the articles are of considerable length, there are references to changes in the wording that appeared in the Data Vault 2.0 methodology, the importance of these wordings is indicated.

However, having delved into the translation, it became clear that this process is not so complicated. But you may have a different opinion.

And so, let's get to the point.

Dimension and fact tables in DATA VAULT

The most difficult information to understand:

  • Measurement tables are built on information from hubs and their satellites;
  • Fact tables are built on information from links and their satellites.

And this is obvious after reading the article about DATA VAULT basics. Hubs store unique keys of business objects, their time-bound satellites of the state of business object attributes, satellites tied to links supporting transactions store the numerical characteristics of these transactions.

This is where the theory basically ends.

But, nevertheless, in my opinion, it is necessary to note a couple of concepts that can be found in articles about the DATA VAULT methodology:

  • Raw Data Marts - showcases of "raw" data;
  • Information Marts - information showcases.

The concept of "Raw Data Marts" - denotes marts built over DATA VAULT data by performing fairly simple JOINs. The “Raw Data Marts” approach allows you to flexibly and quickly expand the warehouse project with information suitable for analysis. This approach does not involve performing complex data transformations and executing business rules before being placed in the storefront, however, the Raw Data Marts data should be understandable to the business user and should serve as a basis for further transformation, for example, by BI tools.

The concept of “Information Marts” appeared in the Data Vault 2.0 methodology, it replaced the old concept of “Data Marts”. This change is due to the realization of the task of implementing a data model for reporting as a transformation of data into information. The “Information Marts” scheme, first of all, should provide the business with information suitable for decision making.

Rather wordy definitions reflect two simple facts:

  1. Showcases of the “Raw Data Marts” type are built on a raw (RAW) DATA VAULT, a repository containing only the basic concepts: HUBS, LINKS, SATELLITES;
  2. Showcases "Information Marts" are built using elements of BUSINESS VAULT: PIT, BRIDGE.

If we turn to examples of storing information about an employee, we can say that a storefront that displays the current (current) phone number of an employee is a storefront of the “Raw Data Marts” type. To form such a showcase, the employee's business key and the MAX() function used on the satellite loading date attribute (MAX(SatLoadDate)) are used. When it is required to store the history of attribute changes in the showcase - it is used, you need to understand from which to which date the phone was up-to-date, the compilation of the business key and the upload date to the satellite will add the primary key to such a table, the field of the end date of the validity period is also added.

Creating a storefront that stores up-to-date information for each attribute of several satellites included in the hub, for example, phone number, address, full name, implies the use of a PIT table, through which it is easy to access all dates of relevance. Showcases of this type are referred to as "Information Marts".

Both approaches are relevant for both measurements and facts.

To create storefronts that store information about several links and hubs, access to BRIDGE tables can be used.

With this article, I complete the series on the concept of DATA VAULT, I hope the information that I shared will be useful in the implementation of your projects.

As always, in conclusion, a few useful links:

  • Article Kenta Graziano, which, in addition to a detailed description, contains model diagrams;

Source: habr.com

Add a comment