Tableau in retail, really?

The days of reporting in Excel are rapidly running out: the trend toward convenient tools for presenting and analyzing information is visible everywhere. We had been discussing the digitalization of reporting for a long time, and we chose Tableau as our visualization and self-service analytics system. Alexander Bezugly, Head of the Analytical Solutions and Reporting Department at M.Video-Eldorado Group, talks about the experience and results of building a production dashboard.

I will say right away that not everything we planned was implemented, but the experience was interesting, and I hope it will be useful to you. And if anyone has ideas on how it could have been done better, I will be very grateful for advice.


Below is an account of what we ran into and what we learned.

Where we started

M.Video-Eldorado has a well-developed data model: structured information with the required storage depth and a huge number of fixed-form reports (details in this article). From these, analysts build either pivot tables and formatted mailings in Excel, or polished presentations in PowerPoint for end users.

About two years ago, instead of fixed-form reports, we started creating analytical reports in SAP Analysis (an Excel add-in, essentially a pivot table on top of an OLAP engine). But this tool could not cover the needs of all users; most of them continued to use information post-processed by analysts.

Our end users fall into three categories:

Top management. Wants information presented clearly and in an easily digestible form.

Middle management and advanced users. Interested in exploring data and able to build reports on their own with the available tools. They became the key users of the analytical reports in SAP Analysis.

Rank-and-file users. Not interested in analyzing data independently; they use reports with a limited degree of freedom, in the form of mailings and pivot tables in Excel.

Our idea was to cover the needs of all users with a single convenient tool. We decided to start with top management, who needed easy-to-use dashboards for analyzing key business results. So we started with Tableau and chose two areas to begin with: retail and online sales metrics, with limited depth and breadth of analysis, which would cover about 80% of the data requested by top management.

Since the users of the dashboards were top management, another KPI appeared for the product: response speed. No one will wait 20-30 seconds for the data to refresh. Navigation had to fit within 4-5 seconds, and ideally respond instantly. Alas, we did not manage to achieve this.

This is what our main dashboard layout looked like:

[Image: main dashboard layout]

The key idea was to combine the main KPI drivers (19 of them in the end) on the left and present their dynamics and breakdown by the main attributes on the right. The task seems simple, and the visualization looks logical and understandable, until you dive into the details.

Detail 1. Data volume

The main sales table holds about 300 million rows per year. Since we also have to show dynamics against last year and the year before, actual sales alone come to about 1 billion rows. Planned data and the online sales block are stored separately as well. So even though we used SAP HANA, a columnar in-memory database, an on-the-fly query selecting all the indicators for one week from the existing marts took about 15-20 seconds. The obvious solution is additional materialization of the data, but it has pitfalls of its own, described below.

Detail 2. Non-additive indicators

Many of our KPIs are based on the number of receipts. This indicator is a COUNT DISTINCT over rows (receipt headers), and it returns different values depending on the selected attributes. Here is an example of how this indicator and its derivatives should be calculated:

[Image: example of calculating the receipt count and its derivatives for different attribute selections]
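
To make the non-additivity concrete, here is a minimal sketch under assumed names (a hypothetical sales table with one row per receipt line; not our actual schema):

    -- Total number of receipts for the selected week:
    SELECT COUNT(DISTINCT receipt_id) AS receipts
    FROM sales
    WHERE sale_date BETWEEN '2019-08-12' AND '2019-08-18';

    -- The same indicator per product group: the per-group counts
    -- add up to MORE than the total above, because a receipt that
    -- contains goods from two groups is counted once in each group.
    SELECT ute1, COUNT(DISTINCT receipt_id) AS receipts
    FROM sales
    WHERE sale_date BETWEEN '2019-08-12' AND '2019-08-18'
    GROUP BY ute1;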

To keep such calculations correct, you can:

  • calculate these indicators on the fly in the warehouse;
  • run the calculations over the full data volume in Tableau, i.e. have Tableau request all the data for the selected filters at receipt-line granularity;
  • build a materialized mart in which all indicators are pre-calculated for every combination of selections that yields different non-additive results.

In the example, UTE1 and UTE2 are material attributes representing the product hierarchy. The hierarchy is not static: it is actively managed inside the company, since different managers are responsible for different product groups. We have had many global revisions of this hierarchy, when all levels changed and relationships were reworked, as well as constant point changes, when a single group moves from one node to another. In conventional reporting, all of this is calculated on the fly from the material attributes; once this data is materialized, you need a mechanism that tracks such changes and automatically reloads the historical data. A very non-trivial task.
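
As a rough illustration of what such a mechanism involves (hypothetical table names, a deliberately simplified sketch):

    -- Compare the current hierarchy snapshot with the previous one
    -- and find materials that moved between nodes.
    SELECT cur.material_id
    FROM material_hierarchy cur
    JOIN material_hierarchy_prev prev
      ON prev.material_id = cur.material_id
    WHERE cur.ute1 <> prev.ute1
       OR cur.ute2 <> prev.ute2;
    -- Every moved material invalidates the historical aggregates that
    -- were materialized under the old hierarchy and triggers a reload.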

Detail 3. Data comparison

This point is similar to the previous one. The bottom line is that the company customarily analyzes results against several levels of comparison with past periods:

Comparison with the previous period (day by day, week by week, month by month)

This comparison assumes that, depending on the period the user selects (for example, week 33 of the year), we show the dynamics against week 32; if a month is selected, say May, the comparison shows the dynamics against April.

Comparison with last year

The main nuance here is that when comparing by day or by week, you cannot take the same calendar date of last year, i.e. you cannot simply subtract one from the year: you have to match the day of the week. When comparing months, on the contrary, you take exactly the same calendar day of last year. There are also nuances with leap years. In the source storage, all information is organized by day; there are no separate fields for weeks, months, or years. So to get a complete analytical slice in the panel, you have to query not one period, say a week, but four weeks, and then compare this data and show the dynamics and deviations. This comparison logic, too, can be implemented either in Tableau or on the mart side. We knew and thought about these details at the design stage, but it was hard to predict their impact on the performance of the final dashboard.
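
A minimal sketch of the like-for-like mapping (hypothetical calendar table; ADD_DAYS and ADD_YEARS are standard SAP HANA functions):

    -- Day/week comparisons: shift by 364 days (52 full weeks), which
    -- lands on the same day of the week in the previous year.
    -- Month comparisons: shift by one calendar year instead.
    SELECT
      sale_date,
      ADD_DAYS(sale_date, -364) AS ly_same_weekday,  -- day/week mode
      ADD_YEARS(sale_date, -1)  AS ly_same_date      -- month mode
    FROM calendar_dim;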

When implementing the dashboard, we took a long Agile path. Our task was to provide a working tool with the necessary data for testing as quickly as possible, so we moved in sprints and started by minimizing the work on the side of the existing storage.

Part 1. Faith in Tableau

To simplify IT support and roll out changes quickly, we decided to implement the logic for calculating non-additive indicators and comparing past periods in Tableau.

Stage 1. Everything live, no mart modifications.

At this stage, we connected Tableau to the existing marts and decided to see how long it would take to calculate the number of receipts for one year.

Result:

The answer was depressing: 20 minutes. Pumping the data over the network, heavy load on Tableau. We realized that the logic for non-additive indicators had to be implemented in HANA. That did not scare us much: we already had similar experience with BO and Analysis, and we knew how to build fast marts in HANA that return correctly calculated non-additive indicators. Now we had to adapt them for Tableau.

Stage 2. Tuned mart, no materialization, everything on the fly.

We built a separate new mart that returned the data Tableau needed on the fly. Overall we got a good result: the time to produce all the indicators for one week dropped to 9-10 seconds. So we honestly expected a dashboard response time in Tableau of 20-30 seconds on first open and then 10-12 seconds thanks to the cache, which on the whole would have suited us.

Result:

First opening of the dashboard: 4-5 minutes
Any click: 3-4 minutes

No one expected such an overhead on top of the mart's own query time.

Part 2. Dive into Tableau

Stage 1. Tableau performance analysis and quick tuning

We began analyzing where Tableau spends most of its time. There are quite good tools for this, which is certainly a plus for Tableau. The main problem we identified was the extremely complex SQL queries Tableau built. They were primarily caused by:

— data transposition. Since Tableau has no tools for transposing datasets, to build the left side of the dashboard with the detailed view of all KPIs we had to form the table through CASE expressions. The SQL queries sent to the database reached 120 thousand characters.


— selection of the time period. A query of this kind took the database longer to compile than to execute:

[Image: the generated time-period query and its timings]

That is, 12 seconds to compile the query plus 5 seconds to execute it.

We decided to simplify the calculation logic on the Tableau side and move one more portion of the calculations to the mart and the database level. This brought good results.

First, we moved the transposition into the database: we perform it through a FULL OUTER JOIN at the final stage of the view calculation, following the approach described in the Wikipedia articles Transpose and Elementary matrix.


That is, we created a tuning table, a 21x21 transposition matrix, and obtained all the indicators row by row.
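
A minimal sketch of the principle under assumed names (the real view has 21 KPIs and, as noted above, a FULL OUTER JOIN at the final stage; a plain cross join is enough to show the idea):

    -- kpi_matrix is a tiny tuning table: one row per KPI with 0/1
    -- selector columns, one selector per source measure.
    -- Joining it to the wide fact row turns N columns into N rows
    -- without any CASE expressions; the database does this almost
    -- for free.
    SELECT
      m.kpi_name,
      f.revenue    * m.sel_revenue
      + f.receipts * m.sel_receipts
      + f.margin   * m.sel_margin AS kpi_value  -- one term per measure
    FROM weekly_facts f
    CROSS JOIN kpi_matrix m;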

Before:
[Image]

After:
[Image]

Transposition itself takes the database almost no time. The query for all indicators for a week still ran in about 10 seconds, as before. But we lost flexibility for single-indicator queries: for the right side of the dashboard, where the dynamics and detailed breakdown of one specific indicator are shown, the mart used to respond in 1-3 seconds, because the query touched only that indicator, whereas now the database always selects all the indicators and filters the result before returning it to Tableau.

As a result, the dashboard's response time dropped almost threefold.

Result:

  1. 5 sec — parsing the dashboard and visualizations
  2. 15-20 sec — preparing to compile the queries, with pre-calculations in Tableau
  3. 35-45 sec — compiling the SQL queries and their parallel-sequential execution in HANA
  4. 5 sec — processing the results, sorting, and recalculating the visualizations in Tableau

Of course, such results did not suit the business, and we continued optimizing.

Stage 2. Minimum logic in Tableau, full materialization

We understood that a dashboard with a response time of a few seconds cannot be built on top of a mart that takes 10 seconds, and we considered options for materializing the data on the database side specifically for this dashboard. But we ran into the global problem described above: non-additive indicators. We could not get Tableau, when filters or drill-downs changed, to flexibly switch between different marts and levels pre-calculated for different product hierarchies (in the example, the three queries without UTE, with UTE1, and with UTE2 produce different results). So we decided to simplify the dashboard, drop the product hierarchy from it, and see how fast it could be in the simplified version.
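
A sketch of what such a fully materialized store might look like (hypothetical HANA DDL; the real structure differs in detail):

    -- One row per period, attribute combination, and KPI;
    -- non-additive counts and the last-year comparison are
    -- pre-calculated at load time, so reads are trivial scans.
    CREATE COLUMN TABLE kpi_mart (
      week_id    INTEGER,
      city       NVARCHAR(40),
      store_id   INTEGER,
      kpi_name   NVARCHAR(60),   -- already transposed: one KPI per row
      kpi_value  DECIMAL(18,2),
      ly_value   DECIMAL(18,2),  -- pre-matched last-year value
      plan_value DECIMAL(18,2)
    );

    -- Any click in the dashboard becomes a simple filtered select:
    SELECT kpi_name, kpi_value, ly_value, plan_value
    FROM kpi_mart
    WHERE week_id = 201933 AND city = 'Moscow';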

So, at this final stage, we assembled a separate store to which we added all the KPIs in transposed form. On the database side, any query against this store is processed in 0.1-0.3 seconds. In the dashboard, we got the following results:

First opening: 8-10 seconds
Any click: 6-7 seconds

The time Tableau spends breaks down as follows:

  1. 0.3 sec — dashboard parsing and SQL query compilation
  2. 1.5-3 sec — SQL query execution in HANA for the main visualizations (launched in parallel with step 1)
  3. 1.5-2 sec — rendering and recalculation of visualizations
  4. 1.3 sec — execution of additional SQL queries to obtain relevant filter values (Brand, Division, City, Store) and parsing of the results

To summarize briefly

We liked Tableau as a visualization tool. At the mock-up stage, we considered various visualization elements and found all of them in the libraries, including complex multi-level segmentation and a multi-driver waterfall.

While implementing dashboards with key sales metrics, we ran into performance problems that we have not yet managed to overcome. We spent more than two months and got a functionally incomplete dashboard whose response speed is on the verge of acceptable. And we drew our own conclusions:

  1. Tableau cannot handle large data volumes. If your source data model holds more than 10 GB (roughly 200 million rows by 50 columns), the dashboard slows down badly: from 10 seconds to several minutes per click. We experimented with both live connections and extracts; the speeds are comparable.
  2. Limitations when using multiple stores (datasets). There is no standard way to specify relationships between datasets, and workarounds for linking them hit performance hard. In our case, we considered materializing the data for each required view and switching between these materialized datasets while preserving the previously selected filters; this turned out to be impossible in Tableau.
  3. Dynamic parameters are impossible in Tableau. A parameter used to filter a dataset in an extract or over a live connection cannot be filled with the result of another selection from the dataset or of another SQL query, only with native user input or a constant.
  4. Limitations when building a dashboard with OLAP/pivot-table elements.
    In MSTR, SAP SAC, and SAP Analysis, if you add a dataset to a report, all the objects on it are linked to each other by default. In Tableau this is not the case: the relationships have to be configured manually. This is probably more flexible, but for all our dashboards it is a mandatory requirement for elements, so it means extra labor. Moreover, if you make related filters, so that, for example, when filtering by region the list of cities is limited to the cities of that region, you immediately end up with sequential queries to the database or to the extract, which noticeably slows the dashboard down.
  5. Limitations on functions. Mass transformations cannot be performed on an extract, much less on a dataset over a live connection. This can be done through Tableau Prep, but that is extra work and yet another tool to learn and maintain. For example, you cannot transpose data or join it with itself. What remains is transformations on individual columns or fields selected through CASE or IF, and these generate very complex SQL queries in which the database spends most of its time compiling the query text. These inflexibilities of the tool had to be solved at the mart level, which complicates the storage and adds extra loads and transformations.

We have not given up on Tableau. But we no longer consider Tableau a tool capable of building industrial dashboards, or a tool with which we could replace and digitalize a company's entire corporate reporting system.

We are now actively developing a similar dashboard on another tool and, in parallel, trying to revise the architecture of the Tableau dashboard to simplify it even further. If the community is interested, we will share the results.

We are also waiting for your ideas and advice on how to build fast dashboards over such large data volumes in Tableau, because we also have a website where there is far more data than in retail.

Source: habr.com
