Data Division. Year 2013. A retrospective

In 2013, IBS, which I believe had just set up a Data Division, asked me to put together a braindump on the problem area of Big Data, and data in general, based solely on my experience working with corporate oil and gas customers. I stumbled upon it 7 years later and found it amusing. Some things are obvious. Some turned out to be not entirely true, but... 7 years have passed.

I originally wrote it in English, and now I decided to translate it into Russian. Is any of it still relevant? (I will translate the bullet points, but out of laziness I will leave the slides in English. Green is good, red is dangerous, blue is a dream.)

I will keep my comments from “today” to a minimum and put them in italics so they are clear and distinct.

So DATA! We have data...

The Data Division is the Blood Division, because data can be compared to blood running through the veins and arteries of a business organism. However, while the blood is the same everywhere, the organisms differ, which makes productization very difficult but also represents an opportunity for growth.

There are people whose eyes the data simply leaps out at: that's us.
And there are people who, alas, flatly cannot see the data. These, again alas, are our Customers!


So, business principles...

  1. Sell business, not IT (may all IT people forgive me), because we are solving the world's problems, well, and there is more money in it.
  2. All business problems are concentrated around industry verticals and will require matching specializations.
  3. Trying to prove to a business the value of "data", or, even harder, the value of "data management", is eternal suffering and pain. It's like going up to a person who feels fine and saying: “Dude, we are going to treat your blood now, and, dude, it's expensive!”
  4. My outright “wet dream” is to sell “data extraction” and “analytics” under the SaaS model to small and medium businesses that have got hooked on a few cloud services with cool interfaces (project management, helpdesk, accounting, CRM, payroll, time reporting, marketing, … you name it) and are drowning in data. Think Youcalc and SuccessFactors (probably gone by now). This is good!
  5. Look for people who like to crunch data. They are rare and strange (like fortune tellers reading coffee grounds), but key to the business. A poet, for example, can be very good at understanding correlation.
  6. Engineers are needed! They turn the problems the Crunchers pull out of the data into solutions, and the success or failure of a solution depends entirely on them.
  7. The development of open-source projects is of great value and makes it possible to "assemble" complex solutions almost "from scratch".
  8. But... we must not forget that Hadoop is a library, Lucene is also a library, and the distance between a library and an industrial product is huge!
  9. The solutions we build will have to be heavily adapted, so modularity and integrability are key.
  10. Agile (forgive me, Lord) is a key technique for working with the customer and for testing hypotheses, of which there will be many.
  11. Coding and UI work can and should be outsourced. All business analytics and backend specifications must stay in-house and be treated as a core competency.
  12. Business decision makers need to be constantly “informed” about the need to work with data properly and to keep looking for new ways to analyze it. Our employees' combination of technical and business competencies will help raise the status of the organization as a whole.
  13. The Internet is an endless source of inspiration (there were not so many cats back then) for approaches to corporate data management, even though the objectives and scope differ significantly.


Technological postulates…

  1. There is huge potential in simplifying how data is shown to people. You could call it "iphonization".
  2. Although BI vendors claim to bring analytics straight to end users (and they are certainly moving in that direction), the breakthrough has not happened yet. People just don't understand multidimensional data.
  3. A user interface that presents more or less complex, loosely structured data in faceted form also brings an endless number of problems. Conclusion: the flatter, the better.
  4. A platform built on automatic data extraction from sources (which are not always designed for such extraction) is highly dependent on the sources, the stability of the connectors, and the infrastructure. Any failure to deliver a result will always be blamed on the platform (the messenger). Trust is the capital of such platforms: capital that is hard to earn and easy to lose.
  5. From a business perspective, there is no difference between analyzing Big Data and analyzing Just Data. Numbers as simple as 2x2 often hide millions of dollars of opportunity. A good example is data on the end of life of infrastructure elements on the Norwegian shelf. When the dates of all future major overhauls of all equipment were put on one axis, it turned out that in N years an Armageddon was coming for the shelf. One very wealthy person got up from his chair and hurriedly excused himself with the words: “Sorry, I don’t have much time, I need to prepare the fleet...”
  6. Excel, or really any clear and precise tabular presentation of data, has great power and a great future. I believe in beautiful tables (still do), and that's that!
  7. The ultimate aim of all this "analytics" is automating decision making. That is where the fattest opportunities lie, but also the highest risks; that’s why the opportunities are fat, that’s why the risks, that’s why the opportunities, that’s why the toffees… 🙂 Well drilling management, for example…
  8. If “integrability” is a key feature, then data should de facto be exposed as a service. REST rules, but we must not forget about performance optimization, which is now often sacrificed for integrability as computing power keeps growing.
  9. Master data is what needs to be located, extracted, and standardized before addressing any business questions. Master data is small, but the problems with it are big! As the semantics folks say, 50% of all the world's problems come from people calling the same things by different names, and the other 50% from people calling different things by the same name.
  10. Any encapsulation at the storage level limits the openness of a solution and leads to SILO-fication. That's fine if you are a big vendor; otherwise, not so much. (Here, of course, we are talking about files, not about block storage and not about AWS S3, which was already 6 years old by then.)
  11. Relational data modeling is no longer our friend. RDF and key-value are cool! We have seen relational databases with 2000-table models magically transformed into 15 tables, and none of the users lost anything.
  12. The Internet works because the URL provides a single addressing scheme. The importance of URLs, or rather URIs, for an enterprise's information resources is hard to overestimate.
  13. Text mining and NLP are popular. On the Internet. But in the corporate sector, huge gains can be made by extracting structured data from unstructured corporate content.
  14. The synergy between structured data and information extracted from unstructured data, i.e. files, is an analytical Klondike.
  15. When extracting data, do not forget about access rights and copyrights.
  16. A data mining company must set up a hacker department, in the good sense of the word. (Inspired by hard-fought battles with yellow-pages sites' defenses against search bots.)
  17. Before working with data, you must "see" it in its entirety. It is hard to explain. Tabular forms come to my mind; for some, graphical representations, but any graph is already an interpretation. One way or another... "see" it!
  18. To repeat the point about users' “trust” in the frontend: trust in connectors and data-generation processes, trust in the data, trust in the decisions.
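The "put all the dates on one axis" trick from postulate 5 needs nothing more than a count per year. Here is a minimal Python sketch, assuming a hypothetical equipment register of (asset, planned end-of-life date) pairs. The asset names and dates are invented for illustration and are not the Norwegian shelf data.

```python
from collections import Counter
from datetime import date

# Hypothetical equipment register: (asset id, planned end-of-life date).
# All names and dates are invented for illustration only.
ASSETS = [
    ("platform-A/compressor-1", date(2018, 3, 1)),
    ("platform-A/pipeline-2", date(2021, 9, 15)),
    ("platform-B/turbine-1", date(2021, 1, 10)),
    ("platform-B/riser-3", date(2021, 11, 30)),
    ("platform-C/jacket", date(2024, 5, 20)),
    ("platform-C/pump-4", date(2021, 7, 4)),
]


def end_of_life_by_year(register):
    """Put every end-of-life date on one axis: count assets per calendar year."""
    return Counter(eol.year for _asset, eol in register)


def peak_year(register):
    """Return (year, count) for the busiest year; ties go to the earliest year."""
    counts = end_of_life_by_year(register)
    return max(counts.items(), key=lambda item: (item[1], -item[0]))


def text_histogram(register):
    """Render the per-year counts as a plain-text bar chart, one line per year."""
    counts = end_of_life_by_year(register)
    return [f"{year} {'#' * counts[year]}" for year in sorted(counts)]


if __name__ == "__main__":
    for line in text_histogram(ASSETS):
        print(line)
    print("peak:", peak_year(ASSETS))
```

The point of the sketch is that this kind of insight needs a simple group-by, not Big Data tooling.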
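Both halves of the "50% / 50%" rule in postulate 9 (same thing, different names; different things, same name) can be checked mechanically once a crosswalk exists from each source system's local names to master-data IDs. Here is a minimal Python sketch, assuming a hypothetical crosswalk of (source system, local name, canonical ID) rows. The normalization rule (lower-case, punctuation turned into spaces) is an illustrative assumption, not a standard.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case, turn punctuation into spaces, collapse whitespace (assumed rule)."""
    cleaned = re.sub(r"[^0-9a-z]+", " ", name.lower())
    return " ".join(cleaned.split())


def audit_names(crosswalk):
    """Return (synonyms, homonyms) found in (source, local_name, canonical_id) rows.

    synonyms: canonical IDs known under more than one normalized name.
    homonyms: normalized names that point to more than one canonical ID.
    """
    names_per_id = defaultdict(set)
    ids_per_name = defaultdict(set)
    for _source, local_name, canonical_id in crosswalk:
        key = normalize(local_name)
        names_per_id[canonical_id].add(key)
        ids_per_name[key].add(canonical_id)
    synonyms = {cid: sorted(names) for cid, names in names_per_id.items() if len(names) > 1}
    homonyms = {name: sorted(ids) for name, ids in ids_per_name.items() if len(ids) > 1}
    return synonyms, homonyms


# Hypothetical crosswalk rows; system names and IDs are invented.
CROSSWALK = [
    ("ERP", "Well 34/10-A-12", "WELL-001"),
    ("Drilling", "34/10 A12", "WELL-001"),
    ("Maintenance", "Pump P-101", "EQ-101"),
    ("Maintenance", "P-101", "EQ-101"),
    ("Procurement", "P-101", "PART-7731"),
]

if __name__ == "__main__":
    synonyms, homonyms = audit_names(CROSSWALK)
    print("same thing, different names:", synonyms)
    print("different things, same name:", homonyms)
```

In practice, the hard part is building the crosswalk itself; the audit only makes the two kinds of naming conflict visible.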
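Postulates 11 and 12 fit together: once every record has a URI, a relational row can be flattened into (subject, predicate, object) triples, and foreign keys become links to other URIs. Here is a minimal Python sketch, assuming a hypothetical base namespace `https://data.example.com/` and a made-up table/column naming scheme. It is a simplified, hand-rolled mapping, not an implementation of the W3C Direct Mapping recommendation.

```python
from urllib.parse import quote

# Hypothetical enterprise namespace; not a real service.
BASE = "https://data.example.com/"


def resource_uri(kind, key):
    """Give every record one stable address: BASE/<kind>/<percent-encoded key>."""
    return f"{BASE}{kind}/{quote(str(key), safe='')}"


def row_to_triples(table, key_column, row, links=None):
    """Flatten one relational row into (subject, predicate, object) triples.

    links maps foreign-key columns to the table they reference, so their
    values become URIs instead of literals. NULL (None) values are skipped.
    """
    links = links or {}
    subject = resource_uri(table, row[key_column])
    triples = []
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        obj = resource_uri(links[column], value) if column in links else value
        triples.append((subject, f"{BASE}schema/{table}#{column}", obj))
    return triples


if __name__ == "__main__":
    # Hypothetical row from a "well" table with a foreign key to "field".
    row = {"well_id": "34/10-A-12", "field_id": 7, "status": "producing", "comment": None}
    for triple in row_to_triples("well", "well_id", row, links={"field_id": "field"}):
        print(triple)
```

Every table ends up in the same three-column shape, which is one way a many-table relational model can collapse into a handful of generic structures.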

Source: habr.com
