Gartner MQ Review 2020: Machine Learning and AI Platforms

I can't really explain why I read this. I simply had the time, and I was curious how the market works. And according to Gartner this has been a full-fledged market since 2018. In 2014-2016 it was called advanced analytics (with roots in BI), and in 2017 it became Data Science (I don't know how to translate this into Russian). If you are interested in how vendors have moved around the quadrant, you can look here. I will talk about the 2020 quadrant, especially since the changes since 2019 are minimal: SAP dropped out and Altair bought Datawatch.

This is not a systematic analysis and not a table. It is a personal view, and one from the standpoint of geophysics at that. But I am always curious to read a Gartner MQ; they formulate some points perfectly. So here are the things that caught my attention technically, market-wise, and philosophically.

This is not for people who are deep in the subject of ML, but for people who are interested in what is generally happening in the market.

The DSML (data science and machine learning) market itself sits logically between BI and Cloud AI developer services.

First, the quotes and terms I liked:

  • "A Leader may not be the best choice": the market leader is not necessarily what you need. Very much to the point! In the absence of a functional customer, everyone is always looking for the "best" solution rather than a "suitable" one.
  • "Model operationalization", abbreviated MOPs. And pugs are hard for everyone! (In Russian "mops" means pug, hence the running joke: the pug is what makes the model work.)
  • Notebook environment: an important concept where code, comments, data and results are combined in one place. It is very clear and promising, and it can significantly reduce the amount of UI code.
  • "Rooted in Open Source": well said.
  • "Citizen Data Scientists": laid-back folks, amateurs rather than experts, who need a visual environment and all sorts of helper tools. They won't code.
  • "Democratize": often used in the sense of "make available to a wider range of people." You can say "democratize the data" instead of the dangerous "free the data" we used to use. "Democratize" always means the long tail, and all vendors are chasing it. You lose in scientific depth but win in accessibility!
  • "Exploratory Data Analysis (EDA)": examining the data with whatever means are at hand. Some statistics, a little visualization. Something everyone does to one degree or another. I didn't know there was a name for it.
  • "Reproducibility": preserving as much as possible of the environment parameters, inputs and outputs so that an experiment, once carried out, can be repeated. The most important term for an experimental test environment!
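To make the "Reproducibility" item a bit more tangible, here is a minimal Python sketch of what recording a run could look like. It is only an illustration under my own assumptions: the function names (`fingerprint`, `build_manifest`, `same_experiment`) and the manifest fields are hypothetical and not taken from Gartner or from any vendor in the review.

```python
import hashlib
import platform
import sys


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest so inputs and outputs can be compared later."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(params: dict, input_bytes: bytes, output_bytes: bytes) -> dict:
    """Collect what is needed to repeat a run: environment, parameters, data hashes."""
    return {
        "python": sys.version.split()[0],      # interpreter version
        "platform": platform.platform(),       # OS and architecture
        "params": params,                      # hyperparameters, seeds, etc.
        "input_sha256": fingerprint(input_bytes),
        "output_sha256": fingerprint(output_bytes),
    }


def same_experiment(a: dict, b: dict) -> bool:
    """Two runs count as a reproduction if parameters and data hashes all match."""
    keys = ("params", "input_sha256", "output_sha256")
    return all(a[k] == b[k] for k in keys)
```

Storing such a manifest next to every model run is roughly the least one needs in order to check later whether a repeated experiment really used the same inputs and produced the same outputs.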

So:

Alteryx

A cool interface, just like a toy. Scalability, of course, is a bit tight. Accordingly, the community around it is the same kind of citizen engineers who like to play with trinkets. It has its own analytics, all in one bottle. It reminded me of Coscad, a spectral-correlation data analysis package that was programmed in the 90s.

Anaconda

A community built around Python and R experts. Accordingly, there is a lot of open source. It turned out that my colleagues use it all the time. I didn't know.

Databricks

Built on three open source projects. The Spark developers have raised a hell of a lot of money since 2013. I just have to quote the wiki directly:

“In September 2013, Databricks announced that it had raised $13.9 million from Andreessen Horowitz. The company raised additional $33 million in 2014, $60 million in 2016, $140 million in 2017, $250 million in 2019 (Feb) and $400 million in 2019 (Oct)”!!!

Some great people built Spark. Sadly, I don't know them!

And the projects are:

  • Delta Lake: ACID on Spark, released recently (what we dreamed about with Elasticsearch). It turns a data lake into a database: strict schema, ACID, audit, versions...
  • MLflow: tracking, packaging, management and storage of models.
  • Koalas: the Pandas DataFrame API on Spark. Pandas is a Python API for working with tables and data in general.

For anyone who doesn't know Spark or has forgotten it, you can read about it here: link. I watched videos with examples from slightly boring but thorough consultants: Databricks for Data Science (link) and for Data Engineering (link).

In short, Databricks is pulling Spark along. Anyone who wants to use Spark properly in the cloud takes Databricks without hesitation, as intended 🙂 Spark is the main differentiator here.
I learned that Spark Streaming is not true real time but micro-batching. If you need really real real time, that is Apache Storm. Everyone else says and writes that Spark is cooler than MapReduce. That is the slogan.
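For those who, like me, had not thought about the difference: micro-batching means that incoming events are collected into small time windows and each window is processed as one batch, while a record-at-a-time engine handles every event as it arrives. Below is a toy sketch of the batching side in plain Python. It does not use Spark or Storm at all; the function name `micro_batches` and the `"ts"` field are my own assumptions for illustration.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[dict], interval: float) -> Iterator[List[dict]]:
    """Group timestamped events into consecutive windows of `interval` seconds.

    Events are assumed to arrive sorted by their "ts" field.
    Windows that contain no events are skipped.
    """
    batch: List[dict] = []
    window_end = None
    for event in events:
        if window_end is None:
            # The first event opens the first window.
            window_end = event["ts"] + interval
        while event["ts"] >= window_end:
            # The current window is over: emit what was collected, move on.
            if batch:
                yield batch
                batch = []
            window_end += interval
        batch.append(event)
    if batch:
        yield batch
```

With five events at timestamps 0.0, 0.4, 1.2, 1.3 and 3.1 and a one-second interval, this yields batches of 2, 2 and 1 events. The price is latency: an event is not handed on until its window closes, which is exactly why this is not "real" real time.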

Dataiku

A cool little end-to-end thing. Lots of advertising. I don't understand how it differs from Alteryx.

DataRobot

Paxata, a cool data preparation tool, was a separate company that DataRobot bought in December 2019. It raised 20 MUSD and was sold. All in 7 years.

Preparing data in Paxata rather than in Excel: see here: link.
It has automatic lookups and suggests joins between two datasets. A great thing for getting to grips with the data; I only wish there were more emphasis on textual information (link).
The Data Catalog is an excellent catalog of useless "live" datasets.
It is also interesting how catalogs are built in Paxata (link).
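I don't know how Paxata actually proposes joins, but the general idea of suggesting a join between two datasets can be sketched by comparing how much the values of two columns overlap. The Python below is only such a sketch: the names `Table`, `jaccard` and `suggest_joins`, the Jaccard measure and the 0.5 threshold are all my own assumptions, not Paxata's method.

```python
from itertools import product
from typing import Dict, List, Tuple

Table = Dict[str, List[str]]  # column name -> column values


def jaccard(left: List[str], right: List[str]) -> float:
    """Share of distinct values that the two columns have in common."""
    a, b = set(left), set(right)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_joins(t1: Table, t2: Table, threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Rank column pairs whose value overlap is at least `threshold`."""
    scored = [
        (c1, c2, jaccard(t1[c1], t2[c2]))
        for c1, c2 in product(t1, t2)
    ]
    candidates = [s for s in scored if s[2] >= threshold]
    return sorted(candidates, key=lambda s: s[2], reverse=True)


# Two tiny made-up tables: a well list and a log list.
wells = {"well_id": ["W1", "W2", "W3"], "field": ["North", "South", "North"]}
logs = {"id": ["W1", "W2", "W4"], "depth": ["1200", "1350", "900"]}
print(suggest_joins(wells, logs))
```

For these two tables the only candidate is the pair `well_id` / `id` with an overlap of 0.5. A real tool would presumably also look at column types, names and value patterns, which is where the "semantics algorithms" from the quote below come in.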

“According to analyst firm Ovum, the software is made possible through advances in predictive analytics, machine learning and the NoSQL data caching methodology.[15] The software uses semantics algorithms to understand the meaning of a data table's columns and pattern recognition algorithms to find potential duplicates in a data-set.[15][7] It also uses indexing, text pattern recognition and other technologies traditionally found in social media and search software.”

DataRobot's main product is here. Their slogan is "from Model to Enterprise Application"! I found their consulting material for the oil industry in connection with the crisis, but it is very banal and uninteresting: link. I watched their videos on MOPs, or MLOps (link). It is something of a Frankenstein, assembled from 6-7 acquisitions of various products.

Of course, it becomes clear that a large team of Data Scientists needs exactly this kind of environment for working with models; otherwise they will spawn lots of them and never deploy anything. In our oil and gas upstream reality, successfully creating even one model would already be great progress!

The process itself strongly reminded me of how design systems in geology and geophysics work, for example Petrel. Everyone and their dog builds and modifies models and collects data into the model. Then a reference model is made and put into production! That is, one can find a lot in common between, say, a geological model and an ML model.

Domino

Emphasis on an open platform and collaboration. Business users are let in for free. Their Data Lab looks a lot like SharePoint. (And the name smells strongly of IBM.) All experiments are linked to the original dataset. How familiar this is 🙂 In our practice, some data was dragged into the model, then it was cleaned and tidied up inside the model, and now it all lives in the model with no trail back to the original data.

Domino has cool infrastructure virtualization. You assemble a machine with as many cores as you need in a second and go compute. How it is done is not immediately clear. Docker everywhere. Lots of freedom! Workspaces of any of the latest versions can be connected. Parallel experiment runs. Tracking and selection of the successful ones.

Same as DataRobot: the results are published for business users as applications. For especially gifted "stakeholders". And the actual use of the models is monitored too. All for the pugs!

I didn’t fully understand how complex models go into production. Some kind of API is provided to feed them data and get results.

H2O

Driverless AI is a very compact and clear system for supervised ML. Everything in one box. The backend is not entirely clear at first glance.

The model is automatically packaged into a REST server or a Java app. This is a great idea. A lot has been done for Interpretability and Explainability: interpreting and explaining the model's results (which, in essence, should not be explainable, otherwise a person could calculate the same thing?).
This is the first vendor here with a case about unstructured data and NLP. A high-quality architecture diagram. In general, I liked the pictures.
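The idea of a model packaged into a REST server can be pictured with a few lines of standard-library Python. This is emphatically not how Driverless AI does it; it is a hand-written sketch in which `predict` is a hypothetical stand-in for a trained model, and the feature names, port and JSON shape are all my own assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def predict(features: dict) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return 0.5 * features.get("porosity", 0.0) + 0.1 * features.get("depth_km", 0.0)


def handle_body(body: bytes) -> Tuple[int, bytes]:
    """Turn a JSON request body into an HTTP status code and a JSON response body."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        score = predict(features)
    except (ValueError, TypeError) as err:
        # Malformed JSON or non-numeric feature values end up here.
        return 400, json.dumps({"error": str(err)}).encode()
    return 200, json.dumps({"score": score}).encode()


class ScoreHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: read the POST body, score it, send JSON back."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Block and answer POST requests on localhost until interrupted."""
    HTTPServer(("localhost", port), ScoreHandler).serve_forever()
```

Calling `serve()` and POSTing a JSON object such as `{"porosity": 0.2, "depth_km": 2.0}` to `http://localhost:8000/` returns a JSON score. The attraction of having this generated automatically is that whoever consumes the model needs nothing but HTTP.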

There is a large open source H2O framework whose nature is not entirely clear (a set of algorithms / libraries?). It has its own visual notebook without programming, like Jupyter (link). I also read about POJO and MOJO, H2O models wrapped in Java. The first is straightforward, the second optimized. H2O is the only vendor (!) for which Gartner lists text analytics and NLP among the strengths, along with its explainability efforts. This is very important!

Also among the strengths: high performance, optimization, and an industry standard for hardware and cloud integration.

And the weaknesses are logical: Driverless AI is rather weak and narrow compared to their own open source. Data preparation is lame compared to that same Paxata! And they ignore industrial data: stream, graph, geo. Well, not everything can be good.

KNIME

I liked the 6 very specific, very interesting business cases on the main page. Strong open source.

Gartner downgraded them from Leaders to Visionaries. They earn little money, which is a good sign for users, given that the Leader is not always the best choice.

The key word, as with H2O, is augmented, which means helping poor citizen data scientists. For the first time in the review someone was scolded for performance! Interesting? That is, is there now so much computing power that performance cannot be a system problem at all? Gartner has a separate article about this word "Augmented", which I could not get to.
And KNIME seems to be the first non-American in the review! (And our designers really liked their landing page. Strange people.)

MathWorks

MATLAB is an honored old friend known to everyone! Toolboxes for all areas of life and every situation. Something very different. In fact, lots and lots and lots of mathematics for every occasion!

Simulink is a complementary product for systems design. I dug into the toolboxes for Digital Twins; I don't understand anything about them, but a lot has been written here. For the oil industry too. In general, this is a fundamentally different product, from the depths of mathematics and engineering, for picking specific mathematical toolkits. According to Gartner, they have the problems typical of smart engineers: no collaboration, everyone digs in their own model, no democracy, no operationalization.

RapidMiner

I had come across it and heard a lot about it before (along with MATLAB) in the context of good open source. As usual, I dug a little into Turbo Prep. I am interested in how to get clean data out of dirty data.

Again you can tell the people are good, both from the 2018 marketing materials and from the people speaking terrible English in the feature demo.

And they are people from Dortmund, around since 2001, with a strong German background :)

I couldn't tell from the site what exactly is available in open source; you need to dig deeper. Good videos about deployment and AutoML concepts.

There is nothing special about the RapidMiner Server backend either. It will probably be compact and work well on-premises out of the box. Packaged in Docker. The shared environment is only on the RapidMiner server. And there is also Radoop: data from Hadoop, algorithms from Spark, all in a Studio workflow.

As expected, young hot vendors, the "sellers of striped sticks", pushed them down. Gartner, however, predicts their future success in the Enterprise space. You can raise money there. Germans know how to do this 🙂 Just don't mention SAP!!!

They do a lot for the Citizens! But on the page you can see Gartner saying that they are a little tight on sales innovation and are fighting not for breadth of coverage but for profitability.

That leaves SAS and TIBCO, typical BI vendors to me... And both are at the very top, which confirms my belief that proper Data Science grows logically out of BI, not out of clouds and Hadoop infrastructures. That is, out of the business, not out of IT. As at Gazpromneft, for example (link): a mature DSML environment grows out of a solid BI practice. But maybe it is lopsided and skewed toward MDM and other things, who knows.

SAS

Nothing special to say. Only obvious things.

TIBCO

The strategy can be read from the shopping list in the page-long wiki article. Yes, it is a long history, but 28!!! Seriously! They bought the BI tool Spotfire (2007) back in my techno days. And also Jaspersoft for reporting (2014), then as many as three predictive analytics vendors, Insightful (S-Plus) (2008), Statistica (2017) and Alpine Data (2017), StreamBase Systems for event processing and streaming (2013), Orchestra Networks for MDM (2018) and the in-memory platform SnappyData (2019).

Hello Frankie!

Source: habr.com