Apache Ignite Zero Deployment: exactly Zero?


We are the technology development department of a retail chain. One day management gave us the task of speeding up large-volume calculations by using Apache Ignite together with MSSQL, and pointed us to the project website with its attractive illustrations and Java code examples. What caught my eye on the site was Zero Deployment, whose description promises miracles: you don't have to manually deploy your Java or Scala code on each node in the grid and re-deploy it each time it changes. In the course of the work it turned out that Zero Deployment has its own usage specifics, and I want to share them. Below are my reflections and implementation details.

1. Statement of the problem

The essence of the problem is as follows. There is a point-of-sale directory, SalesPoint, and a product directory, Sku (Stock Keeping Unit). A point of sale has a "Store type" attribute with the values "small" and "large". Each point of sale has an assortment (the list of products it carries) loaded from the DBMS, together with information that, starting from a given date, a given product is removed from or added to the assortment.

The task is to organize a partitioned cache of points of sale and to store in it information about the connected products for one month ahead. For compatibility with the production system, the Ignite client node must load the data, calculate an aggregate of the form (Store type, ItemID, day, number_of_sales_points) and write it back to the DBMS.
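To pin down what is being computed, here is a minimal plain-Java sketch of the aggregate over a set of points of sale. It rests on assumptions that are not in the original: each point of sale carries its store type and a list of assortment events (item id, date, added or removed), and an item counts as being in the range on a given day if the latest event for it on or before that day was an addition. AssortmentAggregate, Event, Key and the nested SalesPoint (a stand-in for the author's class) are all hypothetical, and Ignite is not involved.

```java
import java.time.LocalDate;
import java.util.*;

public class AssortmentAggregate {
    // Hypothetical assortment change: from `date` on, item `itemId` is added (true) or removed (false).
    public record Event(int itemId, LocalDate date, boolean added) {}

    // Hypothetical stand-in for SalesPoint: store type ("small"/"large") plus its assortment events.
    public record SalesPoint(int id, String storeType, List<Event> events) {}

    // Aggregate key (Store type, ItemID, day); the map value is number_of_sales_points.
    public record Key(String storeType, int itemId, LocalDate day) {}

    // For every day in [from, from + days), counts how many points of sale
    // of each store type carry each item.
    public static Map<Key, Integer> aggregate(Collection<SalesPoint> points, LocalDate from, int days) {
        Map<Key, Integer> result = new HashMap<>();
        for (SalesPoint sp : points) {
            // Sort events by date so that, per item, the latest event up to a day wins.
            List<Event> sorted = new ArrayList<>(sp.events());
            sorted.sort(Comparator.comparing(Event::date));
            for (int d = 0; d < days; d++) {
                LocalDate day = from.plusDays(d);
                Map<Integer, Boolean> inRange = new HashMap<>();
                for (Event e : sorted) {
                    if (e.date().isAfter(day)) break; // later events do not affect this day
                    inRange.put(e.itemId(), e.added());
                }
                for (Map.Entry<Integer, Boolean> item : inRange.entrySet()) {
                    if (item.getValue()) {
                        result.merge(new Key(sp.storeType(), item.getKey(), day), 1, Integer::sum);
                    }
                }
            }
        }
        return result;
    }
}
```

The sketch rescans the events of a point of sale for every day, which is acceptable for a one-month window; a production version would more likely precompute date intervals per item.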

2. Literature study

I have no experience yet, so I start from the very beginning, that is, from a review of publications.

The 2016 article Introduction to Apache Ignite: Getting Started contains a link to the Apache Ignite project documentation and, at the same time, reproaches this documentation for being vague. I re-read it a couple of times, but clarity does not come. I turn to the official getting-started tutorial, which optimistically promises "You'll be up and running in a jiffy!". I set up the environment variables and watch two Apache Ignite Essentials videos, but for my specific task they were not very useful. I successfully launch Ignite from the command line with the default "example-ignite.xml" file and build the first application, Compute Application, using Maven. The application works and uses Zero Deployment, what a beauty!

I read further, and there the example immediately uses affinityKey (created earlier through an SQL query), and even the mysterious BinaryObject is used:

IgniteCache<BinaryObject, BinaryObject> people 
        = ignite.cache("Person").withKeepBinary(); 

I read up a little: the binary format is something like reflection, giving access to object fields by name. It can read the value of a field without fully deserializing the object (which saves memory). But why is BinaryObject used instead of Person, if there is Zero Deployment? Why is the cache of Person objects turned into IgniteCache<BinaryObject, BinaryObject>? It's not clear yet.
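To make the "read a field without deserializing the object" idea concrete, here is a toy sketch in plain Java. The byte layout (repeated name length, UTF-8 name, int value) and the class name ToyBinaryObject are invented for illustration and are not Ignite's actual binary format; the point is only that one field can be located by name in the serialized bytes without the domain class being present and without creating an instance of it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ToyBinaryObject {
    private final ByteBuffer bytes;

    private ToyBinaryObject(byte[] data) {
        this.bytes = ByteBuffer.wrap(data); // big-endian, matching DataOutputStream
    }

    // Serializes int fields as repeated (short name length, UTF-8 name, int value) entries.
    public static ToyBinaryObject of(Map<String, Integer> fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (Map.Entry<String, Integer> f : fields.entrySet()) {
                byte[] name = f.getKey().getBytes(StandardCharsets.UTF_8);
                out.writeShort(name.length);
                out.write(name);
                out.writeInt(f.getValue());
            }
            out.flush();
            return new ToyBinaryObject(buf.toByteArray());
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected with an in-memory stream
        }
    }

    // Reads one field by name directly from the bytes; no model class is needed or instantiated.
    public Integer field(String name) {
        byte[] wanted = name.getBytes(StandardCharsets.UTF_8);
        int pos = 0;
        while (pos < bytes.limit()) {
            int len = bytes.getShort(pos);
            pos += 2;
            boolean match = len == wanted.length;
            for (int i = 0; match && i < len; i++) {
                match = bytes.get(pos + i) == wanted[i];
            }
            pos += len;
            if (match) return bytes.getInt(pos);
            pos += 4; // skip the value of a non-matching field
        }
        return null; // field not present
    }
}
```

In Ignite itself, the corresponding entry point is a cache view obtained with withKeepBinary(), whose values come back as BinaryObject and expose their fields through field("name").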

I adapt the Compute Application to my case. The primary key of the point-of-sale directory in MSSQL is defined as [id] [int] NOT NULL, so I create a cache by analogy:

IgniteCache<Integer, SalesPoint> salesPointCache = ignite.cache("spCache");

In the xml-config I specify that the cache is partitioned

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="spCache"/>
    <property name="cacheMode" value="PARTITIONED"/>
</bean>

Partitioning by point of sale means that the required aggregate is built on each cluster node for the salesPointCache records stored there, after which the client node performs the final summation.
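The final summation on the client amounts to merging the partial maps returned by the nodes. Because a partitioned cache keeps the primary copy of each point of sale on exactly one node, per-node counts for the same key cover disjoint sets of points and can simply be added. That holds only if each node counts its primary entries only (for example via localEntries(CachePeekMode.PRIMARY)); counting backup copies as well would inflate the totals. PartialAggregates and its Key record are hypothetical, and the day is kept as a string for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartialAggregates {
    // Hypothetical aggregate key: (store type, item id, day as "yyyy-MM-dd").
    public record Key(String storeType, int itemId, String day) {}

    // Client-side final step: sums the partial counts computed on each data node.
    public static Map<Key, Long> merge(List<Map<Key, Long>> partials) {
        Map<Key, Long> total = new HashMap<>();
        for (Map<Key, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }
}
```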

Following the First Ignite Compute Application tutorial, I do the same by analogy: on each node of the cluster I run an IgniteRunnable, something like this:

  @Override
  public void run() {
    SalesPoint sp=salesPointCache.get(spId);
    sp.calculateSalesPointCount();
    // ...
  }

I add the aggregation and unloading logic and run it on a test data set. Everything works locally on the development server.

I set up two CentOS test servers, specify their IP addresses in default-config.xml, and run on each:

./bin/ignite.sh config/default-config.xml

Both Ignite nodes start up and see each other. I specify the required addresses in the xml-config of the client application; it starts, joins the topology as a third node, and immediately the topology is back to two nodes. The log shows "ClassNotFoundException: model.SalesPoint" at the line

SalesPoint sp=salesPointCache.get(spId);

StackOverflow says that the cause of the error is that the custom SalesPoint class is missing on the CentOS servers. So much for that. What about "you don't have to manually deploy your Java code on each node" and so on? Or is "your Java code" not about SalesPoint?

I must have missed something, so I start searching and reading again, and then again. After a while I get the feeling that I have read everything on the topic and there is nothing new left. Along the way, I found some interesting remarks.

Valentin Kulichenko, Lead Architect at GridGain Systems, in an answer on StackOverflow, April 2016:

Model classes are not peer deployed, but you can use withKeepBinary() flag
on the cache and query BinaryObjects. This way you will avoid deserialization
on the server side and will not get ClassNotFoundException.

Another authoritative opinion: Denis Magda, Director of product management, GridGain Systems.

An article on Habré about microservices cites three articles by Denis Magda: Microservices Part I, Microservices Part II and Microservices Part III (2016-2017). In the second article, Denis suggests starting a cluster node via MaintenanceServiceNodeStartup.jar. You can also launch a node with an xml configuration from the command line, but then you have to manually put the custom classes on every deployed cluster node:

That's it. Start (..)  node using MaintenanceServiceNodeStartup file or pass
maintenance-service-node-config.xml to Apache Ignite's ignite.sh/bat scripts.
If you prefer the latter then make sure to build a jar file that will contain
all the classes from java/app/common and java/services/maintenance directories.
The jar has to be added to the classpath of every node where the service
might be deployed.

Indeed, that's it. So that, it turns out, is why this mysterious binary format exists!

3. SingleJar

Denis took first place in my personal rating: IMHO, his is the most useful tutorial available. His MicroServicesExample on GitHub contains a completely ready-made example of setting up cluster nodes that compiles without any extra fiddling.

I follow his example and get a single jar file that launches a "data node" or a "client node" depending on a command-line argument. The build starts and works. The Zero Deployment problem is overcome.
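Here is a minimal sketch of the single-jar launcher idea: one main class picks the node role from the command-line argument. Because data nodes and the client start from the same jar, model classes such as SalesPoint are on the classpath of every node, which is what removes the ClassNotFoundException. The argument values "server"/"client", the class SingleJarLauncher and the config file names are hypothetical. In a real launcher the chosen file would be passed to Ignition.start(...); that call is omitted so the sketch runs without Ignite.

```java
public class SingleJarLauncher {
    // Hypothetical Spring XML configs packaged alongside the jar.
    public static final String SERVER_CONFIG = "config/data-node.xml";
    public static final String CLIENT_CONFIG = "config/client-node.xml";

    public enum Role { DATA_NODE, CLIENT_NODE }

    // Maps the single command-line argument to a node role.
    public static Role parseRole(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("usage: java -jar app.jar server|client");
        }
        return switch (args[0]) {
            case "server" -> Role.DATA_NODE;
            case "client" -> Role.CLIENT_NODE;
            default -> throw new IllegalArgumentException("unknown role: " + args[0]);
        };
    }

    // Chooses the configuration file for the given role.
    public static String configFor(Role role) {
        return role == Role.DATA_NODE ? SERVER_CONFIG : CLIENT_CONFIG;
    }

    public static void main(String[] args) {
        Role role = parseRole(args);
        // A real launcher would call Ignition.start(configFor(role)) here.
        System.out.println("Starting " + role + " with " + configFor(role));
    }
}
```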

The move from megabytes of test data to tens of gigabytes of production data showed that the binary format exists for a reason. Memory consumption on the nodes had to be optimized, and here BinaryObject turned out to be very useful.

4. Findings

The reproach about the vagueness of the Apache Ignite documentation, which I came across at the very start, turned out to be fair: little has changed since 2016. It is not easy for a beginner to build a working prototype based on the website and/or the repository.

As a result of this work, my impression is that Zero Deployment works, but only at the system level. Roughly speaking: BinaryObject is what lets remote cluster nodes work with custom classes, while Zero Deployment is an internal mechanism of Apache Ignite itself that distributes system objects throughout the cluster.

I hope my experience will be useful to new users of Apache Ignite.

Source: habr.com
