Apache NiFi - A Brief Overview of Capabilities in Practice

Introduction

It so happened that I had to get acquainted with this technology at my current job. I'll start with a little background. At one of our regular meetings, the team was told that we needed to build an integration with a well-known system. The integration meant that this system would send us HTTP requests to a specific endpoint and, oddly enough, we would send back responses in the form of SOAP messages. Everything seemed simple and trivial. What follows from this is…

Task

Create 3 services.

The first is the Database Update Service. On receiving new data from a third-party system, it updates the data in the database and generates a CSV file to pass on to the next system.

It then calls the endpoint of the second service, the FTP Transportation Service, which receives the transferred file, validates it, and puts it into file storage via FTP.

The third service, the Data Transfer Service to the consumer, works asynchronously with respect to the first two. It receives a request from a third-party external system for the file discussed above, takes the ready-made response file, modifies it (updates the id, description and linkToFile fields) and sends the response as a SOAP message.

So the overall picture is as follows: the first two services start working only when data for an update arrives. The third service works constantly, because there are many consumers of the information: about 1000 data requests per minute. The services are always available, and their instances run in different environments, such as test, demo, preprod and prod. Below is a diagram of how these services work. Let me explain right away that some details are simplified to avoid unnecessary complexity.

[Figure: diagram of how the three services work]

Technical deep dive

When planning a solution, we first decided to build the applications in Java using the Spring framework, the Nginx load balancer, the Postgres database and other technical and not-so-technical things. Since the time allotted for designing the solution allowed us to consider other approaches, our attention turned to Apache NiFi, a technology that is fashionable in certain circles. I'll say right away that this technology allowed us to replace all 3 services. This article describes the development of the file transportation service and the data transfer service to the consumer; if the article is well received, I will also write about the database update service.

What is it

NiFi is a distributed architecture for fast parallel loading and processing of data, with a large number of plug-ins for sources and transformations, configuration versioning and much more. A nice bonus is that it is very easy to use. Trivial processes such as getFile, sendHttpRequest and others can be represented as squares. Each square represents a certain process, and their interaction can be seen in the figure below. More detailed documentation on setting up the interaction between processes is written here, and for those who prefer Russian, here. The documentation describes perfectly how to unpack and run NiFi, as well as how to create processes (the squares).
The idea of writing this article was born after a long search and after structuring the information I found into something coherent, as well as from a desire to make life a little easier for future developers.

Example

Let's consider an example of how the squares interact with each other. The general scheme is quite simple: we receive an HTTP request (in theory, with a file in the request body; to demonstrate the capabilities of NiFi, in this example the request starts the process of fetching a file from local file storage), then we send back a response saying that the request has been received, and in parallel the file is fetched from file storage and then moved via FTP to file storage. It is worth explaining that the processes interact with each other through the so-called flowFile. This is the basic entity in NiFi, which stores attributes and content. Content is the data that the flow file represents. Roughly speaking, if you receive a file in one square and pass it to another, your file will be the content.
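To make the idea of a flowFile more tangible, here is a minimal sketch in plain Groovy. The SimpleFlowFile class and the attribute names are hypothetical and are not NiFi's real API; they only illustrate that a flow file is a set of attributes plus content.

```groovy
// Hypothetical, simplified model of a flow file. This is NOT NiFi's FlowFile class.
class SimpleFlowFile {
    Map<String, String> attributes = [:]   // metadata, for example a file name
    byte[] content = new byte[0]           // the payload itself
}

// A flow file as it might look after a file has been picked up.
def ff = new SimpleFlowFile(
    attributes: [filename: 'response.xml'],
    content: '<id>1</id>'.getBytes('UTF-8')
)

// A square usually passes on a changed copy: here one attribute is added
// and the content is left untouched.
def updated = new SimpleFlowFile(
    attributes: ff.attributes + ['mime.type': 'application/xml'],
    content: ff.content
)

println updated.attributes
println new String(updated.content, 'UTF-8')
```

In this picture, a square that only routes or tags data touches the attributes, while a square that transforms a file rewrites the content.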

[Figure: overall flow of the example]

As you can see, this picture shows the overall process. HandleHttpRequest accepts requests, ReplaceText generates the response body, and HandleHttpResponse returns the response. FetchFile gets a file from file storage and passes it to the PutSFTP square, which puts the file on the FTP server at the specified address. Now, more about this process.

Everything starts with the request. Let's look at its configuration options.

[Figure: HandleHttpRequest configuration]

Everything here is pretty trivial, with the exception of StandardHttpContextMap: this is a kind of service that links a received request to the response sent later, which is what allows you to receive requests and answer them. For more details, including examples, see here.

Next, let's look at the configuration options of the ReplaceText square. Pay attention to ReplacementValue: this is what will be returned to the user as the response. In the settings you can adjust the logging level; the logs can be found in {where nifi was unpacked}/nifi-1.9.2/logs. There are also the failure / success outcomes (relationships, in NiFi terms), which let you control the process as a whole. That is, if the text is processed successfully, the process of sending a response to the user is called, and otherwise we simply log the failure.

[Figure: ReplaceText configuration]

There is nothing particularly interesting in the HandleHttpResponse properties, except for the status code returned when the response has been created successfully.

[Figure: HandleHttpResponse properties]
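Once these three squares are running, the request and response part can be checked from outside NiFi. Below is a small Groovy sketch of such a check. The host, port and path in the commented call are assumptions and must match your HandleHttpRequest settings, and callEndpoint is a helper name made up for this illustration.

```groovy
// Sends a GET request and returns the HTTP status together with the response body.
def callEndpoint(String url) {
    HttpURLConnection conn = new URL(url).openConnection()
    conn.requestMethod = 'GET'
    def status = conn.responseCode
    // For error statuses the body is delivered through the error stream.
    def stream = status < 400 ? conn.inputStream : conn.errorStream
    [status: status, body: stream ? stream.getText('UTF-8') : '']
}

// Example call; it needs a started flow, and the address is hypothetical:
// def result = callEndpoint('http://localhost:8080/transfer')
// println "${result.status}: ${result.body}"
```

If the flow is wired as described, the body should be whatever was entered as the replacement value in ReplaceText, and the status should be the one configured in HandleHttpResponse.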

We have figured out the request and response, so let's move on to fetching the file and placing it on the FTP server. FetchFile gets a file from the path specified in the settings and passes it to the next process.

[Figure: FetchFile configuration]

Then the PutSFTP square places the file in file storage. Its configuration options can be seen below.

[Figure: PutSFTP configuration]

Note that each square is a separate process that must be started. We have considered the simplest example, which does not require any complex customization. Next, we will look at a slightly more complicated process, where we will write a little Groovy.

More complex example

The data transfer service to the consumer is a little more complicated because the SOAP message has to be modified. The overall process is shown in the figure below.

[Figure: overall flow of the data transfer service]

Here the idea is not very complicated either: we receive a request from the consumer saying that it needs data, send a response acknowledging the message, start the process of fetching the response file, edit it with certain logic, and then transfer the file to the consumer's server in the form of a SOAP message.

I don't think it's worth describing again the squares we saw above, so let's move straight to the new ones. If you need to edit a file and ordinary squares like ReplaceText are not suitable, you will have to write your own script. This can be done with the ExecuteGroovyScript square. Its settings are shown below.

[Figure: ExecuteGroovyScript settings]

There are two options for loading the script into this square. The first is to upload a file with the script. The second is to paste the script into scriptBody. As far as I know, the ExecuteScript square supports several programming languages, and Groovy is one of them. I'll disappoint Java developers: you can't write scripts in Java in such squares. Those who really want to will need to create their own custom square and add it to NiFi. That whole operation involves quite a lot of fiddling, which we will not deal with in this article.

I chose Groovy. Below is a test script that simply increments the id in the SOAP message. An important note: you take the file from the flowFile and update it, so do not forget to put the updated version back. It is also worth noting that not all libraries are available by default; you may still have to import some of them.

The downside is that a script in this square is quite difficult to debug. There is a way to connect to the NiFi JVM and start a debugging session. Personally, I ran a local application and simulated getting the file from the session, so debugging was also done locally. Errors that come up when loading the script are quite easy to google, and NiFi itself writes them to the log.

import groovy.xml.StreamingMarkupBuilder
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils
// Imported explicitly so the script does not rely on the square's default imports.
import org.apache.nifi.processor.io.StreamCallback

// Take the next flow file from the queue; do nothing if the queue is empty.
def flowFile = session.get()
if (!flowFile) return
try {
    // Read the content, change it and write the updated content back to the flow file.
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        String result = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

        // Find the first id element anywhere in the message and increment its value.
        def recordIn = new XmlSlurper().parseText(result)
        def element = recordIn.depthFirst().find { it.name() == 'id' }
        def newId = Integer.parseInt(element.text().trim()) + 1

        // Write the new id into the response body of a fresh copy of the message.
        def recordOut = new XmlSlurper().parseText(result)
        recordOut.Body.ClientMessage.RequestMessage.RequestContent.content.MessagePrimaryContent.ResponseBody.id = newId

        def res = new StreamingMarkupBuilder().bind { mkp.yield recordOut }.toString()
        outputStream.write(res.getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error("Error during processing of validate.groovy", e)
    session.transfer(flowFile, REL_FAILURE)
}
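The local debugging approach mentioned above can be sketched roughly like this: move the transformation into a plain function that works on strings, so it runs without NiFi and without a session. This is only an illustration under stated assumptions. The incrementId name and the sample message are made up, and the JDK's DOM API is used here instead of XmlSlurper so that the sketch does not depend on a particular Groovy version.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

// Returns the message with the value of its first id element incremented by one.
String incrementId(String xml) {
    def factory = DocumentBuilderFactory.newInstance()
    factory.namespaceAware = true
    def doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))

    // '*' matches any namespace, so a prefixed id element is found as well.
    def ids = doc.getElementsByTagNameNS('*', 'id')
    if (ids.length == 0) throw new IllegalArgumentException('no id element found')
    def idNode = ids.item(0)
    idNode.textContent = String.valueOf(Integer.parseInt(idNode.textContent.trim()) + 1)

    // Serialize the document back to text without the XML declaration.
    def transformer = TransformerFactory.newInstance().newTransformer()
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, 'yes')
    def out = new StringWriter()
    transformer.transform(new DOMSource(doc), new StreamResult(out))
    out.toString()
}

// Hypothetical, heavily shortened message used only for the local check.
def sample = '<Envelope><Body><ResponseBody><id>41</id><description>test</description></ResponseBody></Body></Envelope>'
def result = incrementId(sample)
assert result.contains('<id>42</id>')
println result
```

Once such a function behaves as expected locally, the body of the session.write callback only has to read the input stream, call the function and write the result to the output stream.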

That is where the customization of this square ends. Next, the updated file is passed to the square that sends the file to the server. Its settings are shown below.

[Figure: settings of the square that sends the file to the server]

We specify the method by which the SOAP message will be transmitted and the address it is sent to. Next, you need to specify that this is SOAP.

[Figure: settings specifying that the message is SOAP]

We add some properties, such as host and action (soapAction). Save and check. For more details on how to send SOAP requests, see here.
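For comparison, here is roughly what the same settings amount to at the HTTP level, as a plain Groovy sketch outside NiFi. It assumes the receiving service follows SOAP 1.1 conventions (a text/xml content type and a quoted SOAPAction header); the postSoap helper, the URL and the action in the commented call are hypothetical.

```groovy
// Posts a SOAP envelope and returns the HTTP status together with the response body.
def postSoap(String url, String soapAction, String envelope) {
    HttpURLConnection conn = new URL(url).openConnection()
    conn.requestMethod = 'POST'
    conn.doOutput = true
    // Assumption: SOAP 1.1, which uses text/xml and a quoted SOAPAction value.
    conn.setRequestProperty('Content-Type', 'text/xml; charset=utf-8')
    conn.setRequestProperty('SOAPAction', '"' + soapAction + '"')
    conn.outputStream.withStream { it.write(envelope.getBytes('UTF-8')) }

    def status = conn.responseCode
    def stream = status < 400 ? conn.inputStream : conn.errorStream
    [status: status, body: stream ? stream.getText('UTF-8') : '']
}

// Example call; the address and the action are hypothetical:
// def reply = postSoap('http://consumer.example/soap', 'sendResponse', new File('response.xml').getText('UTF-8'))
// println reply.status
```

Sending the same envelope this way is a quick check of whether a problem lies in the flow settings or in the receiving service.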

We have considered several ways of using NiFi processes, how they interact, and what real benefits they bring. The examples considered are test ones and differ slightly from what actually runs in production. I hope this article will be of some use to developers. Thank you for your attention. If you have any questions, write to me and I'll try to answer.

Source: habr.com
