How we made cloud FaaS inside Kubernetes and won the Tinkoff hackathon

Since last year, our company has been organizing hackathons. The first such competition was very successful; we wrote about it in a separate article. The second hackathon was held in February 2019 and was no less successful. Not long ago, the organizer wrote about the purpose of holding it.

The participants were given a rather interesting task with complete freedom in choosing a technology stack for its implementation. They had to build a decision-making platform for the convenient deployment of customer scoring functions: one that could handle a fast stream of applications, withstand heavy loads, and scale easily.

The task is not trivial and can be solved in many ways, as we saw during the final presentations of the participants' projects. There were 6 teams of 5 people at the hackathon, and while all of them produced good projects, our platform turned out to be the most competitive. It is a very interesting project, and I would like to talk about it in this article.

Our solution is a platform based on a Serverless architecture inside Kubernetes, which reduces the time it takes to bring new features to production. It allows analysts to write code in an environment that is convenient for them and deploy it to production without involving engineers and developers.

What is scoring

Tinkoff.ru, like many modern companies, has customer scoring. Scoring is a customer assessment system based on statistical data analysis methods.

For example, a client asks us to issue him a loan or to open an individual entrepreneur account with us. If we plan to give him a loan, we need to assess his solvency, and if it is an individual entrepreneur account, we need to be sure that the client will not conduct fraudulent transactions.

Such decisions are based on mathematical models that analyze both the data of the application itself and the data from our storage. In addition to scoring, similar statistical methods can be used in a service that forms individual recommendations on new products for our customers.

The evaluation method can take a variety of input data, and at some point we may add a new input parameter that, based on analysis of historical data, will increase the conversion rate of the service.

We store a lot of data about customer relationships, and the volume of this information is constantly growing. For scoring to work, processing this data also requires rules (or mathematical models) that make it possible to quickly decide whose application to approve, whom to refuse, and to whom to offer a couple of other products after assessing their potential interest.

For the task at hand, we already use a specialized decision-making system, IBM WebSphere ILOG JRules BRMS, which, based on the rules set by analysts, technologists, and developers, decides whether to approve a particular banking product for the client or refuse it.

There are many ready-made solutions on the market, both scoring models and decision-making systems themselves, and we use one of these systems in our company. But the business is growing and diversifying, both the number of customers and the number of products offered are increasing, and along with this, ideas emerge on how to improve the existing decision-making process. People working with the existing system surely have plenty of ideas on how to make it simpler, better, and more convenient, but ideas from outside are also useful. The new hackathon was organized to collect such sound ideas.

The task

The hackathon was held on February 23rd. The participants were offered a real-world task: to develop a decision-making system that had to meet a number of conditions.

We were told how the existing system functions, what difficulties arise during its operation, and what business goals the developed platform should pursue. The system should have a fast time-to-market for developed rules, so that the analysts' working code gets into production as quickly as possible. And for the incoming flow of applications, decision-making time should be kept to a minimum. The system should also support cross-selling, giving the client the opportunity to purchase other company products if we approve them and the client is potentially interested.

It is clear that a release-ready project that will actually go into production cannot be written in one night, and it is quite difficult to cover the entire system, so we were asked to implement at least part of it. A number of requirements were established that the prototype had to meet. It was possible either to try to cover all the requirements in full, or to work out individual parts of the platform in detail.

As for technologies, all participants were given complete freedom of choice. Any concepts and technologies could be used: data streaming, machine learning, event sourcing, big data, and others.

Our solution

After a little brainstorming, we decided that a FaaS solution would be ideal for the task.

For this solution, we needed to find a suitable Serverless framework to implement the rules of the decision-making system being developed. Since Tinkoff actively uses Kubernetes for infrastructure management, we reviewed several ready-made solutions based on it; I will talk about this in more detail below.

To find the most effective solution, we looked at the product being developed through the eyes of its users. The main users of our system are analysts involved in the development of rules. The rules must be deployed to a server or, as in our case, to the cloud, for subsequent decision making. From the perspective of an analyst, the workflow looks like this:

  1. The analyst writes a script, rule, or ML model based on data from the warehouse. As part of the hackathon, we decided to use MongoDB, but the choice of data storage system is not fundamental here (a sketch of such a script is shown right after this list).
  2. After testing the developed rules on historical data, the analyst uploads his code to the admin area.
  3. In order to ensure versioning, all code will end up in Git repositories.
  4. Through the admin panel, it will be possible to deploy the code in the cloud as a separate functional Serverless module.
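To make the first steps more concrete, here is a minimal sketch of what such an analyst script might look like. It assumes Python with pymongo, a hypothetical applications collection of historical requests, and a simple age-of-majority check similar to the rule we later used for one of the demo products; the real scripts, collection, and field names differed.

    # A hypothetical rule script that an analyst might test on historical data.
    # Assumes pymongo is installed; collection and field names are illustrative.
    from datetime import date

    from pymongo import MongoClient


    def decide(application: dict) -> dict:
        """Approve the product only if the client has reached the age of majority."""
        birth_date = application["birth_date"]  # stored as a datetime
        today = date.today()
        age = today.year - birth_date.year - (
            (today.month, today.day) < (birth_date.month, birth_date.day)
        )
        return {"approved": age >= 18, "age": age}


    if __name__ == "__main__":
        client = MongoClient("mongodb://localhost:27017")
        history = client["scoring"]["applications"]  # historical requests
        for app in history.find().limit(100):
            print(app["_id"], decide(app))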

Initial data from clients must pass through a specialized Enrichment service designed to enrich the initial request with data from the storage. It was important to implement this service in such a way that it works with a single repository (the one from which the analyst takes data when developing rules) in order to maintain a single data structure.

Even before the hackathon, we decided on the Serverless framework we would use. Today, there are quite a few technologies on the market that implement this approach. The most popular solutions within the Kubernetes ecosystem are Fission, OpenFaaS, and Kubeless. There is even a good article describing and comparing them.

After weighing all the pros and cons, we chose Fission. This Serverless framework is quite easy to manage and meets the requirements of the task.

To work with Fission, you need to understand two basic concepts: function and environment. A function is a piece of code written in one of the languages for which there is a Fission environment. The list of environments implemented within this framework includes Python, JS, Go, the JVM, and many other popular languages and technologies.

Fission can also execute functions that are split into several files and packed into an archive beforehand. Fission runs in a Kubernetes cluster using specialized pods managed by the framework itself. To interact with the cluster pods, each function must be assigned its own route, to which you can pass GET parameters or a request body in the case of a POST request.
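To make these concepts more tangible, here is a minimal sketch of a function for Fission's Python environment, which serves functions via Flask and calls an entry point named main(). The function name, route, and CLI commands in the comments are illustrative and based on the Fission documentation of that time, not copied from our project.

    # A hypothetical Fission function for the Python environment.
    # Deployment would look roughly like this (names and routes are illustrative):
    #   fission env create --name python --image fission/python-env
    #   fission function create --name toy-rule --env python --code toy_rule.py
    #   fission route create --method POST --url /score/toy --function toy-rule
    import json

    from flask import request  # the python-env exposes the HTTP request via Flask


    def main():
        # Body of the POST request: an enriched application in JSON form.
        application = request.get_json(force=True)
        approved = application.get("age", 0) >= 18
        return json.dumps({"product": "toy", "approved": approved})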

As a result, we planned to get a solution that allows analysts to deploy the developed rule scripts without the participation of engineers and developers. The described approach also relieves developers of the need to rewrite analyst code into another language. For example, for the decision-making system we currently use, we have to write rules in narrowly specialized technologies and languages whose scope is extremely limited, and there is also a strong dependence on the application server, since all draft bank rules are deployed in a single environment. As a result, to deploy new rules, the entire system has to be released.

In our proposed solution, there is no need to release the rules: the code is deployed at the click of a button. Infrastructure management in Kubernetes also means you do not have to think about load and scaling; such problems are solved out of the box. And using a single data warehouse eliminates the need to reconcile real-time data with historical data, which simplifies the analyst's work.

What did we get

Since we came to the hackathon with a ready-made solution (in our imagination), all we had to do was convert our thoughts into lines of code.

The key to success at any hackathon is preparation and a well-written plan. Therefore, first of all, we decided which modules the architecture of our system would consist of and which technologies we would use.

The architecture of our project was as follows:

[Architecture diagram of the platform]
This diagram shows two entry points: an analyst (the main user of our system) and a client.

The work process is structured like this. The analyst develops a rule function and a data enrichment function for his model, stores his code in a Git repository, and deploys his model to the cloud through the administrator application. Let's consider how the deployed function will be called and how decisions will be made on incoming client requests:

  1. The client fills out a form on the site and sends a request to the controller. An application on which a decision must be made arrives at the system's input and is recorded in the database in its original form.
  2. Next, the raw request is sent for enrichment, if necessary. The initial request can be supplemented with data both from external services and from the storage. The resulting enriched request is also stored in the database.
  3. The analyst's function is launched: it accepts an enriched request as input and returns a decision, which is also written to the storage.

As storage in our system, we decided to use MongoDB, a document-oriented database that stores data as JSON documents, since the enrichment services, including the original request, aggregated all data through REST controllers.
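To give an idea of what ends up in the storage, here is an illustrative set of documents for a single application: the raw request, its enriched version, and the decision. The field names and values are made up for the example and do not reproduce our hackathon schema.

    # Illustrative documents for one application as it moves through the system.
    raw_request = {
        "request_id": "42",
        "name": "I. I. Ivanov",
        "birth_date": "1990-04-12",
        "phone": "+7 900 000-00-00",
        "product": "loan",
    }

    enriched_request = {
        **raw_request,
        "zodiac_sign": "Aries",           # added by an enrichment function
        "mobile_operator": "ExampleTel",  # added via a third-party REST service
    }

    decision = {
        "request_id": "42",
        "product": "loan",
        "approved": True,
        "cross_sell": ["toy"],            # other products approved in parallel
    }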

So, we had one night to implement the platform. We distributed the roles quite successfully; each team member had his own area of responsibility in our project:

  1. A front-end admin panel for the analyst, through which he could fetch rules from the version control system for the written scripts, select options for enriching the input data, and edit rule scripts online.
  2. The admin panel backend, including a REST API for the frontend and integration with the VCS.
  3. Setting up the infrastructure in Google Cloud and developing a source data enrichment service.
  4. Module for integrating the admin application with the Serverless framework for the subsequent deployment of rules.
  5. Rule scripts for testing the performance of the entire system and aggregation of analytics on incoming requests (decisions made) for the final demonstration.

Let's start in order.

Our frontend was written in Angular 7 using the banking UI Kit. The final version of the admin panel looked like this:

[Screenshot: the admin panel]
Since there was not much time, we tried to implement only the key functionality. To deploy a function in the Kubernetes cluster, it was necessary to select an event (the service for which the rule needs to be deployed in the cloud) and the code of the function implementing the decision logic. For each deployment of a rule for the selected service, we logged the event. In the admin panel, you could see the logs of all events.

All function code was stored in a remote Git repository, which also needed to be set in the admin panel. For code versioning, all functions were stored in different branches of the repository. The admin panel also provided the ability to edit the written scripts, so that before deploying a function to production, you could not only review the written code but also make the necessary changes.

In addition to the rule functions, we also implemented the possibility of gradually enriching the source data using Enrichment functions. Their code was also scripts that could access the data warehouse, call third-party services, and perform preliminary calculations. To demonstrate our solution, we calculated the zodiac sign of the client who left the application and determined his mobile operator using a third-party REST service.
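The zodiac-sign enrichment is only a few lines of Python, and the operator lookup is a single call to an external REST service. The sketch below is illustrative: the cut-off table uses the usual Western zodiac dates, and the lookup URL is a placeholder rather than the actual service we called.

    # Illustrative enrichment functions; the operator service URL is a placeholder.
    from datetime import date

    import requests

    # (month, last day of the sign that ends in that month, sign name)
    SIGNS = [
        (1, 19, "Capricorn"), (2, 18, "Aquarius"), (3, 20, "Pisces"),
        (4, 19, "Aries"), (5, 20, "Taurus"), (6, 20, "Gemini"),
        (7, 22, "Cancer"), (8, 22, "Leo"), (9, 22, "Virgo"),
        (10, 22, "Libra"), (11, 21, "Scorpio"), (12, 21, "Sagittarius"),
    ]


    def zodiac_sign(birth_date: date) -> str:
        _, last_day, sign = SIGNS[birth_date.month - 1]
        if birth_date.day <= last_day:
            return sign
        return SIGNS[birth_date.month % 12][2]  # after the cut-off the next sign begins


    def mobile_operator(phone: str) -> str:
        # Hypothetical endpoint; the real open service we used is not shown here.
        resp = requests.get("https://operator-lookup.example.com",
                            params={"phone": phone})
        resp.raise_for_status()
        return resp.json()["operator"]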

The backend of the platform was written in Java and implemented as a Spring Boot application. We initially planned to use Postgres to store the admin panel data, but, as part of the hackathon, we decided to limit ourselves to a simple H2 to save time. On the backend, integration with Bitbucket was implemented for versioning the request enrichment functions and rule scripts. For integration with remote Git repositories, we used the JGit library, a pure-Java implementation of Git that allows you to execute any Git operations through a convenient programming interface. So we had two separate repositories, one for enrichment functions and one for rules, with all the scripts arranged in directories. Through the UI, it was possible to select the latest commit of a script from an arbitrary branch of the repository. When changes were made to the code through the admin panel, commits with the changed code were created in the remote repositories.

To implement our idea, we needed the right infrastructure. We decided to deploy our Kubernetes cluster in the cloud, and our choice was the Google Cloud Platform. The Fission serverless framework was installed on the Kubernetes cluster that we deployed on Gcloud. Initially, the source data enrichment service was implemented as a separate Java application wrapped in a pod inside the k8s cluster. But after a preliminary review of our project in the middle of the hackathon, we were advised to make the Enrichment service more flexible and allow choosing how to enrich the raw data of incoming applications. So we had no choice but to make the enrichment service Serverless as well.

To work with Fission, we used the Fission CLI, which must be installed on top of the Kubernetes CLI. Deploying functions in a k8s cluster is quite simple: you just need to assign the function an internal route, and an ingress to allow incoming traffic if access from outside the cluster is needed. Deploying a single function usually takes no more than 10 seconds.

Final demonstration of the project and summing up

To demonstrate the operation of our system, we placed a simple form on a remote server where you could apply for one of the bank's products. To submit a request, you had to enter your initials, date of birth, and phone number.

The data from the client form went to the controller, which simultaneously sent requests to all available rules, having first enriched the data according to the specified conditions and saved it to shared storage. In total, we deployed three functions that make decisions on incoming requests and four data enrichment services. After submitting the application, the client received our decision:
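The controller itself lived in the Java backend, but the fan-out idea is easy to show in a few lines. The sketch below assumes the rule functions are reachable at per-product routes like those in the earlier examples; the URLs are placeholders.

    # Illustrative fan-out: send the enriched application to every rule in parallel.
    from concurrent.futures import ThreadPoolExecutor

    import requests

    RULE_ROUTES = {  # placeholder URLs of the deployed rule functions
        "loan": "http://fission-router/score/loan",
        "toy": "http://fission-router/score/toy",
        "mortgage": "http://fission-router/score/mortgage",
    }


    def score_all(enriched_application: dict) -> dict:
        def call(item):
            product, url = item
            response = requests.post(url, json=enriched_application, timeout=5)
            return product, response.json()

        with ThreadPoolExecutor(max_workers=len(RULE_ROUTES)) as pool:
            return dict(pool.map(call, RULE_ROUTES.items()))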

[Screenshot: the decision shown to the client]
In addition to a rejection or approval, the client also received a list of other products, requests for which we had sent in parallel. This is how we demonstrated the possibility of cross-selling in our platform.

In total, three fictitious bank products were available:

  • Loan.
  • Toy.
  • Mortgage.

For each service during the demonstration, we deployed prepared functions and enrichment scripts.

Each rule required its own set of input data. For mortgage approval, we calculated the client's zodiac sign and combined it with the logic of the lunar calendar. To approve the toy, we checked that the client had reached the age of majority, and to issue a loan, we sent a request to an external open service that determines the mobile operator and made the decision based on it.

We tried to make our demo interesting and interactive: everyone present could go to our form and check whether our fictitious services were available to them. At the very end of the presentation, we demonstrated analytics on the received applications, showing how many people used our service and the number of approvals and refusals.

To collect online analytics, we additionally deployed Metabase, an open source BI tool, and hooked it up to our storage. Metabase lets you build dashboards with analytics on the data of interest: you just need to register a connection to the database, select the tables (in our case, collections, since we used MongoDB), and specify the fields of interest.
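Metabase builds such screens through its UI, but the same numbers can also be pulled straight from the storage. Here is a sketch with pymongo, assuming a hypothetical decisions collection with product and approved fields:

    # Illustrative aggregation: approvals and refusals per product.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    decisions = client["scoring"]["decisions"]  # hypothetical collection name

    pipeline = [
        {"$group": {
            "_id": {"product": "$product", "approved": "$approved"},
            "count": {"$sum": 1},
        }}
    ]
    for row in decisions.aggregate(pipeline):
        print(row["_id"]["product"], row["_id"]["approved"], row["count"])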

As a result, we got a good prototype of the decision-making platform, and at the demonstration each listener could personally verify that it worked. An interesting solution, a ready prototype, and a successful demonstration allowed us to win despite strong competition from the other teams. I am sure that an interesting article could also be written about each team's project.

Source: habr.com
