How we predicted churn by approaching it as a natural disaster

Sometimes, to solve a problem, you just need to look at it from a different angle. Even if similar problems have been solved the same way for the last 10 years, with varying success, that does not mean this method is the only one.

Take customer churn. It is inevitable, because customers of any company can stop using its products or services for all sorts of reasons. For a company, churn is natural but undesirable, so everyone tries to minimize it. Better still is to predict the probability of churn for a particular category of users, or for a specific user, and to propose retention steps.

It is worth analyzing churn and trying to retain the client, where possible, for at least the following reasons:

Attracting new customers is more expensive than retaining existing ones. Attracting new customers usually costs money (advertising), while existing customers can be re-engaged with a special offer on special terms.
Understanding why customers leave is the key to improving products and services.

There are standard approaches to churn forecasting. But at one of the AI championships we decided to try the Weibull distribution instead. It is most often used in survival analysis, weather forecasting, natural disaster analysis, industrial engineering, and similar fields. The Weibull distribution is a continuous probability distribution defined by two parameters, a scale and a shape.

Wikipedia

The approach is interesting, but it is rarely used for churn prediction, or in fintech in general. Below we describe how we (the Data Mining Laboratory) applied it and, along the way, won gold at the Artificial Intelligence Championship in the AI in Banks category.

About churn in general

Let's look at what customer churn is and why it matters. The customer base is important to any business. New customers join it, for example after learning about a product or service from advertising. They stay for a while (actively using the products) and eventually stop. This period is called the customer lifecycle: the stages a customer goes through from learning about a product, deciding to buy, paying, using it and becoming a loyal consumer, to finally giving it up for one reason or another. Churn, then, is the final stage of the customer lifecycle, when the client stops using the services. For the business this means the client no longer brings profit, or any benefit at all.

Each bank client is a specific person who picks a bank card to fit their needs. Someone who travels often will want a card with miles. Someone who buys a lot will want a cashback card. Someone who buys a lot in particular stores can get a special partner card. And sometimes, of course, the card is chosen simply because its service is the cheapest. In short, there are plenty of variables.

A person also chooses the bank itself. Does it make sense to get a card from a bank whose branches are only in Moscow and the surrounding region when you live in Khabarovsk? Even if that bank's card is twice as profitable, having branches nearby is still an important criterion. Yes, it is already 2019 and digital is everything, but at some banks a number of issues can only be resolved in a branch. Besides, part of the population trusts a physical bank much more than a smartphone app, and this should be taken into account too.

As a result, a person can have many reasons for giving up a bank's products (or the bank itself). They changed jobs, and the card tariff switched from the payroll plan to the less favorable "For mere mortals" one. They moved to a city where the bank has no branches. They did not like dealing with an unqualified employee at a branch. There may be even more reasons for closing an account than for using a product.

Moreover, the client does not have to state the intention explicitly by coming to the bank and filing a request. They can simply stop using the products without terminating the contract. To tackle such cases, it was decided to use machine learning and AI.

Customer churn can occur in any industry: telecom, Internet providers, insurance companies, and in general wherever there is a customer base and recurring transactions.

What we did

First we had to draw a clear line: from what point do we consider a client gone? From the point of view of the bank that provided the data, a client's activity state was binary: either active or not. The "Activity" table had an ACTIVE_FLAG field whose value was either "0" or "1" ("Inactive" and "Active", respectively). That would be fine, except that a person can use the products actively for a while and then drop off the active list for a month because they fell ill, went abroad on vacation, or even went to try a card from another bank. Or they can start using the bank's services again after a long period of inactivity.

Therefore, we defined a period of inactivity as a continuous stretch of time during which a client's flag was set to "0".

Clients move from inactive to active after periods of inactivity of various lengths. This lets us compute an empirical "reliability of periods of inactivity", that is, the likelihood that a person will start using bank products again after a temporary period of inactivity.
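As a rough illustration of how such an empirical estimate could be computed, the sketch below takes monthly ACTIVE_FLAG histories (lists of 0 and 1, one list per client) and, for each inactivity length n, reports the share of inactivity periods lasting at least n months that ended with the client becoming active again. The function names and the exact estimator are our assumptions for illustration, not the procedure used in the competition. Periods still open at the end of the history are counted as "not returned", which understates the true return rate.

```python
def inactivity_runs(flags):
    """Return (length, returned) for each run of consecutive zeros.

    `returned` is True if the run is followed by a 1, and False if the
    history ends while the client is still inactive.
    """
    runs = []
    length = 0
    for flag in flags:
        if flag == 0:
            length += 1
        else:
            if length > 0:
                runs.append((length, True))
            length = 0
    if length > 0:
        runs.append((length, False))
    return runs


def return_rate_after(histories, max_months):
    """Share of inactivity runs of length >= n that ended with a return."""
    runs = [run for flags in histories for run in inactivity_runs(flags)]
    rates = {}
    for n in range(1, max_months + 1):
        reached = [came_back for length, came_back in runs if length >= n]
        if reached:
            rates[n] = sum(reached) / len(reached)
    return rates


# Hypothetical toy histories, one list of monthly flags per client.
histories = [
    [1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
print(return_rate_after(histories, max_months=4))
```

A curve of these rates against n gives a feel for how long an inactivity period has to be before a return becomes unlikely.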

For example, this graph shows the resumption of activity (ACTIVE_FLAG=1) of clients after several months of inactivity (ACTIVE_FLAG=0).

Let us describe the dataset we worked with in more detail. The bank provided aggregated information for 19 months in the following tables:

"Activity" - monthly transactions of clients (by cards, in Internet banking and mobile banking), including payroll and information on turnover.
"Cards" - data on all cards that the client has, with a detailed tariff scale.
"Contracts" - information about the client's contracts (both open and closed): loans, deposits, etc., indicating the parameters of each.
"Customers" - a set of demographic data (gender and age) and the presence of contact information.

For our work we needed all the tables except "Cards".

The difficulty lay elsewhere: the bank's data did not indicate what kind of activity took place on the cards. We could tell whether there were transactions, but not their type. So it was unclear whether the client was withdrawing cash, receiving a salary, or spending money on purchases. We also had no data on account balances, which would have been useful.

The sample itself was unbiased: over these 19 months, the bank made no attempts to retain customers or reduce churn.

So, about periods of inactivity.

To define churn, one must choose a period of inactivity. To forecast churn at a given point in time, you need at least 3 months of customer history before that point. Our history was limited to 19 months, so we took an inactivity period of 6 months and a minimum history of 3 months for a quality forecast. We chose the figures of 3 and 6 months empirically, based on an analysis of client behavior in this data.

We defined churn as follows: a client's churn month is the first month with ACTIVE_FLAG=0 that starts a run of at least six consecutive zeros in the ACTIVE_FLAG field. In other words, it is the month from which the client has been inactive for 6 months.
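The definition above can be sketched in a few lines. The function below is a minimal illustration that assumes one client's history is given as a list of monthly ACTIVE_FLAG values in chronological order. The name churn_month and the zero-based month index are our own choices.

```python
def churn_month(flags, window=6):
    """Index of the first month that starts `window` consecutive zeros.

    Returns None if the history contains no such run, for example when
    the client is inactive at the end of the history for fewer than
    `window` months.
    """
    run_start = None
    for i, flag in enumerate(flags):
        if flag == 0:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= window:
                return run_start
        else:
            run_start = None
    return None


# A one-month gap at index 2 is ignored; the six-month run starts at index 4.
print(churn_month([1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]))
```

Clients for whom the function returns None are not labeled as churned. Their time to churn is only known to exceed the remaining history.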

Number of clients who left

Number of remaining clients

How is churn calculated?

In such competitions, and in practice too, churn is often predicted as follows. The client uses products and services over different time intervals, and the data on interactions with the client is represented as a feature vector of fixed length n. Most often, this information includes:

User-specific data (demographic data, marketing segment).
The history of the use of banking products and services (client actions that are always tied to a specific time or period within the interval of interest).
External data, if available, for example reviews from social networks.

After that, a definition of churn is derived, which is different for each task. Then a machine learning algorithm is used to predict the probability of a client leaving based on the feature vector. The algorithm is trained with one of the well-known frameworks for building ensembles of decision trees: XGBoost, LightGBM, CatBoost, or their modifications.

The algorithm itself is not bad, but it has several serious disadvantages for churn forecasting.

It has no "memory". The model receives a fixed number of features that correspond to the current moment in time. To carry information about how parameters changed, you have to compute special features that characterize changes over time, for example the number or amount of bank transactions over the past 1, 2, and 3 months. Such an approach only partially reflects the nature of the changes over time.
A fixed forecast horizon. The model can only predict customer churn for a predetermined period, such as one month ahead. If a forecast is needed for a different period, for example three months, you have to rebuild the training sample and train a new model.

Our approach

We decided right away not to use standard approaches. Besides us, 497 more people registered for the championship, each with considerable experience. Trying to do something according to the standard scheme under such conditions is not a good idea.

So we set out to overcome the limitations of the binary classification model by predicting the probability distribution of the time until a customer churns. A similar approach can be seen here. It allows more flexible churn prediction and testing of more complex hypotheses than the classical approach. As the family of distributions modeling the time to churn, we chose the Weibull distribution because of its widespread use in survival analysis. The client's behavior can be viewed as a kind of survival.

Here are examples of the Weibull probability density for different values of its two parameters:

These are the probability densities of the churn time for three different customers. Time is measured in months. In other words, the chart shows when each client is most likely to churn over the next two months. As you can see, one client is more likely to leave earlier than the clients with the Weibull(2, 0.5) and Weibull(3, 1) distributions.
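To make the shape of these curves concrete, here is a small sketch of the Weibull density in its standard scale/shape form, f(t) = (k/λ)(t/λ)^(k-1) exp(-(t/λ)^k) for t > 0. The text does not say which of the two numbers in a notation like Weibull(2, 0.5) is the scale and which is the shape, so the example passes both by keyword. The parameter values below are hypothetical and chosen only for illustration.

```python
import math


def weibull_pdf(t, scale, shape):
    """Weibull density with scale lambda > 0 and shape k > 0."""
    if t <= 0:
        # Treated as outside the support in this sketch; the density at
        # exactly t = 0 depends on the shape and is not handled here.
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_mode(scale, shape):
    """Time at which the density peaks (0 when shape <= 1)."""
    if shape <= 1:
        return 0.0
    return scale * ((shape - 1) / shape) ** (1 / shape)


# Two hypothetical clients with the same shape and different scales.
for scale in (0.5, 1.0):
    peak = weibull_mode(scale=scale, shape=2.0)
    print(f"scale={scale}: density peaks at about {peak:.2f} months")
```

With the shape held fixed, a smaller scale moves the peak of the density toward earlier months, which corresponds to a client who is likely to leave sooner.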

The result is a model that, for each client and for any month, predicts the parameters of the Weibull distribution that best describes how the probability of churn unfolds over time. In more detail:

The target on the training sample is the time remaining until churn, for a particular client in a particular month.
If no churn is recorded for a customer, we assume that the time to churn is greater than the number of months from the current month to the end of the available history.
Model used: a recurrent neural network with an LSTM layer.
Loss function: the negative log-likelihood of the Weibull distribution.
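The targets and the loss described in this list can be sketched as follows. This is a simplified illustration under our own assumptions: time is treated as continuous, months after the churn month are dropped, and a small eps keeps the logarithm finite when the time to churn is zero. The function names are hypothetical, and in the real model the scale and shape would be produced by the LSTM for every client and month.

```python
import math


def time_to_churn_targets(n_months, churn_idx=None):
    """Per-month (time_to_event, observed) pairs for one client.

    observed=1: churn happens exactly `time_to_event` months later.
    observed=0: censored, churn happens later than `time_to_event` months.
    """
    if churn_idx is None:
        # No churn recorded: only a lower bound on the time to churn is known.
        return [(n_months - 1 - m, 0) for m in range(n_months)]
    # Assumption: months after the churn month are not used for training.
    return [(churn_idx - m, 1) for m in range(churn_idx + 1)]


def weibull_nll(t, observed, scale, shape, eps=1e-6):
    """Negative log-likelihood of one (possibly censored) observation."""
    t = max(t, eps)  # avoid log(0) when the event happens in the current month
    z = (t / scale) ** shape
    log_density = math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
    log_survival = -z  # log P(T > t), used for censored observations
    return -(log_density if observed else log_survival)


# Hypothetical client with 6 months of history who churns in month index 2.
for t, observed in time_to_churn_targets(6, churn_idx=2):
    print(t, observed, round(weibull_nll(t, observed, scale=2.0, shape=1.5), 3))
```

Censored observations contribute only the survival term, which is how the assumption "the time to churn is greater than the remaining history" enters the loss.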

Here are the advantages of this method:

Besides the obvious option of binary classification, a probability distribution allows flexible prediction of various events, for example whether a client will stop using the bank's services within 3 months. If needed, various metrics can also be averaged over this distribution.
The LSTM recurrent neural network has memory and makes efficient use of all available history. As the history grows longer or more detailed, accuracy improves.
The approach scales easily when time intervals are split into smaller ones (for example, months into weeks).

But creating a good model is not enough. You also need to evaluate its quality properly.

How was the quality assessed?

We chose the Lift Curve as the metric. It is used in business for such cases because it is easy to interpret, and it is well described here and here. In one sentence, the metric answers the question: "How many times better than random selection is the algorithm's prediction on the top-scored share of the sample?"

Training the models
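A minimal sketch of the lift at a given share of the sample, under the usual definition: the churn rate among the top-scored clients divided by the churn rate in the whole sample. The function name and the toy data are ours. Computing the value for a range of shares gives the points of a lift curve.

```python
def lift_at(scores, labels, fraction):
    """Churn rate in the top `fraction` of clients by score, divided by
    the overall churn rate. Labels are 1 for churn and 0 otherwise."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churned clients")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    return top_rate / base_rate


# Hypothetical scores for 10 clients, 2 of whom actually churned.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for fraction in (0.2, 0.5, 1.0):
    print(fraction, lift_at(scores, labels, fraction))
```

A lift of 1 means the ranking is no better than picking clients at random, and the lift over the whole sample is always 1.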

The competition rules did not set a specific quality metric for comparing different models and approaches. Moreover, the definition of churn can vary and depends on the problem statement, which in turn is determined by business goals. Therefore, to understand which method is better, we trained two models:

The commonly used binary classification approach with an ensemble of decision trees (LightGBM).
The Weibull-LSTM model.

The test sample consisted of 500 pre-selected clients who were not in the training sample. Hyperparameters were selected using cross-validation with splits by client. The same sets of features were used to train each model.

Because the decision-tree model has no memory, special features were computed for it: the ratio of a parameter's change in one month to its average value over the last three months. These characterize the rate of change of the values over the last three-month period. Without this, the decision-tree model would be at a disadvantage relative to Weibull-LSTM.
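One possible reading of these features is sketched below: for each month, the value of a parameter divided by its mean over the three preceding months. Whether the original features used the raw value or its month-over-month change, and whether the current month was part of the average, is not stated in the text, so treat the exact formula and the function name as assumptions.

```python
def ratio_to_recent_mean(values, window=3):
    """For each month, the value divided by the mean of the previous
    `window` months. None where history is too short or the mean is 0."""
    ratios = []
    for i, value in enumerate(values):
        if i < window:
            ratios.append(None)
            continue
        mean = sum(values[i - window:i]) / window
        ratios.append(value / mean if mean != 0 else None)
    return ratios


# Hypothetical monthly transaction counts for one client.
print(ratio_to_recent_mean([10, 20, 30, 40, 5]))
```

Values well below 1 signal that the client's activity has dropped relative to the recent past, which is the kind of trend a model without memory cannot see from a single month.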

Why Weibull-LSTM is better than the decision-tree ensemble approach

A couple of pictures make everything clear.

Comparison of Lift Curve for classical algorithm and Weibull-LSTM

Comparison of the Lift Curve Metric by Month for the Classical Algorithm and Weibull-LSTM

Overall, Weibull-LSTM outperforms the classical algorithm in almost all cases.

Churn prediction

A model based on a recurrent neural network with LSTM cells and a Weibull distribution can predict churn in advance, for example within the next n months. Consider the case n = 3. For each month, the neural network must correctly determine whether the client will leave between the next month and the n-th month. In other words, it must correctly determine whether the client will still be there after n months. This can be considered an early prediction: a prediction made at the moment the client has only started thinking about leaving.

Comparison of the Lift Curve for Weibull-LSTM 1, 2, and 3 months ahead of churn:

We noted above that forecasts for clients who drop out of the active state for a while are also important. So here we add to the sample cases where the client has already been inactive for one or two months, and check that Weibull-LSTM correctly classifies them as churn. Since such cases were present in the sample, we expect the network to handle them well:
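Given predicted Weibull parameters, the probability of churn within the next n months follows directly from the cumulative distribution function, P(T <= n) = 1 - exp(-(n/λ)^k). The sketch below assumes the standard scale/shape parametrization. The function name and the parameter values are hypothetical.

```python
import math


def churn_probability_within(n_months, scale, shape):
    """P(T <= n_months) for a Weibull(scale, shape) time to churn T."""
    if n_months <= 0:
        return 0.0
    return 1.0 - math.exp(-((n_months / scale) ** shape))


# One set of predicted parameters answers every horizon without retraining.
for n in (1, 2, 3):
    p = churn_probability_within(n, scale=4.0, shape=1.5)
    print(f"P(churn within {n} months) = {p:.3f}")
```

Ranking clients by this probability for a chosen n gives the scores needed for a lift curve at that horizon.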

Customer retention

Retention is really the main thing you can do once you know that particular customers are about to stop using the product. As for building a model that could offer customers something useful in order to keep them, that will not work unless you have a history of similar attempts that ended well.

We had no such history, so we decided to proceed as follows.

We build a model that identifies products of interest for each client.
Every month we run the classifier and identify customers who may leave.
We offer a product to some of these customers according to the model from step 1, and record our actions.
A few months later, we look at which of these customers left and which stayed. This gives us a training sample.
We train a model on the history obtained in step 4.
Optionally, we repeat the procedure, replacing the model from step 1 with the model obtained in step 5.

The quality of such retention can be checked with ordinary A/B testing: we split the customers who may leave into two groups. To one we offer products based on our retention model, and to the other we offer nothing. We decided to train a model that could already be useful at step 1 of this procedure.

We wanted the segmentation to be as interpretable as possible. To that end we chose several features that are easy to interpret: total number of transactions, salary, total account turnover, age, and gender. Features from the "Cards" table were left out as uninformative, and features from table 3, "Contracts", were left out because they are complex to process without leaking data between the validation and training sets.

Clustering was performed using Gaussian mixture models. The Akaike information criterion pointed to two optima. The first corresponds to 1 cluster. The second, less pronounced, corresponds to 80 clusters. The conclusion is that it is extremely difficult to split this data into clusters without prior information. Better clustering requires data that describes each client in detail.
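For reference, the Akaike information criterion is AIC = 2p - 2 ln L, where p is the number of free parameters and L is the maximized likelihood. Lower values are better. The sketch below counts the parameters of a Gaussian mixture under the assumption of full covariance matrices (the text does not say which covariance structure was used) and shows how quickly the penalty grows with the number of clusters for the five features listed above. The function names are ours.

```python
def gmm_param_count(n_components, n_features):
    """Free parameters of a Gaussian mixture with full covariances."""
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    weights = n_components - 1  # mixture weights sum to 1
    return means + covariances + weights


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Parameter counts for the two optima mentioned in the text, 5 features.
for k in (1, 80):
    print(k, "clusters:", gmm_param_count(k, n_features=5), "parameters")
```

A model with 80 components has to improve the log-likelihood a great deal to pay for its extra parameters, which is consistent with the second optimum being the less pronounced one.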

Therefore, we treated the task of offering a product to each individual client as a supervised learning problem. The following products were considered: "Term deposit", "Credit card", "Overdraft", "Consumer loan", "Car loan", "Mortgage".

The data contained one more product type, "Current account", but we did not consider it because it carries little information. For users who are still bank customers, i.e. have not stopped using its products, a model was built to predict which product they might be interested in. Logistic regression was chosen as the model, and the Lift value for the first 10 percentiles was used as the quality metric.

The quality of the model can be assessed in the figure.

Results of the product recommendation model for customers

Conclusion

This approach brought us first place in the AI in Banks category at the RAIF-Challenge 2017 AI Championship.

Apparently, the key was to approach the problem from an unusual angle and to use a method that is normally applied to other situations.

Then again, a mass exodus of users may well be a natural disaster for a service.

This method can be applied in any other area where churn matters, not only in banking. For example, we also used it to estimate our own churn, in the Siberian and St. Petersburg branches of Rostelecom.

Data Mining Laboratory, Sputnik Search Portal

Source: habr.com
