How we use Markov chains to evaluate solutions and find bugs. With a Python script

It is important for us to understand what happens to our students during training and how those events affect the result, so we are building a Customer Journey Map, a map of the customer's experience. After all, learning is not something continuous and monolithic; it is a chain of interrelated events and student actions, and those actions can differ greatly from student to student. Say a student has just finished a lesson: what will they do next? Move on to the homework? Launch the mobile app? Switch courses or ask to change the teacher? Go straight to the next lesson? Or simply walk away disappointed? Can we, by analyzing this map, identify patterns that lead to successful completion of the course, or, on the contrary, to the student dropping off?


Usually, specialized, very expensive closed-source tools are used to build a CJM. But we wanted to come up with something simple, requiring minimal effort and, if possible, open source. That is how the idea of using Markov chains came about, and it worked: we built the map, interpreted the data about student behavior as a graph, found completely non-obvious answers to global business questions, and even uncovered deeply hidden bugs. We did all of this with an open-source Python script. In this article, I will describe two cases with those non-obvious results and share the script with everyone.

So, Markov chains show the probability of transitions between events. Here is a simple example from Wikipedia:

[Figure: a two-state Markov chain from Wikipedia]

Here “E” and “A” are events, the arrows are transitions between them (including transitions from an event back to itself), and the arrow weights are the transition probabilities; together they form a weighted directed graph.
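For illustration, here is a minimal sketch of such a two-state chain represented as a weighted directed graph in NetworkX; the probabilities are made up and only need to satisfy the usual constraint that the outgoing weights of each node sum to 1.

```python
import networkx as nx

# A two-state chain like the Wikipedia example, with illustrative probabilities.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("E", "E", 0.3),  # self-loop: stay in E
    ("E", "A", 0.7),
    ("A", "A", 0.6),  # self-loop: stay in A
    ("A", "E", 0.4),
])

# Outgoing weights of each node sum to 1, as in any Markov chain.
for node in G:
    total = sum(d["weight"] for _, _, d in G.out_edges(node, data=True))
    print(node, total)
```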

What was used

The chain was built with standard Python functionality fed with student activity logs. The graph on the resulting transition matrix was built using the NetworkX library.

The log looks like this:

[Screenshot: a fragment of the event log]

This is a csv file containing a table with three columns: student id, event name, and the time when the event happened. These three fields are enough to track the client's movements, build the map, and eventually obtain a Markov chain.
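As a rough sketch of that first step, consecutive events of the same student can be counted and normalized into transition probabilities with standard Python plus pandas. The file name and column names (user_id, event, timestamp) are assumptions here and should be adjusted to the real log.

```python
from collections import defaultdict

import pandas as pd

# Assumed layout: events.csv with columns user_id, event, timestamp.
log = pd.read_csv("events.csv")
log = log.sort_values(["user_id", "timestamp"])

# Count transitions between consecutive events of the same student.
counts = defaultdict(lambda: defaultdict(int))
for _, group in log.groupby("user_id"):
    events = group["event"].tolist()
    for src, dst in zip(events, events[1:]):
        counts[src][dst] += 1

# Normalize the counts so that outgoing probabilities of each event sum to 1.
probs = {
    src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
    for src, dsts in counts.items()
}
```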

NetworkX can write the constructed graphs in .dot or .gexf format. To visualize the former, you can use the free Graphviz package (the gvedit tool); we worked with .gexf and Gephi, which is also free.
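Continuing the sketch above, the probabilities can be turned into a weighted directed graph and exported for either tool; this is not the exact code of our script, just one way to do it with NetworkX.

```python
import networkx as nx

# Build a weighted directed graph from the probabilities computed above.
G = nx.DiGraph()
for src, dsts in probs.items():
    for dst, p in dsts.items():
        G.add_edge(src, dst, weight=round(p, 3))

nx.write_gexf(G, "journey.gexf")         # open in Gephi
nx.nx_pydot.write_dot(G, "journey.dot")  # open with Graphviz (gvedit); needs pydot
```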

Next, I want to give two examples of using Markov chains that allowed us to take a fresh look at our goals, our learning processes, and the Skyeng ecosystem itself. And, well, to fix some bugs.

First case: mobile application

First, we explored the student journey through our most popular product, the General course. At that time I was working in Skyeng's children's department, and we wanted to see how effectively the mobile application works for our children's audience.

Taking the logs and running them through the script, I got something like this:

[Graph: the student journey through the General course]

The starting node is Start General, and at the bottom there are three exit nodes: the student “fell asleep”, changed course, or finished the course.

  • Fell asleep: classes are no longer taking place and the student has most likely dropped off. We optimistically call this state “asleep” because, in theory, they still have the opportunity to resume their studies. This is the worst outcome for us.
  • Dropped general, Changed course: the student switched from General to something else and is lost to this particular Markov chain.
  • Finished course: the ideal state, the student has completed 80% of the lessons (not all lessons are required).

Reaching the successful class node means successfully completing a lesson on our platform together with a teacher. It captures progress through the course and movement towards the desired result, “Finished course”. It is important for us that students reach it as often as possible.

To get more accurate quantitative conclusions for the mobile application (the app session node), we built separate chains for each of the final nodes and then compared the weights of the edges in pairs (a small code sketch of this comparison follows below):

  • from the app session node back to itself;
  • from app session to successful class;
  • from successful class to app session.

[Graphs: on the left, students who completed the course; on the right, those who “fell asleep”]
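Here is a minimal sketch of that pairwise comparison, assuming the two chains have already been built with the code above: one from the logs of students who finished the course and one from the logs of those who fell asleep. The node names are illustrative and must match the real event names.

```python
import networkx as nx

# The three edges whose weights we compare between the two cohorts.
EDGES_OF_INTEREST = [
    ("app session", "app session"),       # from the app session back to itself
    ("app session", "successful class"),  # from app session to successful class
    ("successful class", "app session"),  # from successful class to app session
]

def edge_weight(graph: nx.DiGraph, src: str, dst: str) -> float:
    """Transition probability of an edge, or 0 if the edge is absent."""
    data = graph.get_edge_data(src, dst)
    return data["weight"] if data else 0.0

def compare_cohorts(finished: nx.DiGraph, asleep: nx.DiGraph) -> None:
    """Print the weights of the edges of interest side by side for both chains."""
    for src, dst in EDGES_OF_INTEREST:
        print(f"{src} -> {dst}: "
              f"finished={edge_weight(finished, src, dst):.2f}, "
              f"asleep={edge_weight(asleep, src, dst):.2f}")
```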

These three edges show the relationship between a student's success and their use of the mobile app. We expected to see that students who completed the course would have a stronger connection with the application than those who fell asleep. In reality, however, we got exactly the opposite:

  • we saw that different user groups interact with the mobile application in different ways;
  • successful students use the mobile application less intensively;
  • students who are falling asleep use it more actively.

In other words, “sleeping” students begin to spend more and more time in the mobile application and eventually stay in it for good.


At first we were surprised, but after thinking it over, we realized that this is a completely natural effect. At one time, I studied French on my own using two tools: a mobile app and grammar lectures on YouTube. At first, I split my time between them 50-50. But the app is more fun: there is gamification, everything is simple, fast and clear, whereas the lectures require you to dig in, write things down, and practice in a notebook. Gradually, I began to spend more time on my smartphone, until its share grew to 100%: if you spend three hours in the app, you get a false sense of work done, and the desire to go and listen to anything else disappears.

But how can that be? After all, we built the mobile application specifically for this, put the Ebbinghaus curve into it, gamified it, and made it attractive so that people would spend time in it, and it turns out that it only distracts them? In fact, the reason is that the mobile app team did their job too well: the app became a great self-sufficient product and began to fall out of our ecosystem.

As a result of the study, we came to understand that the mobile application needs to be changed so that it distracts less from the main course of study, both for children and for adults. That work is now underway.

Second case: onboarding bugs

Onboarding is an optional additional procedure for registering a new student that eliminates potential technical problems later on. The basic scenario is that a person registers on the landing page, gets access to their personal account, is contacted, and is given an introductory lesson. At the same time, we see a large percentage of technical difficulties during the introductory lesson: the wrong browser version, a microphone or sound that does not work, a teacher who cannot immediately suggest a solution, and all of this is especially hard when it comes to children. So we developed an additional application in the personal account that walks the client through four simple steps: check the browser, camera, and microphone, and confirm that the parents will be present during the introductory lesson (after all, they are the ones who pay for their children's education).

These few onboarding pages showed this funnel:

1: the starting block with three slightly different (depending on the client) login and password entry forms.
2: a checkbox for consenting to the additional onboarding procedure.
2.1-2.3: checks for parent presence, Chrome version, and sound.
3: the final block.

It looks quite natural: most visitors drop off at the first two steps, realizing that something needs to be filled in and checked and that they have no time for it. If a client reaches the third step, they will almost certainly reach the final one. The funnel gives no reason to suspect anything.

Nevertheless, we decided to analyze our onboarding not with a classic one-dimensional funnel but with a Markov chain. We included a few more events, ran the script, and got this:

[Graph: the onboarding journey, a tangle of transitions between steps]

Only one thing can be understood unambiguously in this chaos: something went wrong. The onboarding process is linear by design; there should not be such a web of connections in it. And here you can immediately see that the user is bounced between steps that should have no transitions between them at all.
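Such deviations can also be flagged automatically. Below is a sketch of that check, assuming the onboarding chain was built with the code above; the step names are illustrative placeholders, not the real event names from our logs.

```python
import networkx as nx

# Assumed (illustrative) order of the onboarding steps.
EXPECTED_ORDER = ["login", "consent", "parent check", "chrome check",
                  "sound check", "final"]

def unexpected_edges(graph: nx.DiGraph) -> list:
    """Return edges that jump backwards or skip ahead in the expected linear flow."""
    position = {step: i for i, step in enumerate(EXPECTED_ORDER)}
    bad = []
    for src, dst, data in graph.edges(data=True):
        if src in position and dst in position:
            # In a linear flow a transition either stays put or goes one step forward.
            if position[dst] - position[src] not in (0, 1):
                bad.append((src, dst, data.get("weight")))
    return bad
```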


There can be two reasons for this strange picture:

  • errors crept into the event logs;
  • the problems are in the product itself, i.e. in onboarding.

The first reason was the most likely one, but checking it is quite laborious, and fixing the logs would not improve the UX in any way. The second, if real, had to be dealt with urgently. So we set off to examine the nodes, identify edges that should not exist, and look for the reasons for their appearance. We saw that some users got stuck and went in circles, others fell back from the middle to the beginning, and still others could not get past the first two steps at all. We passed the data to QA, and yes, it turned out that there were plenty of bugs in onboarding: it is a side product, a bit of a crutch, and it was not tested deeply enough because nobody expected any problems. The whole registration process has now been changed.
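For example, the “going in circles” pattern can be spotted directly on the graph. A small sketch, assuming the onboarding chain was built as above, is to list the cycles with NetworkX:

```python
import networkx as nx

def journey_cycles(graph: nx.DiGraph, min_len: int = 2) -> list:
    """Return cycles of at least min_len nodes (self-loops are often expected)."""
    return [cycle for cycle in nx.simple_cycles(graph) if len(cycle) >= min_len]
```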

This story showed us an unexpected application of Markov chains in QA.

Try it yourself!

I have posted my Python script for building Markov chains in open access, so feel free to use it. The documentation is on GitHub; you can ask questions here, and I will try to answer them all.

And a few useful links: the NetworkX library, the Graphviz visualizer. There is also an article about Markov chains on Habr. The graphs in this article were made with Gephi.

Source: habr.com
