Retentioneering: how we wrote open-source product analytics tools in Python and Pandas

Hey Habr. This article presents the results of four years of developing a set of methods and tools for processing user trajectories in an application or on a website. The author of the development is Maxim Godzi, who leads the product team and also wrote this article. The product is called Retentioneering; it has now been turned into an open-source library and published on GitHub so that anyone can use it. All this may be of interest to those involved in product and marketing analysis, product promotion and development. By the way, Habr has already published an article about one of the cases of working with Retentioneering. The new material explains what the product is capable of and how it can be used.

After reading the article, you will be able to write your own Retentioneering: any standardized method for processing user trajectories in an application and beyond that lets you see the details of user behavior and extract insights from them to grow business metrics.

What is Retentioneering and why is it needed?

Initially, our goal was to move Growth Hacking from the world of "digital witchcraft" to the world of numbers, analytics and forecasts. As a result, product analytics is reduced to pure mathematics and programming for those who prefer numbers to fantastic stories, and formulas to clever words like "rebranding" and "repositioning", which sound beautiful but do not help much in practice.

To solve these problems, we needed a framework for analytics through graphs and trajectories, and at the same time a library that simplifies typical analyst routines: a way to describe regular product analytics tasks that would be understandable to both humans and robots. The library makes it possible to describe user behavior and link it to product business metrics in a language formal and clear enough to simplify and automate the routine tasks of developers and analysts and to make their communication with the business easier.

Retentioneering is both a method and a set of analytical software tools that can be adapted to and integrated into any digital (and not only digital) product.

We started working on the product in 2015. Now it is a ready-made, although not yet ideal, set of tools for working with data in Python and Pandas, machine learning models with an sklearn-like API, and tools for interpreting the results of machine learning models with eli5 and shap.

All of this is wrapped into a convenient open-source library in a public GitHub repository, retentioneering-tools. Using the library is not difficult: almost anyone who loves product analytics but has not written code before can apply our analytics methods to their own data without spending a lot of time.

Likewise, a programmer, an application creator, or a member of a development or test team who has never done analytics before can start playing with this code and see the patterns in how their application is used, without outside help.

User trajectory as a basic element of analysis and methods for its processing

A user trajectory is a sequence of user states at certain points in time. The events can come from different data sources, both online and offline. Every event that happened to the user is part of their trajectory. Examples:
• pressed a button
• saw a picture
• landed on a screen
• received an email
• recommended the product to a friend
• filled out a form
• tapped the screen
• scrolled
• approached the checkout
• ordered a burrito
• ate a burrito
• got food poisoning from the burrito
• entered the cafe through the back entrance
• entered through the main entrance
• minimized the application
• received a push notification
• stared blankly at the screen for longer than time X
• paid for the order
• redeemed the order
• was denied a loan

If you take the trajectory data of a group of users and study how the transitions work, you can see exactly how their behavior in the application is structured. It is convenient to do this through a graph in which the states are nodes and the transitions between states are edges.
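To make the graph idea concrete, here is a minimal sketch in plain Pandas. It is not the library's own implementation; the column names (`user_id`, `event`, `timestamp`), the sample data and the helper `transition_edges` are all assumptions made for illustration. It counts how often each state is directly followed by another one, which gives the weighted edges of the graph.

```python
import pandas as pd

# Hypothetical event log: one row per event (column names are assumptions).
events = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 2, 3, 3],
        "event": ["main", "catalog", "cart", "main", "catalog", "lost", "main", "lost"],
        "timestamp": pd.to_datetime(
            [
                "2019-01-01 10:00", "2019-01-01 10:01", "2019-01-01 10:05",
                "2019-01-02 09:00", "2019-01-02 09:03", "2019-01-02 09:04",
                "2019-01-03 12:00", "2019-01-03 12:02",
            ]
        ),
    }
)


def transition_edges(log: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (event, next_event) edge with its transition count."""
    log = log.sort_values(["user_id", "timestamp"])
    # The next event of the same user; NaN for the last event of a trajectory.
    next_event = log.groupby("user_id")["event"].shift(-1)
    pairs = log.assign(next_event=next_event).dropna(subset=["next_event"])
    return (
        pairs.groupby(["event", "next_event"])
        .size()
        .reset_index(name="weight")
        .sort_values("weight", ascending=False)
        .reset_index(drop=True)
    )


edges = transition_edges(events)
print(edges)
```

Each row of `edges` is one edge of the graph; the `weight` column can be used as the edge thickness when drawing it with any graph library.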


"Trajectory" is a very convenient concept: it contains detailed information about all user actions, and any additional data can be attached to the description of these actions. This makes it a generic object. If you have good, convenient tools for working with trajectories, you can find similarities between them and segment them.

Trajectory segmentation may seem very complicated at first. Normally it is: you need to compare connectivity matrices or align sequences. We managed to find an easier way: take a large number of trajectories and segment them through clustering.

As it turned out, a trajectory can be turned into a point using continuous representations, for example TF-IDF. After the transformation, the trajectory becomes a point in a space whose axes are the normalized occurrences of the various events and of the transitions between them. This point lives in a huge space with a thousand or more dimensions (dimS=sum(event types)+sum(ngrams_2 types)), which can be projected onto a plane using TSNE. TSNE is a transformation that reduces the dimension of the space to 2 axes and, as far as possible, preserves the relative distances between points. This makes it possible to study, on a flat map (a projection map of trajectories), how the points of different trajectories are located relative to each other: how close or how far apart they are, whether they form clusters or are scattered across the map, and so on.
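The transformation described above can be sketched with scikit-learn. This is an illustration under assumptions, not the library's code: the trajectories are made up, each one is written as a space-separated string of event names, and `ngram_range=(1, 2)` is used so that the features are single events plus pairs of consecutive events, in the spirit of the dimension formula above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Hypothetical trajectories: user id -> ordered list of event names.
trajectories = {
    "u1": ["main", "catalog", "cart", "payment"],
    "u2": ["main", "catalog", "catalog", "cart", "payment"],
    "u3": ["main", "catalog", "product", "cart", "payment"],
    "u4": ["main", "lost"],
    "u5": ["main", "catalog", "lost"],
    "u6": ["main", "search", "lost"],
}

# One "document" per trajectory, events separated by spaces.
docs = [" ".join(steps) for steps in trajectories.values()]

# Unigrams are events, bigrams are transitions between consecutive events.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+", lowercase=False)
features = vectorizer.fit_transform(docs)

# Project to 2 axes; perplexity must be smaller than the number of trajectories.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
coords = tsne.fit_transform(features.toarray())

for user, (x, y) in zip(trajectories, coords):
    print(user, round(float(x), 2), round(float(y), 2))
```

On real data the `coords` array would be drawn as a scatter plot and, if needed, clustered; with only six toy trajectories the picture itself means little, the point is the shape of the pipeline.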


Retentioneering analytical tools make it possible to turn complex data and trajectories into representations that can be compared with one another, and then to explore and interpret the result of the transformation.

When we speak of standard trajectory processing methods, we mean the three main tools that we have implemented in Retentioneering: graphs, step matrices and trajectory projection maps.

Working with Google Analytics, Firebase and similar analytics systems is quite complicated and not 100% efficient. The problem is a number of limitations imposed on the user, as a result of which the analyst's work in such systems comes down to mouse clicks and picking slices. Retentioneering makes it possible to work with user trajectories and not just with funnels, as in Google Analytics, where the level of detail is often reduced to a funnel, albeit one built for a particular segment.
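Of the three tools, the step matrix is the easiest to sketch by hand. The snippet below is a rough Pandas illustration, not the library's implementation, and it assumes one particular definition: the cell for event E and step N holds the share of all users whose N-th event was E. The trajectories and column names are made up.

```python
import pandas as pd

# Hypothetical trajectories: user id -> ordered list of event names.
trajectories = {
    "u1": ["main", "catalog", "cart"],
    "u2": ["main", "catalog", "lost"],
    "u3": ["main", "lost"],
    "u4": ["main", "catalog", "cart"],
}

# Long format: one row per (user, step number, event).
rows = [
    {"user_id": user, "step": step, "event": event}
    for user, steps in trajectories.items()
    for step, event in enumerate(steps, start=1)
]
log = pd.DataFrame(rows)

# Count users per (event, step) and normalize by the total number of users.
counts = pd.crosstab(log["event"], log["step"])
step_matrix = counts / log["user_id"].nunique()
print(step_matrix)
```

With this normalization a column can sum to less than 1 when some trajectories have already ended, which is itself a useful signal of where users drop off.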

Retentioneering and case studies

As an example of using the tool, we can cite the case of a large niche service in Russia. This company has an Android mobile app that is popular with customers. The annual turnover from the mobile application was about 7 million rubles, with seasonal fluctuations within 60-130 thousand. The same company also has an iOS application, and the average check of an iOS user was higher than that of an Android user: 1,300 rubles versus 1,080 rubles.

The company decided to make the Android application more effective and conducted a thorough analysis for this purpose. Several dozen hypotheses were formed about how to increase the effectiveness of the application. After using Retentioneering, it turned out that the problem was in the messages shown to new users. They received information about the brand, the company's benefits and prices. But, as it turned out, the messages should instead have helped the user learn how to work with the application.


The messages were changed accordingly. As a result, the application was uninstalled less often, and conversion to order grew by 23%. At first, 20 percent of incoming traffic was allocated to the test, but a few days later, after analyzing the first results and evaluating the trend, the proportions were reversed: 20 percent was left for the control group and 80 percent was placed in the test. A week later, it was decided to add tests of two more hypotheses one after another. In just seven weeks, the turnover from the Android application increased by one and a half times compared to the previous level.

How to work with Retentioneering?

The first steps are quite simple: we install the library with the pip install retentioneering command. The repository itself contains ready-made examples and data processing cases for some product analytics tasks. The set is constantly being expanded; for now it is enough for a first acquaintance. Anyone can take the ready-made modules and apply them to their own tasks right away, which lets you set up a more detailed analysis and optimization of user trajectories as quickly and efficiently as possible. All this makes it possible to find application usage patterns through understandable code and to share this experience with colleagues.

Retentioneering is a tool worth using throughout the lifetime of an app, and here's why:

  • Retentioneering is effective for tracking and continuously optimizing user trajectories and improving business performance. For example, new features are often added to ecommerce applications, and their impact on the product cannot always be predicted correctly. In some cases there are compatibility issues between new and old features: for example, the new ones "cannibalize" the existing ones. In this situation, constant analysis of trajectories is needed.
  • The situation is similar when working with advertising channels: new traffic sources and advertising creatives are constantly being tested, and seasonality, trends and the impact of other events have to be monitored, which leads to new classes of problems. This also requires constant monitoring and interpretation of user mechanics.
  • There are a number of factors that constantly affect the operation of the application. For example, new releases from developers: by fixing a current problem, they unwittingly bring back an old one or create a completely new one. Over time, the number of releases grows, and the process of tracking bugs needs to be automated, including through the analysis of user trajectories.

Overall, Retentioneering is an effective tool. But there is always room for improvement: it can and should be improved and developed, and new cool products can be built on top of it. The more active the project community is, the more forks there will be and the more interesting new ways of using it will appear.

More information about Retentioneering tools:

Source: habr.com
