How to collect user cohorts in the form of graphs in Grafana [+ docker image with example]

How to collect user cohorts in the form of graphs in Grafana [+ docker image with example]

How we solved the problem of visualizing user cohorts in the Promopult service using Grafana.

Promotional is a powerful service with a large number of users. For 10 years of work, the number of registrations in the system has exceeded one million. Those who have encountered similar services know that this array of users is far from homogeneous.

Someone logged in and "fell asleep" forever. Someone forgot the password and signed up a couple more times in six months. Someone carries money to the cashier, and someone came for free instruments. And it would be nice to get some profit from everyone.

On such large data arrays as we have, it is pointless to analyze the behavior of an individual user and make micro-decisions. But it is possible and necessary to catch trends and work with large groups. What we actually do.

Summary

  1. What is cohort analysis and why is it needed.
  2. How to make cohorts by user registration month in SQL.
  3. How to move cohorts to grafana.

If you already know what cohort analysis is and how to do it in SQL, skip to the last section.

1. What is cohort analysis and why is it needed

Cohort analysis is a method based on comparing different groups (cohorts) of users. Most often, our groups are formed by the week or month in which the user started using the service. From here, the lifetime of the user is calculated, and this is already an indicator on the basis of which a rather complex analysis can be carried out. For example, understand:

  • How does the acquisition channel affect the lifetime of a user?
  • how the use of any feature or service affects the lifetime;
  • how the launch of feature X affected the lifetime compared to last year.

2. How to make cohorts in SQL?

The size of the article and common sense do not allow us to present our real data here - in the test dump, statistics for a year and a half: 1200 users and 53 transactions. So that you can play with this data, we have prepared a docker image with MySQL and Grafana in which you can feel it all yourself. Link to GitHub at the end of the article.

And here we will show the creation of cohorts on a simplified example.

Let's say we have a service. Users register in it and spend money on services. Over time, users fall off. We want to find out how long users live, and how many of them fall off after the 1st and 2nd month of using the service.

To answer these questions, we need to build cohorts by the month of registration. Activity will be measured by spending each month. Instead of expenses, there can be orders, a subscription fee, or any other activity tied to time.

Initial data

The examples are made in MySQL, but there should be no significant differences for other DBMS.

User table - users:

userId
RegistrationDate

1
2019-01-01

2
2019-02-01

3
2019-02-10

4
2019-03-01

Cost table - billing:

userId
Date:
Sum

1
2019-01-02
11

1
2019-02-22
11

2
2019-02-12
12

3
2019-02-11
13

3
2019-03-11
13

4
2019-03-01
14

4
2019-03-02
14

We select all write-offs of users and the date of registration:

SELECT 
  b.userId, 
  b.Date,
  u.RegistrationDate
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId

Result:

userId
Date:
RegistrationDate

1
2019-01-02
2019-01-02

1
2019-02-22
2019-01-02

2
2019-02-12
2019-02-01

3
2019-02-11
2019-02-10

3
2019-03-11
2019-02-10

4
2019-03-01
2019-03-01

4
2019-03-02
2019-03-01

We build cohorts by months, for this we convert all dates into months:

DATE_FORMAT(Date, '%Y-%m')

Now we need to know how many months the user has been active - this is the difference between the month of debiting and the month of registration. MySQL has a function PERIOD_DIFF() - the difference between two months. Add PERIOD_DIFF() to the query:

SELECT
    b.userId,
    DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
    DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
    PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId

userId
BillingMonth
RegistrationDate
MonthsDiff

1
2019-01
2019-01
0

1
2019-02
2019-01
1

2
2019-02
2019-02
0

3
2019-02
2019-02
0

3
2019-03
2019-02
1

4
2019-03
2019-03
0

4
2019-03
2019-03
0

We count the users activated in each month - we group the records by BillingMonth, RegistrationMonth and MonthsDiff:

SELECT
    COUNT(DISTINCT(b.userId)) AS UsersCount,
    DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
    DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
    PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
GROUP BY BillingMonth, RegistrationMonth, MonthsDiff

Result:

UsersCount
BillingMonth
RegistrationMonth
MonthsDiff

1
2019-01
2019-01
0

1
2019-02
2019-01
1

2
2019-02
2019-02
0

1
2019-03
2019-02
1

1
2019-03
2019-03
0

In January, February and March, one new user appeared each - MonthsDiff = 0. One January user was active and in February - RegistrationMonth = 2019-01, BillingMonth = 2019-02, and one February user was active in March.

On a large data array, patterns are naturally visible better.

How to migrate cohorts to Grafana

We have learned how to form cohorts, but when there are a lot of records, it is no longer easy to analyze them. Records can be exported to Excel and form beautiful tables, but this is not our method!

Cohorts can be shown as an interactive graph in grafana.

To do this, we add another query to convert the data into a suitable format for Grafana:

SELECT
  DATE_ADD(CONCAT(s.RegistrationMonth, '-01'), INTERVAL s.MonthsDiff MONTH) AS time_sec,
  SUM(s.Users) AS value,
  s.RegistrationMonth AS metric
FROM (
  ## старый запрос, Π²ΠΎΠ·Π²Ρ€Π°Ρ‰Π°ΡŽΡ‰ΠΈΠΉ ΠΊΠΎΠ³ΠΎΡ€Ρ‚Ρ‹
  SELECT 
    COUNT(DISTINCT(b.userId)) AS Users, 
    DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
    DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
    PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
  FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
  WHERE
    u.RegistrationDate BETWEEN '2018-01-01' AND CURRENT_DATE
  GROUP BY 
    BillingMonth, RegistrationMonth, MonthsDiff 
) AS s
GROUP BY 
  time_sec, metric

And upload the data to Grafana.

An example chart from demo:

How to collect user cohorts in the form of graphs in Grafana [+ docker image with example]

Touch with hands:

GitHub repository with example is a docker image with MySQL and Grafana that you can run on your computer. The database already has demo data for a year and a half, from January 2018 to July 2019.

Optionally, you can upload your data to this image.

PS Articles about cohort analysis in SQL:

https://chartio.com/resources/tutorials/performing-cohort-analysis-using-mysql/

https://www.holistics.io/blog/calculate-cohort-retention-analysis-with-sql/

Source: habr.com

Add a comment