ProHoster > Blog > Administration > How to collect user cohorts in the form of graphs in Grafana [+ docker image with example]
How to collect user cohorts in the form of graphs in Grafana [+ docker image with example]
How we solved the problem of visualizing user cohorts in the Promopult service using Grafana.
Promotional is a powerful service with a large number of users. For 10 years of work, the number of registrations in the system has exceeded one million. Those who have encountered similar services know that this array of users is far from homogeneous.
Someone logged in and "fell asleep" forever. Someone forgot the password and signed up a couple more times in six months. Someone carries money to the cashier, and someone came for free instruments. And it would be nice to get some profit from everyone.
On such large data arrays as we have, it is pointless to analyze the behavior of an individual user and make micro-decisions. But it is possible and necessary to catch trends and work with large groups. What we actually do.
Summary
What is cohort analysis and why is it needed.
How to make cohorts by user registration month in SQL.
If you already know what cohort analysis is and how to do it in SQL, skip to the last section.
1. What is cohort analysis and why is it needed
Cohort analysis is a method based on comparing different groups (cohorts) of users. Most often, our groups are formed by the week or month in which the user started using the service. From here, the lifetime of the user is calculated, and this is already an indicator on the basis of which a rather complex analysis can be carried out. For example, understand:
How does the acquisition channel affect the lifetime of a user?
how the use of any feature or service affects the lifetime;
how the launch of feature X affected the lifetime compared to last year.
2. How to make cohorts in SQL?
The size of the article and common sense do not allow us to present our real data here - in the test dump, statistics for a year and a half: 1200 users and 53 transactions. So that you can play with this data, we have prepared a docker image with MySQL and Grafana in which you can feel it all yourself. Link to GitHub at the end of the article.
And here we will show the creation of cohorts on a simplified example.
Let's say we have a service. Users register in it and spend money on services. Over time, users fall off. We want to find out how long users live, and how many of them fall off after the 1st and 2nd month of using the service.
To answer these questions, we need to build cohorts by the month of registration. Activity will be measured by spending each month. Instead of expenses, there can be orders, a subscription fee, or any other activity tied to time.
Initial data
The examples are made in MySQL, but there should be no significant differences for other DBMS.
User table - users:
userId
RegistrationDate
1
2019-01-01
2
2019-02-01
3
2019-02-10
4
2019-03-01
Cost table - billing:
userId
Date:
Sum
1
2019-01-02
11
1
2019-02-22
11
2
2019-02-12
12
3
2019-02-11
13
3
2019-03-11
13
4
2019-03-01
14
4
2019-03-02
14
We select all write-offs of users and the date of registration:
SELECT
b.userId,
b.Date,
u.RegistrationDate
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
Result:
userId
Date:
RegistrationDate
1
2019-01-02
2019-01-02
1
2019-02-22
2019-01-02
2
2019-02-12
2019-02-01
3
2019-02-11
2019-02-10
3
2019-03-11
2019-02-10
4
2019-03-01
2019-03-01
4
2019-03-02
2019-03-01
We build cohorts by months, for this we convert all dates into months:
DATE_FORMAT(Date, '%Y-%m')
Now we need to know how many months the user has been active - this is the difference between the month of debiting and the month of registration. MySQL has a function PERIOD_DIFF() - the difference between two months. Add PERIOD_DIFF() to the query:
SELECT
b.userId,
DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
userId
BillingMonth
RegistrationDate
MonthsDiff
1
2019-01
2019-01
0
1
2019-02
2019-01
1
2
2019-02
2019-02
0
3
2019-02
2019-02
0
3
2019-03
2019-02
1
4
2019-03
2019-03
0
4
2019-03
2019-03
0
We count the users activated in each month - we group the records by BillingMonth, RegistrationMonth and MonthsDiff:
SELECT
COUNT(DISTINCT(b.userId)) AS UsersCount,
DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
GROUP BY BillingMonth, RegistrationMonth, MonthsDiff
In January, February and March, one new user appeared each - MonthsDiff = 0. One January user was active and in February - RegistrationMonth = 2019-01, BillingMonth = 2019-02, and one February user was active in March.
On a large data array, patterns are naturally visible better.
How to migrate cohorts to Grafana
We have learned how to form cohorts, but when there are a lot of records, it is no longer easy to analyze them. Records can be exported to Excel and form beautiful tables, but this is not our method!
Cohorts can be shown as an interactive graph in grafana.
To do this, we add another query to convert the data into a suitable format for Grafana:
SELECT
DATE_ADD(CONCAT(s.RegistrationMonth, '-01'), INTERVAL s.MonthsDiff MONTH) AS time_sec,
SUM(s.Users) AS value,
s.RegistrationMonth AS metric
FROM (
## ΡΡΠ°ΡΡΠΉ Π·Π°ΠΏΡΠΎΡ, Π²ΠΎΠ·Π²ΡΠ°ΡΠ°ΡΡΠΈΠΉ ΠΊΠΎΠ³ΠΎΡΡΡ
SELECT
COUNT(DISTINCT(b.userId)) AS Users,
DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
WHERE
u.RegistrationDate BETWEEN '2018-01-01' AND CURRENT_DATE
GROUP BY
BillingMonth, RegistrationMonth, MonthsDiff
) AS s
GROUP BY
time_sec, metric
GitHub repository with example is a docker image with MySQL and Grafana that you can run on your computer. The database already has demo data for a year and a half, from January 2018 to July 2019.
Optionally, you can upload your data to this image.