ื›ื™ืฆื“ ืœืืกื•ืฃ ืงื‘ื•ืฆื•ืช ืžืฉืชืžืฉื™ื ื›ื’ืจืคื™ื ื‘ื’ืจืืคืื ื” [+ ืชืžื•ื ืช docker ืขื ื“ื•ื’ืžื”]

ื›ื™ืฆื“ ืœืืกื•ืฃ ืงื‘ื•ืฆื•ืช ืžืฉืชืžืฉื™ื ื›ื’ืจืคื™ื ื‘ื’ืจืืคืื ื” [+ ืชืžื•ื ืช docker ืขื ื“ื•ื’ืžื”]

ื›ื™ืฆื“ ืคืชืจื ื• ืืช ื‘ืขื™ื™ืช ื”ื“ืžื™ื™ืช ืงื‘ื•ืฆื•ืช ืฉืœ ืžืฉืชืžืฉื™ื ื‘ืฉื™ืจื•ืช ืคืจื•ืžื•ืคื•ืœื˜ ื‘ืืžืฆืขื•ืช Grafana.

ืœืงื“ื - ืฉื™ืจื•ืช ืจื‘ ืขื•ืฆืžื” ืขื ืžืกืคืจ ืจื‘ ืฉืœ ืžืฉืชืžืฉื™ื. ื‘ืžื”ืœืš 10 ืฉื ื•ืช ื”ืคืขื™ืœื•ืช, ืขืœื” ืžืกืคืจ ื”ืจื™ืฉื•ืžื™ื ื‘ืžืขืจื›ืช ืขืœ ืžื™ืœื™ื•ืŸ. ืžื™ ืฉื ืชืงืœ ื‘ืฉื™ืจื•ืชื™ื ื“ื•ืžื™ื ื™ื•ื“ืข ืฉืžืขืจืš ื”ืžืฉืชืžืฉื™ื ื”ื–ื” ืจื—ื•ืง ืžืœื”ื™ื•ืช ื”ื•ืžื•ื’ื ื™.

ืžื™ืฉื”ื• ื ืจืฉื ื•"ื ืจื“ื" ืœื ืฆื—. ืžื™ืฉื”ื• ืฉื›ื— ืืช ื”ืกื™ืกืžื” ืฉืœื• ื•ื ืจืฉื ืขื•ื“ ื›ืžื” ืคืขืžื™ื ื‘ืžื”ืœืš ืฉื™ืฉื” ื—ื•ื“ืฉื™ื. ืžื™ืฉื”ื• ืžื‘ื™ื ื›ืกืฃ ืœืงื•ืคื”, ื•ืžื™ืฉื”ื• ื‘ื ื‘ืฉื‘ื™ืœ ื—ื™ื ื ืžื›ืฉื™ืจื™ื. ื•ื–ื” ื™ื”ื™ื” ื ื—ืžื“ ืœืงื‘ืœ ืงืฆืช ืจื•ื•ื— ืžื›ื•ืœื.

ืขืœ ืžืขืจื›ื™ ื ืชื•ื ื™ื ื›ื” ื’ื“ื•ืœื™ื ื›ืžื• ืฉืœื ื•, ื ื™ืชื•ื— ื”ืชื ื”ื’ื•ืชื• ืฉืœ ืžืฉืชืžืฉ ื‘ื•ื“ื“ ื•ืงื‘ืœืช ืžื™ืงืจื•-ื”ื—ืœื˜ื•ืช ื”ื•ื ื—ืกืจ ื˜ืขื. ืื‘ืœ ืœืชืคื•ืก ื˜ืจื ื“ื™ื ื•ืœืขื‘ื•ื“ ืขื ืงื‘ื•ืฆื•ืช ื’ื“ื•ืœื•ืช ื–ื” ืืคืฉืจื™ ื•ื”ื›ืจื—ื™. ื•ื–ื” ื‘ื“ื™ื•ืง ืžื” ืฉืื ื—ื ื• ืขื•ืฉื™ื.

ืกื™ื›ื•ื

  1. ืžื”ื• ื ื™ืชื•ื— ืขื•ืงื‘ื•ืช ื•ืžื“ื•ืข ื”ื•ื ื ื—ื•ืฅ?
  2. ื›ื™ืฆื“ ืœื™ืฆื•ืจ ืงื‘ื•ืฆื•ืช ืœืคื™ ื—ื•ื“ืฉ ืฉืœ ืจื™ืฉื•ื ืžืฉืชืžืฉ ื‘-SQL.
  3. ืื™ืš ืœื”ืขื‘ื™ืจ ืงื‘ื•ืฆื•ืช ืœ ื’ืจืคื ื”.

ืื ืืชื” ื›ื‘ืจ ื™ื•ื“ืข ืžื” ื–ื” ื ื™ืชื•ื— ืขื•ืงื‘ื•ืช ื•ื›ื™ืฆื“ ืœืขืฉื•ืช ื–ืืช ื‘-SQL, ื“ืœื’ ื™ืฉื™ืจื•ืช ืœืกืขื™ืฃ ื”ืื—ืจื•ืŸ.

1. ืžื”ื• ื ื™ืชื•ื— ืขื•ืงื‘ื•ืช ื•ืžื“ื•ืข ื”ื•ื ื ื—ื•ืฅ?

ื ื™ืชื•ื— ืขื•ืงื‘ื•ืช ื”ื•ื ืฉื™ื˜ื” ื”ืžื‘ื•ืกืกืช ืขืœ ื”ืฉื•ื•ืื” ื‘ื™ืŸ ืงื‘ื•ืฆื•ืช (ืงื•ื”ื•ืจื˜ื•ืช) ืฉื•ื ื•ืช ืฉืœ ืžืฉืชืžืฉื™ื. ืœืจื•ื‘, ื”ืงื‘ื•ืฆื•ืช ืฉืœื ื• ื ื•ืฆืจื•ืช ืœืคื™ ื”ืฉื‘ื•ืข ืื• ื”ื—ื•ื“ืฉ ืฉื‘ื• ื”ืžืฉืชืžืฉ ื”ืชื—ื™ืœ ืœื”ืฉืชืžืฉ ื‘ืฉื™ืจื•ืช. ืžื›ืืŸ ืžื—ื•ืฉื‘ ืžืฉืš ื”ื—ื™ื™ื ืฉืœ ื”ืžืฉืชืžืฉ, ื•ื–ื”ื• ืื™ื ื“ื™ืงื˜ื•ืจ ืฉืขืœ ื‘ืกื™ืกื• ื ื™ืชืŸ ืœื‘ืฆืข ื ื™ืชื•ื— ืžื•ืจื›ื‘ ืœืžื“ื™. ืœื“ื•ื’ืžื”, ื”ื‘ืŸ:

  • ื›ื™ืฆื“ ืžืฉืคื™ืข ืขืจื•ืฅ ื”ืจื›ื™ืฉื” ืขืœ ื—ื™ื™ ื”ืžืฉืชืžืฉ;
  • ื›ื™ืฆื“ ื”ืฉื™ืžื•ืฉ ื‘ื›ืœ ืชื›ื•ื ื” ืื• ืฉื™ืจื•ืช ืžืฉืคื™ืข ืขืœ ื—ื™ื™ื•;
  • ื›ื™ืฆื“ ื”ื”ืฉืงื” ืฉืœ ืชื›ื•ื ื” X ื”ืฉืคื™ืขื” ืขืœ ืžืฉืš ื”ื—ื™ื™ื ื‘ื”ืฉื•ื•ืื” ืœืฉื ื” ืฉืขื‘ืจื”.

2. ืื™ืš ืขื•ืฉื™ื ืงื‘ื•ืฆื•ืช ื‘-SQL?

ื’ื•ื“ืœ ื”ื›ืชื‘ื” ื•ื”ืฉื›ืœ ื”ื™ืฉืจ ืœื ืžืืคืฉืจื™ื ืœื ื• ืœื”ืฆื™ื’ ื›ืืŸ ืืช ื”ื ืชื•ื ื™ื ื”ืืžื™ืชื™ื™ื ืฉืœื ื• - ื‘ืžื–ื‘ืœื” ืฉืœ ื”ื‘ื“ื™ืงื•ืช, ืกื˜ื˜ื™ืกื˜ื™ืงื” ืฉืœ ืฉื ื” ื•ื—ืฆื™: 1200 ืžืฉืชืžืฉื™ื ื•-53 ืขืกืงืื•ืช. ื›ื“ื™ ืฉืชื•ื›ืœื• ืœืฉื—ืง ืขื ื”ื ืชื•ื ื™ื ื”ืืœื”, ื”ื›ื ื• ืชืžื•ื ืช docker ืขื MySQL ื•-Grafana ื‘ื” ืชื•ื›ืœื• ืœื—ื•ื•ืช ื”ื›ืœ ื‘ืขืฆืžื›ื. ืงื™ืฉื•ืจ ืœ-GitHub ื‘ืกื•ืฃ ื”ืžืืžืจ.

ื•ื›ืืŸ ื ืจืื” ืืช ื™ืฆื™ืจืช ื”ืงื‘ื•ืฆื•ืช ื‘ืืžืฆืขื•ืช ื“ื•ื’ืžื” ืคืฉื•ื˜ื”.

ื ื ื™ื— ืฉื™ืฉ ืœื ื• ืฉื™ืจื•ืช. ืžืฉืชืžืฉื™ื ื ืจืฉืžื™ื ืฉื ื•ืžื•ืฆื™ืื™ื ื›ืกืฃ ืขืœ ืฉื™ืจื•ืชื™ื. ืขื ื”ื–ืžืŸ, ืžืฉืชืžืฉื™ื ื ื•ืฉืจื™ื. ืื ื• ืจื•ืฆื™ื ืœื’ืœื•ืช ื›ืžื” ื–ืžืŸ ื—ื™ื™ื ื”ืžืฉืชืžืฉื™ื, ื•ื›ืžื” ืžื”ื ื ื•ืคืœื™ื ืœืื—ืจ ื”ื—ื•ื“ืฉ ื”ืจืืฉื•ืŸ ื•ื”ืฉื ื™ ืฉืœ ื”ืฉื™ืžื•ืฉ ื‘ืฉื™ืจื•ืช.

ื›ื“ื™ ืœืขื ื•ืช ืขืœ ืฉืืœื•ืช ืืœื•, ืขืœื™ื ื• ืœื‘ื ื•ืช ืงื‘ื•ืฆื•ืช ืขืœ ืกืžืš ื—ื•ื“ืฉ ื”ืจื™ืฉื•ื. ืื ื• ื ืžื“ื•ื“ ืคืขื™ืœื•ืช ืœืคื™ ื”ื•ืฆืื•ืช ื‘ื›ืœ ื—ื•ื“ืฉ. ื‘ืžืงื•ื ื”ื•ืฆืื•ืช, ืขืฉื•ื™ื•ืช ืœื”ื™ื•ืช ื”ื–ืžื ื•ืช, ื“ืžื™ ืžื ื•ื™ ืื• ื›ืœ ืคืขื™ืœื•ืช ืื—ืจืช ื”ืžื‘ื•ืกืกืช ืขืœ ื–ืžืŸ.

ื ืชื•ื ื™ื ื’ื•ืœืžื™ื™ื

ื”ื“ื•ื’ืžืื•ืช ื ืขืฉื• ื‘-MySQL, ืืš ืขื‘ื•ืจ DBMSs ืื—ืจื™ื ืœื ืืžื•ืจื™ื ืœื”ื™ื•ืช ื”ื‘ื“ืœื™ื ืžืฉืžืขื•ืชื™ื™ื.

ื˜ื‘ืœืช ืžืฉืชืžืฉื™ื - ืžืฉืชืžืฉื™ื:

ืชืขื•ื“ืช ื–ื”ื•ืช ืฉืœ ื”ืžืฉืชืžืฉ
ืชืืจื™ืš ืจื™ืฉื•ื

1
2019-01-01

2
2019-02-01

3
2019-02-10

4
2019-03-01

ื˜ื‘ืœืช ืขืœื•ื™ื•ืช - ื—ื™ื•ื‘:

ืชืขื•ื“ืช ื–ื”ื•ืช ืฉืœ ื”ืžืฉืชืžืฉ
ืชึทืึฒืจึดื™ืš
ืกื›ื•ื

1
2019-01-02
11

1
2019-02-22
11

2
2019-02-12
12

3
2019-02-11
13

3
2019-03-11
13

4
2019-03-01
14

4
2019-03-02
14

ื‘ื—ืจ ืืช ื›ืœ ืžื—ื™ืงื•ืช ื”ืžืฉืชืžืฉ ื•ืชืืจื™ืš ื”ืจื™ืฉื•ื:

SELECT 
  b.userId, 
  b.Date,
  u.RegistrationDate
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId

ืชื•ืฆืื”:

ืชืขื•ื“ืช ื–ื”ื•ืช ืฉืœ ื”ืžืฉืชืžืฉ
ืชึทืึฒืจึดื™ืš
ืชืืจื™ืš ืจื™ืฉื•ื

1
2019-01-02
2019-01-02

1
2019-02-22
2019-01-02

2
2019-02-12
2019-02-01

3
2019-02-11
2019-02-10

3
2019-03-11
2019-02-10

4
2019-03-01
2019-03-01

4
2019-03-02
2019-03-01

ืื ื• ื‘ื•ื ื™ื ืงื‘ื•ืฆื•ืช ืœืคื™ ื—ื•ื“ืฉ; ืœืฉื ื›ืš, ืื ื• ืžืžื™ืจื™ื ืืช ื›ืœ ื”ืชืืจื™ื›ื™ื ืœื—ื•ื“ืฉื™ื:

DATE_FORMAT(Date, '%Y-%m')

ืขื›ืฉื™ื• ืฆืจื™ืš ืœื“ืขืช ื›ืžื” ื—ื•ื“ืฉื™ื ื”ืžืฉืชืžืฉ ื”ื™ื” ืคืขื™ืœ - ื–ื” ื”ื”ื‘ื“ืœ ื‘ื™ืŸ ื—ื•ื“ืฉ ื”ื—ื™ื•ื‘ ืœื—ื•ื“ืฉ ื”ื”ืจืฉืžื”. ืœ-MySQL ื™ืฉ ืคื•ื ืงืฆื™ื” PERIOD_DIFF() - ื”ื”ื‘ื“ืœ ื‘ื™ืŸ ื—ื•ื“ืฉื™ื™ื. ื”ื•ืกืฃ PERIOD_DIFF() ืœื‘ืงืฉื”:

SELECT
    b.userId,
    DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
    DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
    PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId

ืชืขื•ื“ืช ื–ื”ื•ืช ืฉืœ ื”ืžืฉืชืžืฉ
ื—ื•ื“ืฉ ื”ื—ื™ื•ื‘
ืชืืจื™ืš ืจื™ืฉื•ื
MonthsDiff

1
2019-01
2019-01
0

1
2019-02
2019-01
1

2
2019-02
2019-02
0

3
2019-02
2019-02
0

3
2019-03
2019-02
1

4
2019-03
2019-03
0

4
2019-03
2019-03
0

ืื ื• ืกื•ืคืจื™ื ืืช ื”ืžืฉืชืžืฉื™ื ืฉื”ื•ืคืขืœื• ื‘ื›ืœ ื—ื•ื“ืฉ - ืื ื• ืžืงื‘ืฆื™ื ืืช ื”ืจืฉื•ืžื•ืช ืœืคื™ ื—ื•ื“ืฉ ื—ื™ื•ื‘, ื—ื•ื“ืฉ ื”ืจืฉืžื” ื•ืฉื™ื ื•ื™ ื—ื•ื“ืฉื™:

SELECT
    COUNT(DISTINCT(b.userId)) AS UsersCount,
    DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
    DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
    PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
GROUP BY BillingMonth, RegistrationMonth, MonthsDiff

ืชื•ืฆืื”:

ืกืคื™ืจืช ืžืฉืชืžืฉื™ื
ื—ื•ื“ืฉ ื”ื—ื™ื•ื‘
ื—ื•ื“ืฉ ื”ืจืฉืžื”
MonthsDiff

1
2019-01
2019-01
0

1
2019-02
2019-01
1

2
2019-02
2019-02
0

1
2019-03
2019-02
1

1
2019-03
2019-03
0

ื‘ื—ื•ื“ืฉื™ื ื™ื ื•ืืจ, ืคื‘ืจื•ืืจ ื•ืžืจืฅ, ื”ื•ืคื™ืข ืžืฉืชืžืฉ ื—ื“ืฉ ืื—ื“ ื›ืœ ืื—ื“ - MonthsDiff = 0. ืžืฉืชืžืฉ ืื—ื“ ื‘ื™ื ื•ืืจ ื”ื™ื” ืคืขื™ืœ ื‘ืคื‘ืจื•ืืจ - RegistrationMonth = 2019-01, BillingMonth = 2019-02, ื•ืžืฉืชืžืฉ ืื—ื“ ื‘ืคื‘ืจื•ืืจ ื”ื™ื” ืคืขื™ืœ ื‘ืžืจืฅ.

ื‘ืื•ืคืŸ ื˜ื‘ืขื™, ื“ืคื•ืกื™ื ื ืจืื™ื ื˜ื•ื‘ ื™ื•ืชืจ ื‘ืžืขืจืš ื ืชื•ื ื™ื ื’ื“ื•ืœ.

ืื™ืš ืžืขื‘ื™ืจื™ื ืงื‘ื•ืฆื•ืช ืœื’ืจืคืื ื”

ืœืžื“ื ื• ืื™ืš ื™ื•ืฆืจื™ื ืงื‘ื•ืฆื•ืช, ืื‘ืœ ื›ืฉื™ืฉ ื”ืจื‘ื” ืจืฉื•ืžื•ืช, ื›ื‘ืจ ืœื ืงืœ ืœื ืชื— ืื•ืชื. ื ื™ืชืŸ ืœื™ื™ืฆื ืจืฉื•ืžื•ืช ืœืืงืกืœ ื•ืœื™ืฆื•ืจ ืœื˜ื‘ืœืื•ืช ื™ืคื•ืช, ืื‘ืœ ื–ื• ืœื ื”ืฉื™ื˜ื” ืฉืœื ื•!

ื ื™ืชืŸ ืœื”ืฆื™ื’ ืงื‘ื•ืฆื•ืช ืงื‘ื•ืฆื•ืช ื›ืชืจืฉื™ื ืื™ื ื˜ืจืืงื˜ื™ื‘ื™ ื‘ ื’ืจืคื ื”.

ืœืฉื ื›ืš, ืื ื• ืžื•ืกื™ืคื™ื ืฉืื™ืœืชื” ื ื•ืกืคืช ืœื”ืžืจืช ื”ื ืชื•ื ื™ื ืœืคื•ืจืžื˜ ื”ืžืชืื™ื ืœ-Grafana:

SELECT
  DATE_ADD(CONCAT(s.RegistrationMonth, '-01'), INTERVAL s.MonthsDiff MONTH) AS time_sec,
  SUM(s.Users) AS value,
  s.RegistrationMonth AS metric
FROM (
  ## ัั‚ะฐั€ั‹ะน ะทะฐะฟั€ะพั, ะฒะพะทะฒั€ะฐั‰ะฐัŽั‰ะธะน ะบะพะณะพั€ั‚ั‹
  SELECT 
    COUNT(DISTINCT(b.userId)) AS Users, 
    DATE_FORMAT(b.Date, '%Y-%m') AS BillingMonth,
    DATE_FORMAT(u.RegistrationDate, '%Y-%m') AS RegistrationMonth,
    PERIOD_DIFF(DATE_FORMAT(b.Date, '%Y%m'), DATE_FORMAT(u.RegistrationDate, '%Y%m')) AS MonthsDiff
  FROM billing AS b LEFT JOIN users AS u ON b.userId = u.userId
  WHERE
    u.RegistrationDate BETWEEN '2018-01-01' AND CURRENT_DATE
  GROUP BY 
    BillingMonth, RegistrationMonth, MonthsDiff 
) AS s
GROUP BY 
  time_sec, metric

ื•ืชืขืœื” ืืช ื”ื ืชื•ื ื™ื ืœื’ืจืคื ื”.

ื’ืจืฃ ืœื“ื•ื’ืžื” ืž ื”ื“ื’ืžื”:

ื›ื™ืฆื“ ืœืืกื•ืฃ ืงื‘ื•ืฆื•ืช ืžืฉืชืžืฉื™ื ื›ื’ืจืคื™ื ื‘ื’ืจืืคืื ื” [+ ืชืžื•ื ืช docker ืขื ื“ื•ื’ืžื”]

ืœื’ืขืช ืขื ื”ื™ื“ื™ื™ื:

ืžืื’ืจ GitHub ืขื ื“ื•ื’ืžื” ื”ื™ื ืชืžื•ื ืช docker ืขื MySQL ื•-Grafana ืฉืชื•ื›ืœ ืœื”ืคืขื™ืœ ื‘ืžื—ืฉื‘ ืฉืœืš. ื”ืžืื’ืจ ื›ื‘ืจ ืžื›ื™ืœ ื ืชื•ื ื™ ื”ื“ื’ืžื” ื‘ืžืฉืš ืฉื ื” ื•ื—ืฆื™, ืžื™ื ื•ืืจ 2018 ืขื“ ื™ื•ืœื™ 2019.

ืื ืชืจืฆื”, ืชื•ื›ืœ ืœื˜ืขื•ืŸ ืืช ื”ื ืชื•ื ื™ื ืฉืœืš ืœืชืžื•ื ื” ื–ื•.

ื .ื‘ ืžืืžืจื™ื ืขืœ ื ื™ืชื•ื— ืขื•ืงื‘ื•ืช ื‘-SQL:

https://chartio.com/resources/tutorials/performing-cohort-analysis-using-mysql/

https://www.holistics.io/blog/calculate-cohort-retention-analysis-with-sql/

ืžืงื•ืจ: www.habr.com

ื”ื•ืกืคืช ืชื’ื•ื‘ื”