These days almost every company in the world collects statistics on what users do on its web resources. The motivation is clear: companies want to know how their product or website is used and to understand their users better. Of course, there are plenty of tools on the market for this, from analytics systems that present data as dashboards and charts (for example,
But we ran into a problem that had not been solved yet. And so was born
Why did we need to build our own service?
It was the nineties, and we survived as best we could. In 2019 we developed kSense, an API-first Customer Data Platform that aggregates data from different sources (Facebook ads, Stripe, Salesforce, Google Play, Google Analytics, etc.) to make data analysis, finding dependencies and so on more convenient. We noticed that many users rely on data analytics platforms, in particular Google Analytics (GA from here on). We talked to some of them and found that they needed product analytics data, which they were getting from GA, but
They install the Segment JavaScript pixel on their web resource, and user behavior data is loaded into a database of their choice (for example, Postgres). But Segment has a downside too: price. For example, if a web resource has 90,000 MTUs (monthly tracked users), you have to pay about $1,000 per month. There is also a third problem: some browser extensions (such as AdBlock) block analytics collection, because the browser sends its HTTP requests to the GA and Segment domains. Based on what our clients wanted, we built an analytics service that collects complete data (without sampling), is free, and can run on your own infrastructure.
How the service works
The service consists of three parts: a JavaScript pixel (which we later rewrote in TypeScript), a server part written in Go, and Redshift and BigQuery, which we planned to use as the destination databases (support for Postgres, ClickHouse and Snowflake was added later).
We decided to leave the structure of GA and Segment events unchanged. All we needed was to duplicate every event from the web resource where the pixel is installed to our backend. This turned out to be easy. The JavaScript pixel replaces the original GA library method with a new one that also sends a copy of the event to our system.
// 'ga' is the default name of the Google Analytics global variable
if (window.ga) {
  ga(tracker => {
    const originalSendHitTask = tracker.get('sendHitTask');
    tracker.set('sendHitTask', (model) => {
      const payLoad = model.get('hitPayload');
      // send the original hit to GA
      originalSendHitTask(model);
      // arrow functions keep `this` bound to the enclosing pixel object
      const jsonPayload = this.parseQuery(payLoad);
      // send a copy of the event to our service
      this.send3p('ga', jsonPayload);
    });
  });
}
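The snippet above calls a `parseQuery` helper that the article does not show. analytics.js passes `sendHitTask` the hit as a URL-encoded Measurement Protocol query string, so the helper presumably turns that string into key/value pairs. Below is a minimal sketch of that conversion. It is written in Go, the backend's language, so that all examples use one language. `parseHitPayload` is a hypothetical name, not the pixel's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseHitPayload converts a URL-encoded GA hit payload such as
// "v=1&t=pageview&dp=%2Fhome" into a flat key/value map.
// Hypothetical helper: it only illustrates what parseQuery is assumed to do.
func parseHitPayload(payload string) (map[string]string, error) {
	values, err := url.ParseQuery(payload)
	if err != nil {
		return nil, err
	}
	result := make(map[string]string, len(values))
	for key, vals := range values {
		// A Measurement Protocol parameter normally appears once;
		// keep the first value if it is repeated.
		if len(vals) > 0 {
			result[key] = vals[0]
		}
	}
	return result, nil
}

func main() {
	hit, err := parseHitPayload("v=1&t=pageview&tid=UA-XXXXX-Y&dp=%2Fhome")
	if err != nil {
		panic(err)
	}
	fmt.Println(hit["t"], hit["dp"]) // pageview /home
}
```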
With the Segment pixel everything is simpler: it has middleware methods, and we used one of them.
// 'analytics' is the default name of the Segment global variable
if (window.analytics) {
  if (window.analytics.addSourceMiddleware) {
    window.analytics.addSourceMiddleware(chain => {
      try {
        // send a copy of the event to our service
        this.send3p('ajs', chain.payload);
      } catch (e) {
        LOG.warn('Failed to send an event', e);
      }
      // pass the original event on to Segment
      chain.next(chain.payload);
    });
  } else {
    LOG.warn('Invalid interceptor state. Analytics js initialized, but not completely');
  }
} else {
  LOG.warn('Analytics.js listener is not set.');
}
Besides copying events, we added the ability to send arbitrary JSON:
// Sending an event with an arbitrary JSON object
eventN.track('product_page_view', {
  product_id: '1e48fb70-ef12-4ea9-ab10-fd0b910c49ce',
  product_price: 399.99,
  price_currency: 'USD',
  product_release_start: '2020-09-25T12:38:27.763000Z'
});
Next, let's talk about the server side. The backend has to accept HTTP requests and enrich them with additional information, for example geodata (thanks
// incoming json
{
  "field_1": {
    "sub_field_1": "text1",
    "sub_field_2": 100
  },
  "field_2": "text2",
  "field_3": {
    "sub_field_1": {
      "sub_sub_field_1": "2020-09-25T12:38:27.763000Z"
    }
  }
}
// result
{
  "field_1_sub_field_1": "text1",
  "field_1_sub_field_2": 100,
  "field_2": "text2",
  "field_3_sub_field_1_sub_sub_field_1": "2020-09-25T12:38:27.763000Z"
}
However, arrays are currently just converted to strings, because not all relational databases support repeated fields. You can also rename or delete fields with optional mapping rules. These rules let you change the data schema if needed, or cast one data type to another. For example, if a JSON field contains a timestamp string (field_3_sub_field_1_sub_sub_field_1 from the example above), you have to write a mapping rule in the configuration to create a field of type timestamp in the database. In other words, a field's data type is first determined by its JSON value, and then the type casting rule (if one is configured) is applied. We defined 4 main data types: STRING, FLOAT64, INT64 and TIMESTAMP. Mapping and casting rules look like this:
rules:
  - "/field_1/subfield_1 -> " # rule that deletes a field
  - "/field_2/subfield_1 -> /field_10/subfield_1" # rule that moves a field
  - "/field_3/subfield_1/subsubfield_1 -> (timestamp) /field_20" # rule that moves a field and casts its type
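The three rule shapes above can be parsed with a few string operations. This is a hypothetical parser. It is based only on the syntax visible in the snippet: `source -> [(type)] [target]`, where an empty target means deletion. EventNative's real parser and its error handling may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a parsed mapping rule. An empty Target means "delete the field";
// CastTo is empty when no "(type)" prefix is given.
type Rule struct {
	Source string
	Target string
	CastTo string
}

// parseRule parses strings of the form "<source> -> [(<type>)] [<target>]".
func parseRule(raw string) (Rule, error) {
	parts := strings.SplitN(raw, "->", 2)
	if len(parts) != 2 {
		return Rule{}, fmt.Errorf("rule %q has no '->'", raw)
	}
	rule := Rule{Source: strings.TrimSpace(parts[0])}
	if rule.Source == "" {
		return Rule{}, fmt.Errorf("rule %q has an empty source", raw)
	}
	right := strings.TrimSpace(parts[1])
	if strings.HasPrefix(right, "(") {
		end := strings.Index(right, ")")
		if end < 0 {
			return Rule{}, fmt.Errorf("rule %q has an unclosed type", raw)
		}
		rule.CastTo = strings.ToLower(strings.TrimSpace(right[1:end]))
		right = strings.TrimSpace(right[end+1:])
	}
	rule.Target = right // empty for deletion rules
	return rule, nil
}

func main() {
	for _, raw := range []string{
		"/field_1/subfield_1 -> ",
		"/field_2/subfield_1 -> /field_10/subfield_1",
		"/field_3/subfield_1/subsubfield_1 -> (timestamp) /field_20",
	} {
		rule, err := parseRule(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", rule)
	}
}
```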
The algorithm for determining data types:
- Convert the JSON struct into a flat struct
- Determine each field's data type from its value
- Apply the mapping and type casting rules
Then, from this incoming JSON structure:
{
  "product_id": "1e48fb70-ef12-4ea9-ab10-fd0b910c49ce",
  "product_price": 399.99,
  "price_currency": "USD",
  "product_type": "supplies",
  "product_release_start": "2020-09-25T12:38:27.763000Z",
  "images": {
    "main": "picture1",
    "sub": "picture2"
  }
}
the following data schema is derived:
"product_id" character varying,
"product_price" numeric (38,18),
"price_currency" character varying,
"product_type" character varying,
"product_release_start" timestamp,
"images_main" character varying,
"images_sub" character varying
We also decided that users should be able to set up partitioning, that is, splitting data in the database by other criteria, and we implemented the ability to set the table name either as a constant or
tableName: '{{.product_type}}_{{._timestamp.Format "2006_01"}}'
However, the structure of incoming events can change at runtime. We implemented an algorithm that compares the existing table structure with the structure of each incoming event. If a difference is found, the table is updated with the new fields. This is done with a patch SQL query:
-- Example for Postgres
ALTER TABLE "schema"."table" ADD COLUMN new_column character varying;
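The comparison step can be sketched as a set difference between the existing columns and the incoming event's columns, with one patch statement emitted per missing column. The sketch assumes both sides are already expressed as SQL types and only adds columns. The article does not say how type conflicts on existing columns are handled, so they are ignored here. `patchStatements` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// quoteIdent quotes a SQL identifier, doubling embedded double quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// patchStatements returns one ALTER TABLE ... ADD COLUMN statement for each
// column present in incoming but missing from existing. Both maps go from
// column name to SQL type. Existing columns are never modified.
func patchStatements(schema, table string, existing, incoming map[string]string) []string {
	var missing []string
	for name := range incoming {
		if _, ok := existing[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing) // deterministic order
	stmts := make([]string, 0, len(missing))
	for _, name := range missing {
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s.%s ADD COLUMN %s %s",
			quoteIdent(schema), quoteIdent(table), quoteIdent(name), incoming[name]))
	}
	return stmts
}

func main() {
	existing := map[string]string{
		"product_id":    "character varying",
		"product_price": "numeric(38,18)",
	}
	incoming := map[string]string{
		"product_id":    "character varying",
		"product_price": "numeric(38,18)",
		"new_column":    "character varying",
	}
	for _, stmt := range patchStatements("schema", "table", existing, incoming) {
		fmt.Println(stmt)
	}
}
```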
Architecture
Why write events to the file system instead of writing them straight to the database? Databases do not always perform well under a large number of inserts (
Open Source and future plans
At some point the service grew into a full-fledged product, and we decided to open-source it. Integrations with Postgres, ClickHouse, BigQuery, Redshift, S3 and Snowflake are now implemented. All integrations support both batch and streaming data loading. We also added support for sending requests via the API.
The current integration scheme looks like this:
Although the service can be used on its own (for example, with Docker), we also have
We'd be glad if EventNative helps solve your problems!
Which statistics collection system does your company use?
- Google Analytics: 12 votes (48.0%)
- Segment: 1 vote (4.0%)
- Other (write in the comments): 4 votes (16.0%)
- We built our own service: 8 votes (32.0%)
25 users voted. 6 users abstained.
Source: www.habr.com