Today almost every company in the world collects statistics about what users do on its web resources. The motivation is clear: companies want to know how their product or website is used and to understand their users better. There are, of course, plenty of tools on the market for this, from analytics systems that present data as dashboards and charts (e.g. …
But we found a problem that had not been solved yet. And that is how … was born.
Why did we build our own service?
It was the nineties, and we survived as best we could. In 2019 we developed kSense, an API First Customer Data Platform that aggregated data from different sources (Facebook ads, Stripe, Salesforce, Google Play, Google Analytics, etc.) for more convenient data analysis, dependency detection and so on. We noticed that many users used our platform specifically to analyze Google Analytics data (hereafter GA). We talked to some of them and found that they needed the analytics data for their product that they were collecting with GA, but …
They had installed Segment's JavaScript pixel on their web resources, and data about their users' behavior was loaded into the database of their choice (e.g. Postgres). But Segment has its own drawback: price. For example, a web resource with 90,000 MTU (monthly tracked users) has to pay about $1,000 a month. There was also a third problem: some browser extensions (such as AdBlock) blocked analytics collection, because the browser sends its HTTP requests to the GA and Segment domains. Based on what our customers wanted, we built an analytics service that collects the complete data set (without sampling), is free, and can run on your own infrastructure.
How the service works
The service consists of three parts. The first is a JavaScript pixel, which we later rewrote in TypeScript. The second is a server part written in Go. The third is storage: Redshift and BigQuery were planned as the in-house databases, and support for Postgres, ClickHouse and Snowflake was added later.
We decided to leave the GA and Segment event structure unchanged. All that was needed was to duplicate every event from the web resource where the pixel is installed to our backend. As it turned out, this is not hard. The JavaScript pixel replaces the original GA library method with a new one that also duplicates the event into our system.
// 'ga' is the standard Google Analytics global variable
if (window.ga) {
  ga(tracker => {
    var originalSendHitTask = tracker.get('sendHitTask');
    // arrow functions keep `this` bound to the pixel object that provides parseQuery() and send3p()
    tracker.set('sendHitTask', (model) => {
      var payLoad = model.get('hitPayload');
      // send the original event to GA
      originalSendHitTask(model);
      let jsonPayload = this.parseQuery(payLoad);
      // send a copy of the event to our service
      this.send3p('ga', jsonPayload);
    });
  });
}
With the Segment pixel everything is simpler: it has middleware methods, and we used one of them.
// 'analytics' is the standard Segment global variable
if (window.analytics) {
  if (window.analytics.addSourceMiddleware) {
    window.analytics.addSourceMiddleware(chain => {
      try {
        // duplicate the event to our service
        this.send3p('ajs', chain.payload);
      } catch (e) {
        LOG.warn('Failed to send an event', e);
      }
      // pass the original event on to Segment
      chain.next(chain.payload);
    });
  } else {
    LOG.warn("Invalid interceptor state. Analytics js initialized, but not completely");
  }
} else {
  LOG.warn('Analytics.js listener is not set.');
}
Besides duplicating events, we added the ability to send arbitrary JSON:
// sending an event with an arbitrary JSON object
eventN.track('product_page_view', {
  product_id: '1e48fb70-ef12-4ea9-ab10-fd0b910c49ce',
  product_price: 399.99,
  price_currency: 'USD',
  product_release_start: '2020-09-25T12:38:27.763000Z'
});
Now let's talk about the server part. The backend has to accept HTTP requests and enrich them with additional information, for example geo data (thanks …
// incoming JSON
{
  "field_1": {
    "sub_field_1": "text1",
    "sub_field_2": 100
  },
  "field_2": "text2",
  "field_3": {
    "sub_field_1": {
      "sub_sub_field_1": "2020-09-25T12:38:27.763000Z"
    }
  }
}
// result
{
  "field_1_sub_field_1": "text1",
  "field_1_sub_field_2": 100,
  "field_2": "text2",
  "field_3_sub_field_1_sub_sub_field_1": "2020-09-25T12:38:27.763000Z"
}
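The flattening shown above can be sketched in Go, the language of the server part. This is a minimal illustration, not EventNative's code. The function names `flatten` and `flattenJSON` are hypothetical, and nested keys are joined with `_` as in the example. Arrays are serialized back to a JSON string, in line with the array handling described below. With default `encoding/json` decoding, every number becomes a `float64`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten copies a nested JSON object into out, joining nested keys with "_":
// {"a":{"b":1}} becomes {"a_b":1}. Arrays are not expanded; they are stored
// as a JSON string, because not every target database supports repeated fields.
func flatten(prefix string, in map[string]interface{}, out map[string]interface{}) error {
	for key, value := range in {
		name := key
		if prefix != "" {
			name = prefix + "_" + key
		}
		switch v := value.(type) {
		case map[string]interface{}:
			if err := flatten(name, v, out); err != nil {
				return err
			}
		case []interface{}:
			b, err := json.Marshal(v)
			if err != nil {
				return fmt.Errorf("field %s: %w", name, err)
			}
			out[name] = string(b)
		default:
			out[name] = v // strings, float64 numbers, booleans, nulls
		}
	}
	return nil
}

// flattenJSON decodes a raw JSON object and returns its flat representation.
func flattenJSON(raw []byte) (map[string]interface{}, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	out := make(map[string]interface{})
	if err := flatten("", obj, out); err != nil {
		return nil, err
	}
	return out, nil
}
```

Recursion keeps the sketch short. Key collisions (for example, a literal `a_b` key next to a nested `a.b`) are not handled here: the later value silently wins.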
Arrays, however, are currently just converted to strings, because not all relational databases support repeated fields. Fields can also be renamed or deleted with optional mapping rules. These rules let you change the data schema when necessary or cast one data type to another. For example, suppose a JSON field contains a timestamp string (field_3_sub_field_1_sub_sub_field_1 from the example above). To create a timestamp column for it in the database, you need to write a mapping rule in the configuration. In other words, a field's data type is determined first by its JSON value, and then the type-cast rule (if configured) is applied. We settled on four main data types: STRING, FLOAT64, INT64 and TIMESTAMP. Mapping and type-cast rules look like this:
rules:
  - "/field_1/subfield_1 -> " # delete the field
  - "/field_2/subfield_1 -> /field_10/subfield_1" # move (rename) the field
  - "/field_3/subfield_1/subsubfield_1 -> (timestamp) /field_20" # move the field and cast its type
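One possible way to parse and apply rules of this shape is sketched below in Go. The rule grammar (`src -> `, `src -> dst`, `src -> (type) dst`) is read off the three examples above. Everything else is an assumption for illustration: the `mappingRule` type, the helper names, and the convention of turning a `/a/b` path into the flat field name `a_b`.

```go
package main

import (
	"fmt"
	"strings"
)

// mappingRule is one parsed rule (hypothetical type):
//   "src -> "           deletes src
//   "src -> dst"        moves src to dst
//   "src -> (type) dst" moves src to dst and requests a type cast
type mappingRule struct {
	src, dst string // flat field names; dst == "" means "delete"
	castTo   string // e.g. "timestamp"; empty when no cast is requested
}

// pathToField turns "/field_3/subfield_1" into the flat name "field_3_subfield_1".
func pathToField(path string) string {
	return strings.ReplaceAll(strings.Trim(strings.TrimSpace(path), "/"), "/", "_")
}

// parseRule parses a single rule string.
func parseRule(line string) (mappingRule, error) {
	parts := strings.SplitN(line, "->", 2)
	if len(parts) != 2 {
		return mappingRule{}, fmt.Errorf("rule %q: missing '->'", line)
	}
	rule := mappingRule{src: pathToField(parts[0])}
	if rule.src == "" {
		return mappingRule{}, fmt.Errorf("rule %q: empty source field", line)
	}
	right := strings.TrimSpace(parts[1])
	if strings.HasPrefix(right, "(") {
		end := strings.Index(right, ")")
		if end < 0 {
			return mappingRule{}, fmt.Errorf("rule %q: unclosed type cast", line)
		}
		rule.castTo = strings.ToLower(strings.TrimSpace(right[1:end]))
		right = right[end+1:]
	}
	rule.dst = pathToField(right)
	return rule, nil
}

// applyRules rewrites a flat event in place and returns the requested casts
// as destination field -> type name. Rules whose source field is absent are skipped.
func applyRules(event map[string]interface{}, rules []mappingRule) map[string]string {
	casts := make(map[string]string)
	for _, r := range rules {
		value, ok := event[r.src]
		if !ok {
			continue
		}
		delete(event, r.src)
		if r.dst == "" {
			continue // deletion rule: the field is simply dropped
		}
		event[r.dst] = value
		if r.castTo != "" {
			casts[r.dst] = r.castTo
		}
	}
	return casts
}
```

`applyRules` returns the casts separately instead of converting values, so the type-inference step can decide how to treat them.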
The algorithm for determining data types:
- flatten the JSON structure
- determine each field's data type from its value
- apply the mapping and type-cast rules
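The last two steps might look like this for an event that has already been flattened. This sketch is an assumption about how the steps could be implemented, not the service's code. It decodes with `json.Decoder.UseNumber` so that integers (INT64) can be told apart from fractional numbers (FLOAT64). Every other value is treated as STRING. An explicit cast (field name to type name, such as the `(timestamp)` in a rule) then overrides the inferred type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// DataType is one of the four types named in the text.
type DataType string

const (
	TypeString    DataType = "STRING"
	TypeFloat64   DataType = "FLOAT64"
	TypeInt64     DataType = "INT64"
	TypeTimestamp DataType = "TIMESTAMP"
)

var knownTypes = map[DataType]bool{
	TypeString: true, TypeFloat64: true, TypeInt64: true, TypeTimestamp: true,
}

// inferType picks a type from a value decoded with UseNumber.
func inferType(v interface{}) DataType {
	if n, ok := v.(json.Number); ok {
		if _, err := n.Int64(); err == nil {
			return TypeInt64
		}
		return TypeFloat64
	}
	// assumption: strings, booleans, nulls and serialized arrays all fall back to STRING
	return TypeString
}

// inferSchema infers a type per field of a flat JSON object, then lets explicit
// casts (field -> type name, e.g. "timestamp") override the inferred types.
func inferSchema(raw string, casts map[string]string) (map[string]DataType, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber() // keep "100" and "399.99" distinguishable
	var obj map[string]interface{}
	if err := dec.Decode(&obj); err != nil {
		return nil, err
	}
	schema := make(map[string]DataType, len(obj))
	for field, value := range obj {
		schema[field] = inferType(value)
	}
	for field, typeName := range casts {
		t := DataType(strings.ToUpper(typeName))
		if !knownTypes[t] {
			return nil, fmt.Errorf("field %s: unknown cast type %q", field, typeName)
		}
		if _, ok := schema[field]; ok {
			schema[field] = t
		}
	}
	return schema, nil
}
```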
So, from this incoming JSON structure:
{
  "product_id": "1e48fb70-ef12-4ea9-ab10-fd0b910c49ce",
  "product_price": 399.99,
  "price_currency": "USD",
  "product_type": "supplies",
  "product_release_start": "2020-09-25T12:38:27.763000Z",
  "images": {
    "main": "picture1",
    "sub": "picture2"
  }
}
the following data schema is produced:
"product_id" character varying,
"product_price" numeric (38,18),
"price_currency" character varying,
"product_type" character varying,
"product_release_start" timestamp,
"images_main" character varying,
"images_sub" character varying
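To turn such a schema into a Postgres table, each inferred type needs a column type. The Go sketch below is hedged accordingly. The STRING, FLOAT64 and TIMESTAMP mappings follow the example above, while `bigint` for INT64 is an assumption. `createTableSQL` and `quoteIdent` are hypothetical helpers. Columns are sorted because Go map iteration order is random.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// postgresType maps inferred types to Postgres column types. STRING, FLOAT64 and
// TIMESTAMP follow the example schema; bigint for INT64 is an assumption.
var postgresType = map[string]string{
	"STRING":    "character varying",
	"FLOAT64":   "numeric (38,18)",
	"INT64":     "bigint",
	"TIMESTAMP": "timestamp",
}

// quoteIdent double-quotes a Postgres identifier, escaping embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// createTableSQL renders CREATE TABLE for columns given as name -> inferred type.
func createTableSQL(schema, table string, columns map[string]string) (string, error) {
	names := make([]string, 0, len(columns))
	for name := range columns {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable DDL
	defs := make([]string, 0, len(names))
	for _, name := range names {
		pgType, ok := postgresType[columns[name]]
		if !ok {
			return "", fmt.Errorf("column %s: unsupported type %s", name, columns[name])
		}
		defs = append(defs, quoteIdent(name)+" "+pgType)
	}
	return fmt.Sprintf("CREATE TABLE %s.%s (\n  %s\n)",
		quoteIdent(schema), quoteIdent(table), strings.Join(defs, ",\n  ")), nil
}
```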
We also thought users should be able to set up partitioning, or split data in the database by other criteria. So we made it possible to set the table name either as a constant or as an expression in the configuration:
tableName: '{{.product_type}}_{{._timestamp.Format "2006_01"}}'
The structure of incoming events can, however, change at runtime. We implemented an algorithm that checks for differences between the structure of the existing table and that of the incoming event. If it finds a difference, the table is updated with the new fields using a patch SQL query:
-- Example for Postgres
ALTER TABLE "schema"."table" ADD COLUMN new_column character varying;
Architecture
Why write events to the file system first instead of writing them straight to the database? Databases do not always cope well with a large number of inserts (…
Open Source and future plans
At some point the service started to look like a complete product, and we decided to release it as Open Source. Integrations with Postgres, ClickHouse, BigQuery, Redshift, S3 and Snowflake are currently implemented. All integrations support both batch and streaming data loading. We also added support for requests via the API.
The current integration scheme looks like this:
Although the service can be run standalone (e.g. with Docker), we also have …
We'll be glad if EventNative helps you solve your problems!
Poll: which statistics collection system does your company use?
- Google Analytics: 48.0% (12 votes)
- Segment: 4.0% (1 vote)
- Other (write in the comments): 16.0% (4 votes)
- We built our own service: 32.0% (8 votes)
25 users voted. 6 users abstained.
source: www.habr.com