Our open source story: how we built an analytics service in Go and made it publicly available

These days, almost every company in the world collects statistics about user actions on its web resources. The motivation is clear: companies want to know how their product or website is used and to understand their users better. Of course, there are plenty of tools on the market for this, from analytics systems that present data as dashboards and charts (e.g. Google Analytics) to Customer Data Platforms, which let you collect and aggregate data from different sources in any warehouse (e.g. Segment).

But we found a problem that had not been solved yet. That is how EventNative, an open source analytics service, was born. Read on to find out why we decided to develop our own service, what it gave us, and what the end result looks like (with code snippets).

Why build our own service?

It was the nineties, and we were surviving as best we could. In 2019, we were developing kSense, an API-first Customer Data Platform that made it possible to aggregate data from different sources (Facebook Ads, Stripe, Salesforce, Google Play, Google Analytics, and others) for more convenient data analysis, finding dependencies, and so on. We noticed that many users used our platform for data analysis specifically with Google Analytics (hereafter GA). We talked to some of them and found out that they needed the analytics data for their product that they received through GA, but Google samples the data, and for many people the GA user interface is not the gold standard of convenience. We had enough conversations with our users to realize that many of them also used the Segment platform (which, by the way, was sold just the other day for $3.2 billion).

They installed the Segment JavaScript pixel on their web resource, and data about their users' behavior was loaded into the specified database (e.g. Postgres). But Segment has its own drawback: the price. For example, if a web resource has 90,000 MTU (monthly tracked users), you have to pay about $1,000 per month. There was also a third problem: some browser extensions (such as AdBlock) blocked analytics collection, because the HTTP requests from the browser were sent to GA and Segment domains. Based on our clients' wishes, we created an analytics service that collects the full data set (without sampling), is free, and can run on your own infrastructure.

How the service works

The service consists of three parts: a JavaScript pixel (which we later rewrote in TypeScript), a server part implemented in Go, and a data warehouse. Redshift and BigQuery were planned as the initial warehouses (support for Postgres, ClickHouse and Snowflake was added later).

We decided to leave the structure of GA events unchanged. All that was needed was to duplicate every event from the web resource where the pixel is installed to our backend. As it turned out, this is not hard to do. The JavaScript pixel overrides the original GA library method with a new one that also sends a copy of the event to our system.

// 'ga' is the standard name of the Google Analytics variable
if (window.ga) {
    ga(tracker => {
        var originalSendHitTask = tracker.get('sendHitTask');
        tracker.set('sendHitTask', (model) => {
            var payLoad = model.get('hitPayload');
            // send the original event to GA
            originalSendHitTask(model);
            let jsonPayload = this.parseQuery(payLoad);
            // send a copy of the event to our service
            this.send3p('ga', jsonPayload);
        });
    });
}

With the Segment pixel everything is simpler: it has middleware methods, and we used one of them.


// 'analytics' is the standard name of the Segment variable
if (window.analytics) {
    if (window.analytics.addSourceMiddleware) {
        window.analytics.addSourceMiddleware(chain => {
            try {
                // duplicate the event to our service
                this.send3p('ajs', chain.payload);
            } catch (e) {
                LOG.warn('Failed to send an event', e)
            }
            // send the original event to Segment
            chain.next(chain.payload);
        });
    } else {
        LOG.warn("Invalid interceptor state. Analytics js initialized, but not completely");
    }
} else {
    LOG.warn('Analytics.js listener is not set.');
}

In addition to copying events, we added the ability to send arbitrary JSON:


// Send an event with an arbitrary JSON object
eventN.track('product_page_view', {
    product_id: '1e48fb70-ef12-4ea9-ab10-fd0b910c49ce',
    product_price: 399.99,
    price_currency: 'USD',
    product_release_start: '2020-09-25T12:38:27.763000Z'
});

Next, let's talk about the server part. The backend has to accept HTTP requests, enrich them with additional information such as geo data (thanks to MaxMind for this), and write them to the database. We wanted to make the service as convenient as possible, so that it can be used with minimal configuration. We implemented schema detection based on the structure of the incoming JSON event. Data types are determined from the values. Nested objects are decomposed and reduced to a flat structure:

// incoming JSON
{
  "field_1":  {
    "sub_field_1": "text1",
    "sub_field_2": 100
  },
  "field_2": "text2",
  "field_3": {
    "sub_field_1": {
      "sub_sub_field_1": "2020-09-25T12:38:27.763000Z"
    }
  }
}

// result
{
  "field_1_sub_field_1":  "text1",
  "field_1_sub_field_2":  100,
  "field_2": "text2",
  "field_3_sub_field_1_sub_sub_field_1": "2020-09-25T12:38:27.763000Z"
}
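
The flattening step can be sketched in a few lines of Go. This is only an illustration under assumptions, not the service's actual code: the function name `flatten` and the underscore separator are taken from the example above rather than from the EventNative source, and arrays are left untouched here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten copies every leaf value of a nested JSON object into out,
// joining the keys along the path with underscores.
// Hypothetical helper: the name and signature are assumptions.
func flatten(prefix string, in map[string]interface{}, out map[string]interface{}) {
	for key, value := range in {
		name := key
		if prefix != "" {
			name = prefix + "_" + key
		}
		// Descend into nested objects; any other value is a leaf.
		if nested, ok := value.(map[string]interface{}); ok {
			flatten(name, nested, out)
			continue
		}
		out[name] = value
	}
}

func main() {
	raw := []byte(`{"field_1": {"sub_field_1": "text1", "sub_field_2": 100}, "field_2": "text2"}`)
	var event map[string]interface{}
	if err := json.Unmarshal(raw, &event); err != nil {
		panic(err)
	}
	flat := map[string]interface{}{}
	flatten("", event, flat)
	fmt.Println(flat)
}
```

Note that two different paths can collapse into the same flat name (for example `a_b` as a top-level key and `b` nested in `a`); this sketch simply lets the later value win.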

However, arrays are currently just converted to strings, because not all relational databases support repeated fields. It is also possible to rename or delete fields using optional mapping rules. They let you change the data schema if necessary or cast one data type to another. For example, if a JSON field contains a string with a timestamp (field_3_sub_field_1_sub_sub_field_1 from the example above), then to create a database column of the timestamp type you need to write a mapping rule in the configuration. In other words, the data type of a field is first determined from the JSON value, and then the type casting rule (if configured) is applied. We identified 4 main data types: STRING, FLOAT64, INT64 and TIMESTAMP. The mapping and type casting rules look like this:

rules:
  - "/field_1/subfield_1 -> " # rule that deletes a field
  - "/field_2/subfield_1 -> /field_10/subfield_1" # rule that moves a field
  - "/field_3/subfield_1/subsubfield_1 -> (timestamp) /field_20" # rule that moves a field and casts its type
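
To make the rule syntax concrete, here is a rough sketch of how such a rule string could be split into its parts. The `mappingRule` type and the `parseRule` function are hypothetical names invented for this illustration; the real parser in the service may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// mappingRule is a hypothetical in-memory form of one rule string.
type mappingRule struct {
	Source string // path of the field to read
	Target string // path to write to; empty means "delete the field"
	Cast   string // optional type cast such as "timestamp"; empty means none
}

// parseRule splits "source -> (cast) target" into its three parts.
func parseRule(raw string) (mappingRule, error) {
	parts := strings.SplitN(raw, "->", 2)
	if len(parts) != 2 {
		return mappingRule{}, fmt.Errorf("rule %q has no '->' separator", raw)
	}
	rule := mappingRule{Source: strings.TrimSpace(parts[0])}
	if rule.Source == "" {
		return mappingRule{}, fmt.Errorf("rule %q has no source path", raw)
	}
	target := strings.TrimSpace(parts[1])
	// An optional "(type)" prefix on the right-hand side is the cast.
	if strings.HasPrefix(target, "(") {
		end := strings.Index(target, ")")
		if end < 0 {
			return mappingRule{}, fmt.Errorf("rule %q has an unclosed type cast", raw)
		}
		rule.Cast = strings.TrimSpace(target[1:end])
		target = strings.TrimSpace(target[end+1:])
	}
	rule.Target = target
	return rule, nil
}

func main() {
	for _, raw := range []string{
		"/field_1/subfield_1 -> ",
		"/field_2/subfield_1 -> /field_10/subfield_1",
		"/field_3/subfield_1/subsubfield_1 -> (timestamp) /field_20",
	} {
		rule, err := parseRule(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", rule)
	}
}
```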

Algorithm for determining data types:

  • flatten the JSON structure
  • determine the data types of the fields from their values
  • apply the mapping and type casting rules
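
The second and third steps above could look roughly like the sketch below. It assumes the event is already flat, and it models the cast rules as a plain field-to-type map, which is a simplification made for this example. The names `inferType`, `inferSchema` and `decode` are hypothetical. `json.Decoder.UseNumber` is used so that integers and floats can be told apart, since Go otherwise decodes every JSON number as `float64`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// inferType maps a decoded JSON value to one of the four types named in
// the article. TIMESTAMP is never inferred here: as described above, it
// only comes from a cast rule. Everything non-numeric becomes STRING.
func inferType(value interface{}) string {
	if n, ok := value.(json.Number); ok {
		if strings.ContainsAny(n.String(), ".eE") {
			return "FLOAT64"
		}
		return "INT64"
	}
	return "STRING"
}

// inferSchema infers a type for every field of a flat event and then
// applies the optional casts (a simplified, assumed form of cast rules).
func inferSchema(flat map[string]interface{}, casts map[string]string) map[string]string {
	schema := make(map[string]string, len(flat))
	for name, value := range flat {
		schema[name] = inferType(value)
	}
	for name, typ := range casts {
		if _, ok := schema[name]; ok {
			schema[name] = typ
		}
	}
	return schema
}

// decode parses JSON while keeping numbers as json.Number.
func decode(raw string) (map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber()
	var event map[string]interface{}
	err := dec.Decode(&event)
	return event, err
}

func main() {
	// "quantity" is a made-up field added to show the INT64 case.
	event, err := decode(`{"product_price": 399.99, "quantity": 2, "product_release_start": "2020-09-25T12:38:27.763000Z"}`)
	if err != nil {
		panic(err)
	}
	casts := map[string]string{"product_release_start": "TIMESTAMP"}
	fmt.Println(inferSchema(event, casts))
}
```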

Then from the incoming JSON structure:

{
    "product_id":  "1e48fb70-ef12-4ea9-ab10-fd0b910c49ce",
    "product_price": 399.99,
    "price_currency": "USD",
    "product_type": "supplies",
    "product_release_start": "2020-09-25T12:38:27.763000Z",
    "images": {
      "main": "picture1",
      "sub":  "picture2"
    }
}

the following data schema will be obtained:

"product_id" character varying,
"product_price" numeric (38,18),
"price_currency" character varying,
"product_type" character varying,
"product_release_start" timestamp,
"images_main" character varying,
"images_sub" character varying

We also thought that the user should be able to configure partitioning or split data in the database by other criteria, so we implemented the ability to set the table name with a constant or an expression in the configuration. In the example below, the event will be saved to a table whose name is computed from the values of the product_type and _timestamp fields (e.g. supplies_2020_10):

tableName: '{{.product_type}}_{{._timestamp.Format "2006_01"}}'

However, the structure of incoming events can change at runtime. We implemented an algorithm that checks the difference between the structure of the existing table and the structure of the incoming event. If a difference is found, the table is updated with the new fields. This is done with a patch SQL query:

-- Example for Postgres
ALTER TABLE "schema"."table" ADD COLUMN new_column character varying
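
A schema diff of this kind can be sketched as a comparison of two column maps. The function `alterStatements` and the representation of a schema as a `column name -> SQL type` map are assumptions made for this example. Identifiers are only wrapped in double quotes, not escaped, so this is not safe for untrusted names.

```go
package main

import (
	"fmt"
	"sort"
)

// alterStatements returns one ALTER TABLE statement for every column that
// is present in the incoming schema but missing from the existing table.
// Hypothetical helper: both schemas map a column name to its SQL type.
func alterStatements(schema, table string, existing, incoming map[string]string) []string {
	var missing []string
	for name := range incoming {
		if _, ok := existing[name]; !ok {
			missing = append(missing, name)
		}
	}
	// Sort so the generated statements come out in a stable order.
	sort.Strings(missing)

	statements := make([]string, 0, len(missing))
	for _, name := range missing {
		statements = append(statements, fmt.Sprintf(
			`ALTER TABLE "%s"."%s" ADD COLUMN "%s" %s`,
			schema, table, name, incoming[name]))
	}
	return statements
}

func main() {
	existing := map[string]string{"product_id": "character varying"}
	incoming := map[string]string{
		"product_id":    "character varying",
		"product_price": "numeric (38,18)",
		"product_type":  "character varying",
	}
	for _, stmt := range alterStatements("schema", "table", existing, incoming) {
		fmt.Println(stmt)
	}
}
```

Only added columns are handled here, which matches the behavior described above; type changes and removed fields are outside the scope of the sketch.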

Architecture

[Figure: service architecture diagram]

Why write events to the file system instead of writing them straight to the database? Databases do not always perform well with a large number of inserts (see the Postgres recommendations). So the Logger writes incoming events to a file, and in a separate goroutine (thread) the File reader reads the file, after which the data is transformed and its schema is determined. Once the Table manager has made sure that the table schema is up to date, the data is written to the database in a single batch. Later we added the ability to write data directly to the database, but we use this mode for events that are not numerous, for example conversions.

Open source and future plans

At some point the service started to look like a full-fledged product, and we decided to release it as open source. At the moment, integrations with Postgres, ClickHouse, BigQuery, Redshift, S3 and Snowflake are implemented. All integrations support both batch and streaming modes of data loading. Support for requests via the API has been added.

The current integration scheme looks like this:

[Figure: integration scheme]

Although the service can be self-hosted (for example with Docker), we also have a hosted version, where you can set up an integration with a data warehouse, add a CNAME to your domain and view statistics on the number of events. Our immediate plans are to add the ability to aggregate not only statistics from a web resource but also data from external sources, and to save them to any storage of your choice!

→ GitHub
→ Documentation
→ Slack

We will be glad if EventNative helps you solve your problems!

Which statistics collection system does your company use?

  • 48.0% Google Analytics (12 votes)
  • 4.0% Segment (1 vote)
  • 16.0% Other (write in the comments) (4 votes)
  • 32.0% A service built in-house (8 votes)

25 users voted. 6 users abstained.

source: www.habr.com
