SRE: Performance Analysis. A Tuning Method Using a Simple Web Server in Go

Performance analysis and tuning is a powerful tool for verifying that performance meets customer expectations.

Performance analysis can be used to find bottlenecks in a program by applying a scientific approach to testing tuning experiments. This article describes a general approach to performance analysis and tuning, using a Go web server as an example.

Go is especially good here because it has the pprof profiling tools in its standard library.

Strategy

Let's create a short checklist for our structured analysis. We will try to use data to make decisions rather than making changes based on intuition or guesswork. To do this, we will:

  • Define the optimization boundaries (requirements);
  • Calculate the transactional load for the system;
  • Run the test (generate data);
  • Observe;
  • Analyze: are all the requirements met?
  • Tune scientifically by forming a hypothesis;
  • Run an experiment to test that hypothesis.

Simple HTTP Server Architecture

For this article we will use a small HTTP server written in Golang. All the code from this article can be found here.

The application under analysis is an HTTP server that queries Postgresql on every request. In addition, Prometheus, node_exporter and Grafana are used to collect and display application and system metrics.

For simplicity, we assume that for horizontal scaling (and to simplify the calculations) each service instance and its database are deployed together.

Defining goals

At this stage we decide on the goal. What are we trying to analyze? How will we know when we are done? In this article, we will imagine that we have customers and that our service must handle 10,000 requests per second.

The Google SRE Book covers in detail how to choose and model service level objectives. Let's do the same and build our models:

  • Latency: 99% of requests should complete in under 60ms;
  • Cost: The service should consume the minimum amount of money we think is reasonably possible. To achieve that, we maximize throughput;
  • Capacity planning: We need to understand and document how many instances of the application will have to run, including the general scaling behavior, and how many instances are needed to meet the initial load and the provisioning requirement of n+1 redundancy.

Latency may require optimization in addition to analysis, but throughput clearly needs to be analyzed. When using the SRE SLO process, the latency requirement comes from the customer or the business, represented by the product owner. And our service will meet this obligation from the very start without any tuning!

Setting up the test environment

With a test environment we can put a measured load on our system. Data about the web service's performance will be generated for the analysis.

Transactional load

This environment uses Vegeta to generate a configurable rate of HTTP requests until it is stopped:

$ make load-test LOAD_TEST_RATE=50
echo "POST http://localhost:8080" | vegeta attack -body tests/fixtures/age_no_match.json -rate=50 -duration=0 | tee results.bin | vegeta report

Observation

The transactional load will be applied at run time. In addition to application metrics (request rate, response latencies) and operating system metrics (memory, CPU, IOPS), application profiling will be run to understand where the application has problems and how its CPU time is consumed.

Profiling

Profiling is a type of measurement that lets you see where CPU time goes while an application is running. It lets you determine exactly where, and how much, processor time is spent.

This data can be used during analysis to gain insight into wasted CPU time and unnecessary work. Go (pprof) can generate profiles and visualize them as flame graphs using a standard set of tools. I will cover how to use and set them up later in the article.

Execution, observation, analysis.

Let's run an experiment. We will execute, observe and analyze until we are satisfied with the performance. Let's pick an arbitrarily low load value to get the first observations. At each subsequent step we will increase the load by some scaling factor, chosen with some variation. Each load test run is performed with an adjusted request rate: make load-test LOAD_TEST_RATE=X.

50 requests per second

Pay attention to the top two graphs. The top left shows that our application registers 50 requests per second (or so it believes), and the top right shows the duration of each request. Both graphs help us check whether we are within our performance bounds. The red line on the HTTP Request Latency graph marks the SLO at 60ms. The graph shows that we are well below the maximum response time.

Let's look at the cost side:

10000 requests per second / 50 requests per server = 200 servers + 1

We can still improve on this figure.
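The cost arithmetic used here and in the later steps is the target rate divided by the per-server rate, rounded up, plus one spare instance for n+1 redundancy. A small sketch of that calculation follows; the function name serversNeeded is hypothetical, and the per-server rates in main are simply the load levels tested in this article.

```go
package main

import "fmt"

// serversNeeded returns ceil(targetRPS / perServerRPS) plus one spare
// instance for n+1 redundancy. It returns 0 for non-positive input.
func serversNeeded(targetRPS, perServerRPS int) int {
	if targetRPS <= 0 || perServerRPS <= 0 {
		return 0
	}
	// Integer ceiling division, then the redundant instance.
	return (targetRPS+perServerRPS-1)/perServerRPS + 1
}

func main() {
	const target = 10000 // requests per second the service must handle
	for _, rate := range []int{50, 500, 1000, 2000, 3000} {
		fmt.Printf("%4d req/s per server: %d servers\n", rate, serversNeeded(target, rate))
	}
}
```

Rounding up matters once the per-server rate no longer divides the target evenly, as happens at 3000 requests per second.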

500 requests per second

More interesting things start to happen when the load reaches 500 requests per second.

Again, the top left graph shows that the application registers the expected load. If it did not, there would be a problem on the server the application runs on. The response latency graph is at the top right, and it shows that 500 requests per second resulted in response times of 25-40ms. The 99th percentile still fits comfortably within the 60ms SLO chosen above.

In terms of cost:

10000 requests per second / 500 requests per server = 20 servers + 1

Everything can still be improved.

1000 requests per second

Great run! The application shows that it handled 1000 requests per second, but the latency SLO was violated. This can be seen from the p99 line in the top right graph. Although the p100 line is much higher still, the actual latencies are already above the 60ms maximum. Let's dive into profiling to find out what the application is actually doing.

Profiling

For profiling, we set the load to 1000 requests per second and then use pprof to capture data and find out where the application spends its CPU time. This can be done by enabling the HTTP pprof endpoint and then, under load, saving the results with curl:

$ curl http://localhost:8080/debug/pprof/profile?seconds=29 > cpu.1000_reqs_sec_no_optimizations.prof

The results can be displayed like this:

$ go tool pprof -http=:12345 cpu.1000_reqs_sec_no_optimizations.prof

The graph shows where, and how much, CPU time the application spends. From Brendan Gregg's description:

The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the [top]. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. What is at the top is running on the CPU, and what is below it are its children. The colors usually do not mean anything and are simply picked at random to differentiate frames.

Analysis - hypothesis

For tuning, we will focus on finding wasted CPU time. We will look for the biggest sources of unnecessary work and remove them. Since profiling reveals very precisely where the application spends its processor time, you may need to do this several times: change the application source code, rerun the tests, and check that performance is approaching the target.

Following Brendan Gregg's recommendations, we will read the chart from top to bottom. Each row represents a stack frame (a function call). The first row is the entry point into the program, the parent of all other calls (in other words, every other call has it on its stack). The next row already branches.

If you hover the cursor over a function name on the graph, the total time it was on the stack during profiling is shown. The HTTPServe function was there 65% of the time, and the other runtime functions, runtime.mcall, mstart and gc, took the rest. Fun fact: 5% of the total time is spent on DNS queries.

The addresses the program looks up belong to Postgresql. Click on FindByAge.

Interestingly, the profile shows that there are essentially three main sources of added latency: opening and closing connections, querying data, and connecting to the database. The graph shows that DNS requests together with opening and closing connections take about 13% of the total execution time.

Hypothesis: Reusing connections through pooling should reduce the time of a single HTTP request, allowing higher throughput and lower latency.

Tuning the application - experiment

We update the source code to try to eliminate the connection to Postgresql on every request. The first option is to use a connection pool at the application level. In this experiment we configure a connection pool using the sql driver for Go:

db, err := sql.Open("postgres", dbConnectionString)
if err != nil {
   return nil, err
}
// Configure the pool only after the error check: db is nil on failure.
db.SetMaxOpenConns(8)

Execution, observation, analysis

After rerunning the test at 1000 requests per second, it is clear that p99 latency is back to normal, within the 60ms SLO!

What is the cost?

10000 requests per second / 1000 requests per server = 10 servers + 1

Let's do even better!

2000 requests per second

Doubling the load shows the same picture: the top left graph shows that the application handles 2000 requests per second, p100 is below 60ms, and p99 satisfies the SLO.

In terms of cost:

10000 requests per second / 2000 requests per server = 5 servers + 1

3000 requests per second

Here the application can handle 3000 requests per second with a p99 latency under 60ms. The SLO is not violated, and the cost works out as follows:

10000 requests per second / 3000 requests per server = 4 servers + 1 (the author rounded up; translator's note)

Let's try another round of analysis.

Analysis - hypothesis

We collect and display the application's profiling results at 3000 requests per second.

Still, 6% of the time is spent establishing connections. Configuring the pool improved performance, but you can see that the application keeps creating new connections to the database.

Hypothesis: Despite the pool, connections are still being dropped and cleaned up, so the application has to re-establish them. Setting the number of idle connections equal to the pool size should help latency by minimizing the time the application spends creating connections.

Tuning the application - experiment

We try setting MaxIdleConns equal to the pool size (also described here):

db, err := sql.Open("postgres", dbConnectionString)
if err != nil {
   return nil, err
}
// Keep as many idle connections as the pool allows open ones.
db.SetMaxOpenConns(8)
db.SetMaxIdleConns(8)
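The effect of the idle-connection setting can be reproduced without a database. The sketch below is an illustration under stated assumptions, not the article's code: it registers a fake driver named "counting-fake" (a hypothetical name) whose only job is to count how often database/sql asks for a new physical connection, and it uses sequential db.Ping calls as a stand-in for real queries.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync/atomic"
)

// countingDriver is a stand-in driver that only counts how many
// physical connections database/sql asks it to open.
type countingDriver struct{ opens int64 }

func (d *countingDriver) Open(name string) (driver.Conn, error) {
	atomic.AddInt64(&d.opens, 1)
	return fakeConn{}, nil
}

// fakeConn implements the minimum of driver.Conn; it cannot run queries.
type fakeConn struct{}

func (fakeConn) Prepare(query string) (driver.Stmt, error) {
	return nil, errors.New("fakeConn: queries are not supported")
}

func (fakeConn) Close() error { return nil }

func (fakeConn) Begin() (driver.Tx, error) {
	return nil, errors.New("fakeConn: transactions are not supported")
}

var fake = &countingDriver{}

func init() { sql.Register("counting-fake", fake) }

// opensFor checks a connection out of the pool and returns it n times in
// sequence, and reports how many new connections that required.
func opensFor(maxIdle, n int) (int64, error) {
	db, err := sql.Open("counting-fake", "")
	if err != nil {
		return 0, err
	}
	defer db.Close()
	db.SetMaxOpenConns(8)
	db.SetMaxIdleConns(maxIdle)

	before := atomic.LoadInt64(&fake.opens)
	for i := 0; i < n; i++ {
		if err := db.Ping(); err != nil {
			return 0, err
		}
	}
	return atomic.LoadInt64(&fake.opens) - before, nil
}

func main() {
	for _, idle := range []int{0, 8} {
		opened, err := opensFor(idle, 100)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("MaxIdleConns=%d: %d connections opened for 100 checkouts\n", idle, opened)
	}
}
```

With no idle connections retained, every checkout opens a fresh connection; with idle connections allowed, the sequential checkouts reuse a single one. Under concurrent load the numbers differ, but the mechanism is the same one the hypothesis above relies on.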

Execution, observation, analysis

3000 requests per second

p99 is under 60ms, with a much lower p100!

Checking the flame graph shows that connection setup is no longer noticeable! Looking more closely at pg(*conn).query, we do not see connections being established there either.

Conclusion

Performance analysis is critical to understanding whether customer expectations and non-functional requirements are being met. Analysis that compares observations against customer expectations helps determine what is acceptable and what is not. Go provides powerful tools built into the standard library that make this analysis simple and accessible.

source: www.habr.com
