SRE: Performance Analysis. A Tuning Method for a Simple Web Server in Go
Performance analysis and tuning are powerful tools for verifying that a service performs the way its customers expect.
Performance analysis can be used to find bottlenecks in a program by applying a scientific approach to tuning experiments. This article describes a general approach to performance analysis and tuning, using a Go web server as an example.
Go is particularly well suited here because it ships the pprof profiling tools in its standard library.
Strategy
Let's write down a short list of steps for a structured analysis. We will try to base decisions on data instead of making changes on intuition or guesswork. To do this, we will:
Define the optimization boundaries (requirements);
Calculate the transactional load for the system;
Run the test (generate data);
Observe;
Analyze: are all requirements met?
Tune scientifically by forming a hypothesis;
Run an experiment to test that hypothesis.
Simple HTTP Server Architecture
For this article we will use a small HTTP server written in Go. All the code from this article can be found here.
The application under analysis is an HTTP server that queries Postgresql on every request. In addition, Prometheus, node_exporter and Grafana collect and display application and system metrics.
For simplicity, we assume that for horizontal scaling (and to keep the calculations simple) each service instance is deployed together with its database.
Defining the goals
At this step we decide on the goal. What are we trying to analyze? How will we know when we are done? In this article, we will imagine that we have customers and that our service must handle 10000 requests per second.
The Google SRE Book discusses in detail how to choose and model such objectives. Let's do the same and build our models:
Latency: 99% of requests should complete in less than 60ms;
Cost: the service should consume the minimum amount of money we consider reasonably possible. To achieve this, we maximize throughput;
Capacity planning: we need to understand and document how many instances of the application will run, including the overall scaling function, and how many instances are needed to meet the initial load and provisioning requirements with n+1 redundancy.
Latency may require optimization in addition to analysis, but throughput clearly has to be analyzed. When using the SRE SLO process, the latency requirement comes from the customer or the business, represented by the product owner. And our service will meet this obligation from the very start, without any tuning!
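To make the latency objective concrete, here is a minimal sketch of how a p99 check against the 60ms limit could be computed from raw request durations. The function names and the sample data are hypothetical, and the nearest-rank method is only one of several ways to define a percentile; in the article itself the percentiles come from the Prometheus and Grafana dashboards.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of the samples
// using the nearest-rank method. It returns 0 for an empty slice.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// meetsSLO reports whether the 99th percentile is below the limit.
func meetsSLO(samples []time.Duration, limit time.Duration) bool {
	return percentile(samples, 99) < limit
}

// sampleLatencies builds hypothetical data: 99 fast requests and one slow outlier.
func sampleLatencies() []time.Duration {
	samples := make([]time.Duration, 0, 100)
	for i := 0; i < 99; i++ {
		samples = append(samples, 20*time.Millisecond)
	}
	return append(samples, 250*time.Millisecond)
}

func main() {
	samples := sampleLatencies()
	fmt.Println("p99: ", percentile(samples, 99))
	fmt.Println("p100:", percentile(samples, 100))
	fmt.Println("meets 60ms SLO:", meetsSLO(samples, 60*time.Millisecond))
}
```

With this data the single outlier shows up in p100 but not in p99, which is why an SLO phrased as "99% of requests" tolerates it.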
Setting up the test environment
With a test environment we can put a measured load on our system. The resulting data about the web service's behavior is what we will analyze.
Transactional load
This environment uses Vegeta to generate a configurable HTTP request rate until it is stopped:
$ make load-test LOAD_TEST_RATE=50
echo "POST http://localhost:8080" | vegeta attack -body tests/fixtures/age_no_match.json -rate=50 -duration=0 | tee results.bin | vegeta report
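For readers who want to see what a constant-rate load generator does under the hood, here is a rough sketch using only the Go standard library. It is not Vegeta and not the article's tooling: the attack and runDemo functions, the JSON body and the throwaway local server are all hypothetical, and unlike the command above it sends a fixed number of requests instead of running until stopped.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

// attack sends total POST requests to url at roughly rate requests per second
// and returns how many of them received a 200 OK response.
func attack(url, body string, rate, total int) int {
	ticker := time.NewTicker(time.Second / time.Duration(rate))
	defer ticker.Stop()

	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for i := 0; i < total; i++ {
		<-ticker.C // pace the requests instead of sending them in one burst
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Post(url, "application/json", strings.NewReader(body))
			if err != nil {
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return ok
}

// runDemo starts a throwaway local server and fires 20 requests at 200 req/s.
func runDemo() int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	return attack(srv.URL, `{"age": 30}`, 200, 20) // hypothetical request body
}

func main() {
	fmt.Println("successful responses:", runDemo())
}
```

Each request runs in its own goroutine so that a slow response does not delay the next tick, which keeps the offered rate constant even when the server starts to fall behind.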
Observation
The transactional load is applied while the application is running. In addition to application metrics (request rate, response latency) and operating system metrics (memory, CPU, IOPS), the application is profiled to understand where its problems are and how CPU time is consumed.
Profiling
Profiling is a type of measurement that shows where CPU time goes while the application is running. It lets you determine exactly where, and how much, processor time is spent.
This data can be used during analysis to spot wasted CPU time and unnecessary work. Go (pprof) can generate profiles and visualize them as flame graphs using its standard toolset. I will cover how to use and set them up later in the article.
Execution, observation, analysis
Let's run the experiment. We will execute, observe and analyze until we are satisfied with the performance. We start with an arbitrarily low load to get the first observations. At each following step we increase the load by some scaling factor, chosen with some variation. Each load test run uses an adjusted request rate: make load-test LOAD_TEST_RATE=X.
50 requests per second
Look at the two top graphs. The top left one shows that our application reports processing 50 requests per second (or so it thinks), and the top right one shows the duration of each request. Both graphs help us observe and analyze whether we are within our performance bounds. The red line on the HTTP Request Latency graph marks the SLO at 60ms. The lines show that we are well below the maximum response time.
Let's look at the cost side:
10000 requests per second / 50 requests per server = 200 servers + 1
We can still improve on this figure.
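The cost calculation above is repeated at every load level, so it can be captured in a small helper. This is a sketch: the function name is hypothetical, and it assumes that each server sustains exactly the tested rate and that n+1 redundancy means a single spare instance.

```go
package main

import "fmt"

// serversNeeded returns ceil(targetRPS / perServerRPS) plus one spare
// instance for n+1 redundancy. It returns 0 for non-positive input.
func serversNeeded(targetRPS, perServerRPS int) int {
	if targetRPS <= 0 || perServerRPS <= 0 {
		return 0
	}
	base := (targetRPS + perServerRPS - 1) / perServerRPS // integer ceiling division
	return base + 1
}

func main() {
	for _, rate := range []int{50, 500, 1000, 2000, 3000} {
		fmt.Printf("%4d req/s per server -> %d servers\n", rate, serversNeeded(10000, rate))
	}
}
```

The ceiling matters once the per-server rate no longer divides the target evenly: at 3000 requests per second per server, 3.33 rounds up to 4 before the spare is added.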
500 requests per second
More interesting things start to happen when the load reaches 500 requests per second.
Again, the top left graph shows that the application registers the expected load. If it did not, there would be a problem on the server the application runs on. The response latency graph at the top right shows that 500 requests per second results in latencies of 25-40ms. The 99th percentile still fits comfortably within the 60ms SLO chosen above.
In terms of cost:
10000 requests per second / 500 requests per server = 20 servers + 1
There is still room for improvement.
1000 requests per second
A great run! The application shows that it handled 1000 requests per second, but the latency SLO was violated. This is visible in the p99 line on the top right graph. Although the p100 line is much higher, the actual latencies exceed the 60ms maximum. Let's dive into profiling to find out what the application is actually doing.
Profiling
For profiling, we set the load to 1000 requests per second and then use pprof to capture data and find out where the application spends its CPU time. This can be done by enabling the pprof HTTP endpoint and then, under load, saving the results with curl. The saved profile can then be displayed like this:
$ go tool pprof -http=:12345 cpu.1000_reqs_sec_no_optimizations.prof
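For context, enabling the pprof HTTP endpoint in a Go server can look roughly like the sketch below. The handler functions come from the standard net/http/pprof package; the newMux and pprofIndexStatus helpers and the placeholder application handler are hypothetical, and the article's own server may wire this up differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newMux returns a mux that serves a placeholder application handler
// plus the standard pprof endpoints under /debug/pprof/.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok") // stands in for the real application handler
	})
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// pprofIndexStatus starts a temporary server and returns the HTTP status
// of the pprof index page, or 0 if the request fails.
func pprofIndexStatus() int {
	srv := httptest.NewServer(newMux())
	defer srv.Close()
	resp, err := http.Get(srv.URL + "/debug/pprof/")
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func main() {
	fmt.Println("pprof index status:", pprofIndexStatus())
	// In a real service this would be http.ListenAndServe(":8080", newMux()).
	// A CPU profile could then be saved under load with something like
	// (duration and file name are assumptions):
	//   curl "http://localhost:8080/debug/pprof/profile?seconds=30" > cpu.prof
}
```

Registering the handlers on an explicit mux keeps the profiling endpoints visible in the code; a blank import of net/http/pprof achieves the same on the default mux.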
The graph shows where, and how much, CPU time the application spends. From Brendan Gregg's description:
The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the [top]. Each rectangle is a stack frame. The wider a frame is, the more often it was present in the stacks. What is at the top is running on-CPU, and what is below it are its children. The colors usually do not mean anything; they are picked at random to tell the frames apart.
Analysis - hypothesis
For tuning, we will focus on finding wasted CPU time. We will look for the largest sources of useless spending and remove them. Since profiling reveals very precisely where the application spends processor time, you may need to repeat this several times: change the application source code, re-run the tests, and check that performance moves closer to the target.
Following Brendan Gregg's advice, we read the chart from top to bottom. Each row shows a stack frame (a function call). The first row is the entry point of the program, the parent of all other calls (in other words, every other call has it on its stack). The next row already differs.
If you hover the cursor over a function name on the graph, it shows the total time that function was on the stack during profiling. The HTTPServe function was there 65% of the time, and the runtime functions runtime.mcall, mstart and gc took up the rest. Fun fact: 5% of the total time is spent on DNS queries.
The addresses the program looks up belong to Postgresql. Let's click on FindByAge.
Interestingly, the profile shows that there are essentially three main sources of added latency: opening and closing connections, querying data, and connecting to the database. The graph shows that DNS queries together with opening and closing connections take about 13% of the total execution time.
Hypothesis: Reusing connections through pooling should reduce the time of a single HTTP request, allowing higher throughput and lower latency.
Tuning the application - experiment
We update the source code to try to eliminate the connection to Postgresql on every request. The first option is to use a connection pool at the application level. In this experiment we configure a connection pool using the sql driver for Go.
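The effect of this change can be illustrated with a self-contained sketch built on database/sql. Everything here is an assumption made for illustration: the stub connector replaces a real Postgres driver and only counts how many physical connections are opened, the pool size of 8 is hypothetical, and the "before" case assumes the original code opened a fresh handle for each request.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync/atomic"
)

// countingConnector is a hypothetical stand-in for a Postgres driver.
// It counts how many physical connections database/sql opens.
type countingConnector struct{ opened int64 }

func (c *countingConnector) Connect(context.Context) (driver.Conn, error) {
	atomic.AddInt64(&c.opened, 1)
	return stubConn{}, nil
}
func (c *countingConnector) Driver() driver.Driver { return stubDriver{} }

type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) { return stubConn{}, nil }

type stubConn struct{}

func (stubConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("stub: not supported") }
func (stubConn) Close() error                        { return nil }
func (stubConn) Begin() (driver.Tx, error)           { return nil, errors.New("stub: not supported") }

// perRequestConnects models the assumed original behavior: a new handle,
// and therefore a new physical connection, for every request.
func perRequestConnects(requests int) int64 {
	c := &countingConnector{}
	for i := 0; i < requests; i++ {
		db := sql.OpenDB(c)
		_ = db.Ping() // acquires a connection, then releases it
		db.Close()
	}
	return atomic.LoadInt64(&c.opened)
}

// pooledConnects shares one *sql.DB, which is itself a connection pool,
// across all requests and caps it at poolSize open connections.
func pooledConnects(requests, poolSize int) int64 {
	c := &countingConnector{}
	db := sql.OpenDB(c)
	defer db.Close()
	db.SetMaxOpenConns(poolSize)
	for i := 0; i < requests; i++ {
		_ = db.Ping()
	}
	return atomic.LoadInt64(&c.opened)
}

func main() {
	fmt.Println("connections opened, one handle per request:", perRequestConnects(100))
	fmt.Println("connections opened, shared pool:", pooledConnects(100, 8))
}
```

Because the requests in this sketch run one after another, the shared pool needs only a single physical connection, while the per-request variant pays the connection setup cost every time.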
After re-running the test at 1000 requests per second, it is clear that p99 latency is back to normal, within the 60ms SLO!
What about the cost?
10000 requests per second / 1000 requests per server = 10 servers + 1
Let's do even better!
2000 requests per second
Doubling the load shows the same picture: the top left graph shows that the application handles 2000 requests per second, p100 is below 60ms, and p99 satisfies the SLO.
In terms of cost:
10000 requests per second / 2000 requests per server = 5 servers + 1
3000 requests per second
Here the application can handle 3000 requests per second with a p99 latency below 60ms. The SLO is not violated, and the cost works out as follows:
10000 requests per second / 3000 requests per server = 4 servers + 1 (the author rounds up)
Let's try another round of analysis.
Analysis - hypothesis
We collect and display the application's profiling results at 3000 requests per second.
Still, 6% of the time is spent establishing connections. Configuring the pool improved performance, but you can still see that the application keeps creating new connections to the database.
Hypothesis: Connections, despite the pool, are still being dropped and cleaned up, so the application has to re-establish them. Setting the number of idle connections to the pool size should help latency by reducing the time the application spends creating connections.
Tuning the application - experiment
We try setting MaxIdleConns equal to the pool size (also described here).
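The hypothesis can be checked in isolation with a small sketch against database/sql. As before, this is an illustration under stated assumptions and not the article's code: the stub connector stands in for a real Postgres driver and only counts opened connections, and the pool size of 8 and the three bursts are hypothetical values.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync/atomic"
)

// countingConnector is a hypothetical stand-in for a Postgres driver.
type countingConnector struct{ opened int64 }

func (c *countingConnector) Connect(context.Context) (driver.Conn, error) {
	atomic.AddInt64(&c.opened, 1)
	return stubConn{}, nil
}
func (c *countingConnector) Driver() driver.Driver { return stubDriver{} }

type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) { return stubConn{}, nil }

type stubConn struct{}

func (stubConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("stub: not supported") }
func (stubConn) Close() error                        { return nil }
func (stubConn) Begin() (driver.Tx, error)           { return nil, errors.New("stub: not supported") }

// connectsAfterBursts checks out poolSize connections at once, returns them,
// and repeats this for the given number of rounds. It reports how many
// physical connections were opened in total.
// maxIdle <= 0 leaves the database/sql default idle limit in place.
func connectsAfterBursts(poolSize, maxIdle, rounds int) int64 {
	c := &countingConnector{}
	db := sql.OpenDB(c)
	defer db.Close()
	db.SetMaxOpenConns(poolSize)
	if maxIdle > 0 {
		db.SetMaxIdleConns(maxIdle)
	}
	ctx := context.Background()
	for r := 0; r < rounds; r++ {
		held := make([]*sql.Conn, 0, poolSize)
		for i := 0; i < poolSize; i++ {
			conn, err := db.Conn(ctx)
			if err != nil {
				break
			}
			held = append(held, conn)
		}
		for _, conn := range held {
			conn.Close() // hand the connection back to the pool
		}
	}
	return atomic.LoadInt64(&c.opened)
}

func main() {
	fmt.Println("default idle limit:    ", connectsAfterBursts(8, 0, 3))
	fmt.Println("idle limit = pool size:", connectsAfterBursts(8, 8, 3))
}
```

With the default idle limit, the pool discards most connections after each burst and has to open new ones for the next, so the first count exceeds the pool size. With the idle limit matched to the pool size, the same 8 connections are reused across all rounds.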
Checking the flame graph shows that connection setup is no longer noticeable! Let's look more closely at pg(*conn).query: we do not see connections being established there either.
Conclusion
Performance analysis is essential for understanding whether customer expectations and non-functional requirements are being met. Analysis that compares observations with customer expectations helps determine what is acceptable and what is not. Go provides powerful tools, built into its standard library, that make such analysis simple and accessible.