Peb xam qhov kev thauj khoom thauj rau qhov system;
Peb ua qhov kev xeem (tsim cov ntaub ntawv);
Peb saib;
Peb txheeb xyuas - puas yog tag nrho cov kev xav tau ntsib?
Peb teeb nws scientifically, ua ib tug hypothesis;
Peb ua ib qho kev sim los sim qhov kev xav no.
Yooj yim HTTP Server Architecture
Rau tsab xov xwm no peb yuav siv HTTP me me server hauv Golang. Txhua tus lej los ntawm kab lus no tuaj yeem pom no.
Daim ntawv thov raug tshuaj xyuas yog HTTP server uas xaiv Postgresql rau txhua qhov kev thov. Tsis tas li ntawd, muaj Prometheus, node_exporter thiab Grafana rau kev sau thiab nthuav tawm daim ntawv thov thiab kev ntsuas qhov system.
Txhawm rau ua kom yooj yim, peb xav tias rau kab rov tav scaling (thiab simplifying xam) txhua qhov kev pabcuam thiab cov ntaub ntawv raug xa mus ua ke:
Txhais cov hom phiaj
Ntawm cov kauj ruam no, peb txiav txim siab rau lub hom phiaj. Peb sim soj ntsuam dab tsi? Peb yuav ua li cas thiaj paub thaum lub sijhawm kawg? Hauv tsab xov xwm no, peb yuav xav txog tias peb muaj cov neeg siv khoom thiab tias peb cov kev pabcuam yuav ua 10 thov ib ob.
Π Google SRE Phau Ntawv Cov txheej txheem ntawm kev xaiv thiab kev ua qauv yog tham nyob rau hauv kom meej. Cia peb ua tib yam thiab tsim qauv:
Latency: 99% ntawm kev thov yuav tsum ua kom tiav hauv tsawg dua 60ms;
Tus nqi: Cov kev pabcuam yuav tsum siv nyiaj tsawg kawg nkaus uas peb xav tias tsim nyog ua tau. Yuav kom ua tau li no, peb maximize throughput;
Kev npaj muaj peev xwm: Yuav tsum tau nkag siab thiab sau cov ntaub ntawv ntau npaum li cas ntawm daim ntawv thov yuav tsum tau khiav, suav nrog kev ua haujlwm tag nrho, thiab pes tsawg zaus yuav tsum tau ua kom tau raws li kev thauj khoom thawj zaug thiab kev npaj yuav tsum tau ua. redundancy n+1.
Latency tej zaum yuav xav tau kev ua kom zoo ntxiv nrog rau kev tsom xam, tab sis cov ntsiab lus kom meej yuav tsum tau txheeb xyuas. Thaum siv cov txheej txheem SRE SLO, qhov kev thov qeeb yog los ntawm cov neeg siv khoom lossis kev lag luam, sawv cev los ntawm tus tswv khoom. Thiab peb cov kev pabcuam yuav ua tiav cov luag haujlwm no txij thaum pib yam tsis muaj kev teeb tsa!
Ib puag ncig no siv Neeg noj zaub los tsim ib qho kev cai HTTP thov tus nqi kom txog thaum nres:
$ make load-test LOAD_TEST_RATE=50
echo "POST http://localhost:8080" | vegeta attack -body tests/fixtures/age_no_match.json -rate=50 -duration=0 | tee results.bin | vegeta report
Kev Soj Ntsuam
Kev thauj khoom thauj khoom yuav raug siv thaum lub sijhawm ua haujlwm. Ntxiv rau daim ntawv thov (tus naj npawb ntawm kev thov, cov lus teb latency) thiab kev ua haujlwm (nco, CPU, IOPS) kev ntsuas, daim ntawv thov profile yuav raug khiav kom nkag siab qhov twg nws muaj teeb meem thiab yuav siv sijhawm CPU li cas.
Profileing
Profiling yog hom kev ntsuas uas tso cai rau koj pom qhov twg CPU lub sijhawm yuav mus thaum daim ntawv thov ua haujlwm. Nws tso cai rau koj los txiav txim siab qhov twg thiab ntau npaum li cas lub sijhawm processor siv:
Cov ntaub ntawv no tuaj yeem siv thaum tshawb xyuas kom nkag siab txog lub sijhawm CPU khib nyiab thiab ua haujlwm tsis tsim nyog. Mus (pprof) tuaj yeem tsim cov profiles thiab pom lawv ua cov duab hluav taws xob siv cov txheej txheem txheej txheem. Kuv mam li tham txog lawv txoj kev siv thiab teeb tsa cov lus qhia tom qab hauv kab lus.
Kev ua, kev soj ntsuam, tsom xam.
Cia peb ua ib qho kev sim. Peb yuav ua, saib thiab txheeb xyuas kom txog thaum peb txaus siab rau qhov kev ua tau zoo. Cia peb xaiv tus nqi qis qis los siv nws kom tau txais cov txiaj ntsig ntawm thawj qhov kev soj ntsuam. Ntawm txhua kauj ruam tom ntej peb yuav nce lub load nrog ib qho kev ntsuas qhov ntsuas tau xaiv nrog qee qhov kev hloov pauv. Txhua qhov kev ntsuas kev thauj khoom yog ua nrog tus lej ntawm kev thov hloov kho: make load-test LOAD_TEST_RATE=X.
50 thov ib ob
Ua tib zoo saib ob daim duab saum toj kawg nkaus. Sab laug sab saum toj qhia tau tias peb daim ntawv thov txheej txheem 50 thov ib ob (nws xav) thiab sab xis saum toj qhia txog lub sijhawm ntawm txhua qhov kev thov. Ob qho kev ntsuas pab peb saib thiab txheeb xyuas seb peb puas nyob hauv peb thaj tsam kev ua haujlwm lossis tsis ua haujlwm. Kab liab ntawm daim duab HTTP Thov Latency qhia SLO ntawm 60ms. Cov kab qhia tau hais tias peb zoo hauv qab peb lub sijhawm teb siab tshaj plaws.
Cia peb saib ntawm tus nqi sab:
10000 thov ib ob / 50 thov rau ib tus neeg rau zaub mov = 200 servers + 1
Peb tseem tuaj yeem txhim kho daim duab no.
500 thov ib ob
Qhov nthuav ntau yam pib tshwm sim thaum lub load tau mus rau 500 thov ib ob:
Ib zaug ntxiv, nyob rau sab laug sab saum toj daim duab koj tuaj yeem pom tias daim ntawv thov tau sau cov khoom qub. Yog tias qhov no tsis yog qhov teeb meem, muaj teeb meem ntawm lub server uas daim ntawv thov tau ua haujlwm. Cov lus teb latency graph yog nyob rau sab xis saum toj, qhia tias 500 thov ib ob ua rau cov lus teb qeeb ntawm 25-40ms. Qhov 99th feem pua ββββtseem ua kom zoo rau hauv 60ms SLO xaiv saum toj no.
Hais txog tus nqi:
10000 thov ib ob / 500 thov rau ib tus neeg rau zaub mov = 20 servers + 1
Txhua yam tseem tuaj yeem txhim kho.
1000 thov ib ob
Zoo tshaj tawm! Daim ntawv thov qhia tias nws tau ua tiav 1000 qhov kev thov ib ob, tab sis qhov txwv tsis pub tshaj tawm tau ua txhaum los ntawm SLO. Qhov no tuaj yeem pom hauv kab p99 nyob rau sab xis sab xis. Txawm hais tias qhov tseeb tias p100 kab yog ntau dua, qhov tseeb qeeb yog siab tshaj qhov siab tshaj plaws ntawm 60ms. Cia peb dhia mus rau hauv profileing kom paub seb daim ntawv thov ua li cas.
Profileing
Rau profileing, peb teem lub load rau 1000 thov ib ob, ces siv pprof txhawm rau ntes cov ntaub ntawv kom paub seb qhov twg daim ntawv thov siv sijhawm CPU. Qhov no tuaj yeem ua tiav los ntawm kev ua kom HTTP qhov kawg pprof, thiab tom qab ntawd, hauv qab thauj khoom, txuag cov txiaj ntsig siv curl:
$ go tool pprof -http=:12345 cpu.1000_reqs_sec_no_optimizations.prof
Daim duab qhia qhov twg thiab ntau npaum li cas daim ntawv thov siv sijhawm CPU. Los ntawm kev piav qhia los ntawm Brendan Gregg:
X axis yog pawg neeg profile, txheeb cov tsiaj ntawv (qhov no tsis yog lub sijhawm), Y axis qhia qhov tob ntawm pawg, suav los ntawm xoom ntawm [sab saum toj]. Txhua daim duab plaub yog ib pawg ncej. Qhov dav ntawm lub thav duab, ntau zaus nws muaj nyob rau hauv pawg. Dab tsi yog nyob rau sab saum toj khiav ntawm CPU, thiab dab tsi hauv qab no yog cov ntsiab lus me nyuam. Cov xim feem ntau tsis txhais tau dab tsi, tab sis tsuas yog xaiv ntawm random kom sib txawv thav ntawv.
Kev tsom xam - hypothesis
Rau kev tu, peb yuav tsom mus rau kev sim nrhiav lub sijhawm siv CPU nkim. Peb yuav nrhiav qhov loj tshaj plaws ntawm kev siv nyiaj tsis zoo thiab tshem tawm lawv. Zoo, muab qhov profileing qhia tau tseeb heev qhov tseeb ntawm daim ntawv thov siv nws lub sijhawm ua haujlwm, koj yuav tau ua nws ob peb zaug, thiab koj tseem yuav tau hloov pauv daim ntawv thov qhov chaws, rov ua qhov kev xeem thiab pom tias kev ua tau zoo mus txog lub hom phiaj.
Ua raws li Brendan Gregg cov lus pom zoo, peb yuav nyeem daim ntawv qhia saum toj mus rau hauv qab. Txhua kab qhia ib pawg ncej (function hu). Thawj kab yog lub ntsiab lus nkag mus rau hauv qhov kev zov me nyuam, niam txiv ntawm tag nrho lwm yam kev hu (hauv lwm lo lus, tag nrho lwm yam kev hu yuav muaj nyob rau hauv lawv pawg). Cov kab tom ntej no twb txawv lawm:
Yog tias koj hover tus cursor dhau lub npe ntawm kev ua haujlwm ntawm daim duab, tag nrho lub sijhawm nws nyob ntawm pawg thaum debugging yuav tshwm sim. HTTPServe muaj nuj nqi nyob ntawd 65% ntawm lub sijhawm, lwm lub sijhawm ua haujlwm runtime.mcall, mstart ΠΈ gc, coj mus rau lub sijhawm so. Qhov tseeb lom zem: 5% ntawm tag nrho lub sijhawm yog siv rau cov lus nug DNS:
Tom qab rov pib qhov kev xeem nrog 1000 thov ib ob, nws yog qhov tseeb tias p99's latency theem tau rov qab mus rau qhov qub nrog SLO ntawm 60ms!
Tus nqi yog dab tsi?
10000 thov ib ob / 1000 thov rau ib tus neeg rau zaub mov = 10 servers + 1
Cia peb ua nws zoo dua!
2000 thov ib ob
Doubling lub load qhia tau hais tias tib yam, lub sab sauv daim duab qhia tau hais tias daim ntawv thov tswj cov txheej txheem 2000 thov ib ob, p100 qis dua 60ms, p99 txaus siab rau SLO.
Hais txog tus nqi:
10000 thov ib ob / 2000 thov rau ib tus neeg rau zaub mov = 5 servers + 1
3000 thov ib ob
Ntawm no daim ntawv thov tuaj yeem ua 3000 qhov kev thov nrog p99 latency tsawg dua 60ms. SLO tsis raug ua txhaum, thiab tus nqi raug lees txais raws li hauv qab no:
10000 thov ib ob / ib 3000 thov ib tus neeg rau zaub mov = 4 servers + 1 (tus sau tau sau tseg, kwv yees. tus txhais lus)
Cia peb sim lwm qhov kev ntsuam xyuas.
Kev tsom xam - hypothesis
Peb sau thiab tso saib cov txiaj ntsig ntawm kev debugging daim ntawv thov ntawm 3000 thov ib ob:
Tseem 6% ntawm lub sijhawm yog siv los tsim kev sib txuas. Kev teeb tsa lub pas dej tau txhim kho kev ua tau zoo, tab sis koj tseem tuaj yeem pom tias daim ntawv thov txuas ntxiv ua haujlwm ntawm kev tsim cov kev sib txuas tshiab rau cov ntaub ntawv.
Hypothesis: Kev sib txuas, txawm tias muaj lub pas dej, tseem poob thiab ntxuav, yog li daim ntawv thov yuav tsum rov pib dua. Kev teeb tsa tus naj npawb ntawm cov kev sib txuas tseem tos rau lub pas dej loj yuav tsum pab nrog latency los ntawm kev txo lub sij hawm daim ntawv thov siv los tsim kev sib txuas.
Teeb tsa daim ntawv thov - sim
Sim rau nruab MaxIdleConns sib npaug ntawm lub pas dej loj (tseem piav qhia no):