Analysis of the TSDB in Prometheus 2

The time series database (TSDB) in Prometheus 2 is an excellent example of an engineering solution that delivers major improvements over the v2 storage of Prometheus 1 in data ingestion speed, query execution, and resource efficiency. We were implementing Prometheus 2 in Percona Monitoring and Management (PMM), and I had the chance to study the performance of the Prometheus 2 TSDB. In this article I will discuss the results of those observations.

Typical Prometheus workload

For those used to working with general-purpose databases, the typical Prometheus workload is quite interesting. The data ingestion rate tends to be stable: the services you monitor usually send roughly the same number of metrics, and infrastructure changes relatively slowly.
Queries can come from a variety of sources. Some, such as alerts, also tend toward a stable and predictable rate. Others, such as user requests, can cause bursts, although this is not the case for most workloads.

Load test

During testing I focused on ingestion capacity. I deployed Prometheus 2.3.2 compiled with Go 1.10.1 (as part of PMM 1.14) on the Linode service using this script: StackScript. To generate the most realistic load, I used this StackScript to launch several MySQL nodes with a real workload (Sysbench TPC-C Test), each of which emulated 10 Linux/MySQL nodes.
All of the following tests were run on a Linode server with eight virtual cores and 32 GB of memory, running 20 load simulations that monitored two hundred MySQL instances. Or, in Prometheus terms, 800 targets, 440 scrapes per second, 380 thousand samples per second, and 1.7 million active time series.
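The headline figures can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the derived values (average scrape interval, samples per scrape, series per target) are my own approximations from the numbers quoted above, not measurements from the test.

```python
# Headline figures quoted in the text above.
targets = 800
scrapes_per_sec = 440
samples_per_sec = 380_000
active_series = 1_700_000

# Derived, approximate values (not measured in the original test).
avg_scrape_interval_s = targets / scrapes_per_sec
samples_per_scrape = samples_per_sec / scrapes_per_sec
series_per_target = active_series / targets

print(f"average scrape interval: {avg_scrape_interval_s:.2f} s")
print(f"samples per scrape:      {samples_per_scrape:.0f}")
print(f"series per target:       {series_per_target:.0f}")
```

Under these assumptions each target is scraped a little more often than once every two seconds on average, and each scrape returns several hundred samples.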

Design

The usual approach of traditional databases, including the one used by Prometheus 1.x, is to limit memory. If that limit is not enough to handle the load, you will see high latencies and some requests will fail. Memory usage in Prometheus 2 is configured through the storage.tsdb.min-block-duration key, which determines how long samples are kept in memory before being flushed to disk (the default is 2 hours). The amount of memory required depends on the number of time series, labels, and scrape frequency, in addition to the raw ingest rate. On disk, Prometheus aims to use about 3 bytes per sample. Memory requirements, on the other hand, are much higher.
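To get a feel for what roughly 3 bytes per sample means on disk, here is a minimal sizing sketch. It assumes the ingest rate of the test system (380 thousand samples per second) and treats the 3-byte figure as a flat average; index and WAL overhead are ignored, and the function name disk_bytes is purely illustrative.

```python
BYTES_PER_SAMPLE = 3          # approximate on-disk cost quoted in the text
SAMPLES_PER_SEC = 380_000     # ingest rate of the test system

def disk_bytes(seconds, samples_per_sec=SAMPLES_PER_SEC):
    """Rough on-disk size of the samples ingested over `seconds`."""
    return seconds * samples_per_sec * BYTES_PER_SAMPLE

per_block = disk_bytes(2 * 3600)    # one default 2-hour block
per_day = disk_bytes(24 * 3600)
print(f"2-hour block: {per_block / 1e9:.1f} GB")
print(f"one day:      {per_day / 1e9:.1f} GB")
```

At this rate a single default block is on the order of 8 GB and a day of data is close to 100 GB, which is worth keeping in mind when choosing retention.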

Although the block size can be configured, tuning it by hand is not recommended, so you are left with giving Prometheus as much memory as your workload requires.
If there is not enough memory to sustain the incoming stream of metrics, Prometheus will crash with an out-of-memory error or the OOM killer will get to it.
Adding swap to delay the crash when Prometheus runs out of memory does not really help, because using swap causes memory consumption to explode. I suspect this has to do with Go, its garbage collector, and the way it interacts with swap.
Another interesting approach is that the head block is flushed to disk at fixed clock times rather than at intervals counted from process start.

As you can see from the graph, flushes to disk happen every two hours. If you change the min-block-duration parameter to one hour, these flushes will happen every hour, starting at half past the hour.
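Clock alignment can be pictured as block boundaries that are whole multiples of the block duration counted from the Unix epoch, rather than offsets from process start. The sketch below illustrates only that idea; block_bounds is a hypothetical helper, not Prometheus code, and it says nothing about the exact moment at which the head block is actually flushed.

```python
from typing import Tuple

TWO_HOURS_MS = 2 * 60 * 60 * 1000

def block_bounds(ts_ms: int, block_ms: int = TWO_HOURS_MS) -> Tuple[int, int]:
    """Return the [start, end) range of the clock-aligned block containing ts_ms."""
    start = ts_ms - ts_ms % block_ms
    return start, start + block_ms

# An arbitrary millisecond timestamp from the period of the test.
print(block_bounds(1536475000000))
```

With a one-hour duration the same function returns boundaries on the hour, independent of when the server was started.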
If you want to use this and other graphs in your own Prometheus installation, you can use this dashboard. It was designed for PMM but, with minor modifications, it fits any Prometheus installation.
There is an active block, called the head block, that is kept in memory; blocks with older data are accessed through mmap(). This removes the need to configure a cache separately, but it also means you have to leave enough room for the operating system cache if you want to query data older than what the head block holds.
It also means that the virtual memory usage of Prometheus will look very high, which is nothing to worry about.

Another interesting point is the use of the WAL (write-ahead log). As you can see in the storage documentation, Prometheus uses the WAL to protect against crashes. Unfortunately, the specific mechanisms that guarantee data durability are not well documented. Prometheus 2.3.2 flushes the WAL to disk every 10 seconds, and this option is not user-configurable.

Compactions

The Prometheus TSDB is designed like an LSM (Log Structured Merge) store: the head block is periodically flushed to disk, while a compaction mechanism merges multiple blocks together to avoid scanning too many blocks during queries. Here you can see the number of blocks I observed on the test system after a day of load.

If you want to learn more about the store, you can examine the meta.json file, which holds information about the existing blocks and how they came about.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}
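A few useful figures can be derived from this file. The sketch below parses a trimmed copy of the meta.json shown above (the parents list is left out for brevity) and computes the time span of the block, the number of source blocks, and some averages. The summarize helper and the derived field names are my own, not part of Prometheus.

```python
import json

# Trimmed copy of the meta.json shown above (the "parents" list is omitted).
META_JSON = """
{
  "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
  "minTime": 1536472800000,
  "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {
    "level": 2,
    "sources": ["01CPYRY9MS465Y5ETM3SXFBV7X",
                "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                "01CPZ6NR4Q3PDP3E57HEH760XS"]
  },
  "version": 1
}
"""

def summarize(meta: dict) -> dict:
    """Derive a few per-block figures from a parsed meta.json."""
    span_s = (meta["maxTime"] - meta["minTime"]) / 1000  # timestamps are in milliseconds
    stats = meta["stats"]
    return {
        "hours": span_s / 3600,
        "source_blocks": len(meta["compaction"]["sources"]),
        "samples_per_sec": stats["numSamples"] / span_s,
        "samples_per_series": stats["numSamples"] / stats["numSeries"],
        "samples_per_chunk": stats["numSamples"] / stats["numChunks"],
    }

summary = summarize(json.loads(META_JSON))
for key, value in summary.items():
    print(f"{key}: {value:,.1f}")
```

For this block the span is six hours, built from three two-hour source blocks, and the average ingest rate works out to a little over 380 thousand samples per second, which matches the load described earlier.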

Compactions in Prometheus are tied to the moment the head block is flushed to disk. At that point, several such operations may be carried out.

It appears that compactions are not throttled in any way and can cause large disk I/O spikes while they run.

CPU load spikes as well.

Of course, this has a negative effect on system speed, and it is also a major challenge for LSM stores: how do you run compactions so as to support high query rates without causing too much overhead?
The memory usage of the compaction process also looks interesting.

We can see how, after compaction, most of the memory changes state from Cached to Free: this means that potentially valuable information has been evicted from it. I wonder whether fadvise() or some other minimization technique is used here, or whether it is because the cache was freed of the blocks destroyed during compaction.

Crash recovery

Recovery after a crash takes time, and for good reason. With an incoming stream of one million samples per second, I had to wait about 25 minutes for recovery to complete, and that was on an SSD drive.

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13OURCE)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."
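The recovery time can be read directly off the log by comparing the timestamps of the "Starting TSDB ..." and "TSDB started" messages. The sketch below does this for the two relevant lines copied from the log above. The helper names are my own, and the parsing assumes the ts= format seen here, with nanoseconds truncated to microseconds.

```python
import re
from datetime import datetime, timedelta

# The two relevant lines, copied from the log above.
LOG = """\
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
"""

TS_RE = re.compile(r"ts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z")

def parse_ts(line: str) -> datetime:
    """Parse the ts= field, truncating nanoseconds to microseconds."""
    match = TS_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int(match.group(2)[:6].ljust(6, "0"))
    return base + timedelta(microseconds=micros)

def recovery_time(log: str) -> timedelta:
    """Time between the 'Starting TSDB' and 'TSDB started' messages."""
    start = end = None
    for line in log.splitlines():
        if 'msg="Starting TSDB' in line:
            start = parse_ts(line)
        elif 'msg="TSDB started"' in line:
            end = parse_ts(line)
    if start is None or end is None:
        raise ValueError("log does not contain both TSDB messages")
    return end - start

print(recovery_time(LOG))
```

For this particular run the gap between the two messages is a little over 27 minutes, almost all of it spent before the WAL corruption message appears.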

The main problem with the recovery process is its high memory consumption. Even though in a normal situation the server can run stably with the same amount of memory, after a crash it may fail to recover because of OOM. The only solution I found was to disable data collection, bring the server up, let it recover, and then restart it with collection enabled.

Warmup

Another behavior to keep in mind during warmup is the combination of low performance and high resource consumption right after start. During some, but not all, starts I observed a heavy load on CPU and memory.

The gaps in memory usage indicate that Prometheus cannot carry out all of its scrapes right after startup, and some information is lost.
I have not determined the exact causes of the high CPU and memory load. I suspect it is due to the creation of new time series in the head block at a high rate.

CPU load spikes

In addition to compactions, which create a fairly high I/O load, I noticed serious spikes in CPU load every two minutes. The bursts last longer when the ingest rate is high, and they appear to be caused by the Go garbage collector, with at least some cores fully loaded.

These spikes are not insignificant. It appears that when they occur, the Prometheus internal endpoint and metrics become unavailable, which causes gaps in the data during those same intervals.
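Gaps of this kind are easy to spot programmatically once you have the sample timestamps of a series. The following is a minimal sketch with made-up data: the timestamps, the one-second scrape interval, and the find_gaps helper are all hypothetical and only illustrate the idea of flagging intervals where consecutive samples are too far apart.

```python
from typing import List, Tuple

def find_gaps(timestamps: List[float], interval_s: float,
              tolerance: float = 1.5) -> List[Tuple[float, float]]:
    """Return (previous, current) pairs where consecutive samples are
    further apart than tolerance * interval_s."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > tolerance * interval_s:
            gaps.append((prev, cur))
    return gaps

# Hypothetical sample timestamps (seconds) for a 1-second scrape interval.
samples = [0, 1, 2, 3, 7, 8, 9, 10, 14, 15]
print(find_gaps(samples, interval_s=1))
```

Comparing the gap intervals found this way with the times of CPU spikes is one way to check whether the two line up.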

You can also see that the Prometheus exporter hits its one-second timeout.

We can see the correlation with garbage collection (GC).

Conclusion

The TSDB in Prometheus 2 is fast, able to handle millions of time series and at the same time hundreds of thousands of samples per second on fairly modest hardware. CPU and disk I/O utilization is also impressive. My test showed up to 200,000 metrics per second per used core.

For capacity planning, you need to provision enough memory, and it has to be real memory. The memory usage I observed was about 5 GB per 100,000 samples per second of incoming stream, with the operating system cache taking up additional memory on top of that.
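As a rough planning aid, the observed ratio can be turned into a small estimator. This sketch assumes the figure above means about 5 GB per 100,000 samples per second and that memory scales linearly with ingest rate, which is a simplification: series count, labels, and scrape frequency also matter, and operating system cache is not included. The function name estimated_ram_gb is hypothetical.

```python
GB_PER_100K_SAMPLES = 5   # observed ratio quoted in the text (assumed per 100,000 samples/sec)

def estimated_ram_gb(samples_per_sec: float) -> float:
    """Very rough Prometheus memory estimate, assuming linear scaling
    and excluding operating system cache."""
    return samples_per_sec / 100_000 * GB_PER_100K_SAMPLES

# Ingest rate of the test system described earlier.
print(f"{estimated_ram_gb(380_000):.0f} GB")
```

For the test system this gives a figure in the high teens of gigabytes, which fits within the 32 GB server used for the tests while leaving room for the operating system cache.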

Of course, there is still a lot of work to be done to tame the CPU and disk I/O spikes. This is not surprising given how young the Prometheus 2 TSDB is compared to InnoDB, TokuDB, RocksDB, and WiredTiger, all of which had similar problems early in their life cycle.

Source: www.habr.com
