The Prometheus 2 time series database (TSDB) is an excellent example of an engineering solution. It offers major improvements over the v2 storage of Prometheus 1 in data ingestion speed, query execution, and resource efficiency. We have been rolling out Prometheus 2 in Percona Monitoring and Management (PMM), and I had the chance to study the performance of the Prometheus 2 TSDB. This article covers the results of those observations.
Typical Prometheus workload
For those used to dealing with general-purpose databases, the typical Prometheus workload is quite interesting. The ingestion rate tends to be stable: the services you monitor usually send roughly the same number of metrics, and the infrastructure changes relatively slowly.
Queries against the data can come from different sources. Some, such as alerts, also tend to be stable and predictable. Others, such as user queries, can cause bursts, although this is not the case for most workloads.
Load test
During testing I focused on ingestion capacity. I deployed Prometheus 2.3.2 compiled with Go 1.10.1 (as part of PMM 1.14) on the Linode service using a deployment script.
All of the following tests were run on a Linode server with eight virtual cores and 32 GB of memory, running 20 load simulations that monitored two hundred MySQL instances. In Prometheus terms, that is 800 targets, 440 scrapes per second, 380 thousand samples per second, and 1.7 million active time series.
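To put these numbers in perspective, here is a small back-of-the-envelope sketch. It only derives averages from the four figures quoted above; the variable names are mine and the results are rough averages, not measurements.

```python
# Figures quoted in the text above.
targets = 800
scrapes_per_sec = 440
samples_per_sec = 380_000
active_series = 1_700_000

# Average number of samples returned by a single scrape.
samples_per_scrape = samples_per_sec / scrapes_per_sec

# Average time between two scrapes of the same target, in seconds.
avg_scrape_interval = targets / scrapes_per_sec

# Average number of active series exposed by one target.
series_per_target = active_series / targets

print(f"{samples_per_scrape:.0f} samples per scrape")
print(f"{avg_scrape_interval:.2f} s between scrapes of a target")
print(f"{series_per_target:.0f} series per target")
```

In other words, each target is scraped a little more often than once every two seconds on average and exposes a couple of thousand series.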
Design
The usual approach of traditional databases, including the one used by Prometheus 1.x, is to limit the amount of memory. Prometheus 2 memory usage is instead governed by the storage.tsdb.min-block-duration option, which determines how long samples are kept in memory before being flushed to disk (the default is 2 hours). The amount of memory required depends on the number of time series, the number of labels, and the scrape frequency, in addition to the raw ingestion rate. In terms of disk space, Prometheus aims to use about 3 bytes per sample. Memory requirements, on the other hand, are considerably higher.
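The 3 bytes per sample figure makes a rough disk estimate easy. The sketch below is only arithmetic on the numbers quoted in this article; the function names are hypothetical, and real usage also includes index and WAL overhead that is not modeled here.

```python
BYTES_PER_SAMPLE = 3  # approximate on-disk cost per sample quoted in the text


def disk_bytes(samples_per_sec: float, retention_days: float) -> float:
    """Approximate disk space needed to keep `retention_days` of samples."""
    seconds = retention_days * 24 * 3600
    return samples_per_sec * seconds * BYTES_PER_SAMPLE


def head_samples(samples_per_sec: float, block_hours: float = 2.0) -> float:
    """Number of samples that accumulate in memory over one block duration."""
    return samples_per_sec * block_hours * 3600


rate = 380_000  # samples per second in the test described above
print(f"{disk_bytes(rate, 1) / 1e9:.1f} GB of disk per day")
print(f"{head_samples(rate) / 1e9:.2f} billion samples per 2-hour block")
```

At the ingestion rate of this test, that works out to just under 100 GB of disk per day, with a few billion samples passing through memory in every two-hour block.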
Although the block size can be configured, tuning it by hand is discouraged, so you are left with giving Prometheus as much memory as your workload requires.
If there is not enough memory to sustain the incoming stream of metrics, Prometheus will fail with an out-of-memory error or be killed by the OOM killer.
Adding swap to delay the crash when Prometheus runs out of memory does not really help, because using swap causes memory consumption to explode. I suspect this has to do with Go, its garbage collector, and the way it interacts with swap.
Another interesting design choice is that the head block is flushed to disk at fixed clock times rather than at times counted from process start.
As you can see from the graph, flushes to disk happen every two hours. If you change the min-block-duration parameter to one hour, the flushes will happen every hour, at half past the hour.
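A minimal sketch of what clock-aligned blocks look like, assuming block windows are aligned to multiples of the block duration since the Unix epoch. The flush rule in `assumed_flush_time` (the head is cut once it spans 1.5 block durations) is my assumption, chosen because it reproduces the "half past the hour" behavior described above; it is not taken from this article. The sample timestamp lies inside the first parent block of the meta.json listing shown later.

```python
from typing import Tuple

HOUR_MS = 3600 * 1000


def block_range(ts_ms: int, block_ms: int) -> Tuple[int, int]:
    """Return the [start, end) block window containing ts_ms, with windows
    aligned to multiples of block_ms since the Unix epoch."""
    start = ts_ms - ts_ms % block_ms
    return start, start + block_ms


def assumed_flush_time(block_start_ms: int, block_ms: int) -> int:
    """ASSUMPTION: the head block is cut once it spans 1.5 block durations."""
    return block_start_ms + block_ms * 3 // 2


# A timestamp one hour into a two-hour block.
print(block_range(1536476400000, 2 * HOUR_MS))

# With one-hour blocks, the assumed flush lands 30 minutes past the hour.
print(assumed_flush_time(0, HOUR_MS) % HOUR_MS // 60000, "minutes past the hour")
```

Because the window depends only on the timestamp and the block duration, restarting the process does not shift the block boundaries.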
If you want to use this and other graphs in your own Prometheus installation, you can use this dashboard.
There is an active block, called the head block, that is kept in memory; blocks with older data are accessed through mmap(). This removes the need to configure a cache separately, but it also means you need to leave enough room for the operating system cache if you want to query data older than what the head block holds.
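To illustrate why mmap() removes the need for a separate cache, here is a generic sketch of memory-mapped file access. It is not Prometheus code: the temporary file is a hypothetical stand-in for a block file, and the point is only that reads go through the operating system page cache.

```python
import mmap
import os
import tempfile

# Create a stand-in for an on-disk block file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"chunk-0001" * 1000)
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only; mapping alone does not copy the data.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Reading a slice pulls in only the pages that back it; those pages
        # live in the OS page cache, not in an application-level cache.
        first = m[:10]
        size = len(m)

os.remove(path)
print(first, size)
```

The mapped size counts toward the virtual memory of the process even when few pages are resident, so a large mapped data set makes virtual memory usage look large.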
It also means that the virtual memory usage of Prometheus will look very high, which is nothing to worry about.
Another interesting point is the use of the WAL (write-ahead log). As the storage documentation shows, Prometheus uses the WAL to protect against data loss in a crash. The specific mechanisms that guarantee data durability are, unfortunately, not well documented. Prometheus 2.3.2 flushes the WAL to disk every 10 seconds, and this setting is not user-configurable.
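The general idea of a write-ahead log can be sketched in a few lines. This is a toy record format of my own, not the Prometheus WAL format: each record carries a length and a CRC32 checksum, and replay stops at the first torn or corrupt record, in the spirit of the "WAL corruption detected; truncating" message in the recovery log later in this article.

```python
import struct
import zlib


def encode_record(payload: bytes) -> bytes:
    """Frame a record as: 4-byte length, payload, 4-byte CRC32 of the payload."""
    header = struct.pack(">I", len(payload))
    trailer = struct.pack(">I", zlib.crc32(payload))
    return header + payload + trailer


def replay(log: bytes) -> list:
    """Return all intact records; stop at the first torn or corrupt one."""
    records, pos = [], 0
    while pos + 4 <= len(log):
        (length,) = struct.unpack_from(">I", log, pos)
        end = pos + 4 + length + 4
        if end > len(log):
            break  # torn write: the record was only partially flushed
        payload = log[pos + 4 : pos + 4 + length]
        (crc,) = struct.unpack_from(">I", log, pos + 4 + length)
        if crc != zlib.crc32(payload):
            break  # checksum mismatch: drop this record and everything after
        records.append(payload)
        pos = end
    return records


log = encode_record(b"sample-1") + encode_record(b"sample-2")
damaged = log[:-1] + bytes([log[-1] ^ 0xFF])  # flip bits in the last CRC byte

print(replay(log))
print(replay(damaged))
```

With a periodic flush, anything written after the last flush can be lost in a crash, which is why the flush interval matters for durability.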
Compactions
The Prometheus TSDB is designed like an LSM (Log Structured Merge) store: the head block is periodically flushed to disk, while a compaction process merges several blocks together to avoid scanning too many blocks during queries. Here you can see the number of blocks I observed on the test system after a day of load.
If you want to learn more about the store, you can examine the meta.json file, which holds information about the existing blocks and how they came about.
{
"ulid": "01CPZDPD1D9R019JS87TPV5MPE",
"minTime": 1536472800000,
"maxTime": 1536494400000,
"stats": {
"numSamples": 8292128378,
"numSeries": 1673622,
"numChunks": 69528220
},
"compaction": {
"level": 2,
"sources": [
"01CPYRY9MS465Y5ETM3SXFBV7X",
"01CPYZT0WRJ1JB1P0DP80VY5KJ",
"01CPZ6NR4Q3PDP3E57HEH760XS"
],
"parents": [
{
"ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
"minTime": 1536472800000,
"maxTime": 1536480000000
},
{
"ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
"minTime": 1536480000000,
"maxTime": 1536487200000
},
{
"ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
"minTime": 1536487200000,
"maxTime": 1536494400000
}
]
},
"version": 1
}
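A few useful figures can be derived from this file. The sketch below embeds a subset of the values shown above and assumes, as is standard for Prometheus, that minTime and maxTime are Unix timestamps in milliseconds.

```python
import json

# Subset of the meta.json shown above (same values).
meta = json.loads("""
{
  "minTime": 1536472800000,
  "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {"level": 2, "sources": ["01CPYRY9MS465Y5ETM3SXFBV7X",
                                         "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                                         "01CPZ6NR4Q3PDP3E57HEH760XS"]}
}
""")

duration_s = (meta["maxTime"] - meta["minTime"]) / 1000  # milliseconds to seconds
stats = meta["stats"]

hours = duration_s / 3600
samples_per_sec = stats["numSamples"] / duration_s
samples_per_chunk = stats["numSamples"] / stats["numChunks"]
source_blocks = len(meta["compaction"]["sources"])

print(f"block covers {hours:.0f} h, merged from {source_blocks} source blocks")
print(f"{samples_per_sec:,.0f} samples/s, {samples_per_chunk:.1f} samples per chunk")
```

The block spans six hours and was built from three two-hour blocks, and the average rate of roughly 384 thousand samples per second matches the ingestion rate reported for the test.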
Compactions in Prometheus are tied to the moment the head block is flushed to disk. At that point, several such operations may be performed.
Compactions do not appear to be throttled in any way and can cause large disk I/O spikes while they run.
CPU load spikes as well.
Of course, this hurts system performance, and it is also a major challenge for LSM storage: how do you run compactions so as to sustain high query rates without incurring too much overhead?
Memory usage during compaction also looks interesting.
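As a rough illustration of how several small blocks can be folded into a larger one, here is a simplified planning sketch. It is not the actual Prometheus planner: the grouping rule (merge blocks only when they completely fill an aligned parent window) and the function name are assumptions for illustration. The first three blocks are the parents from the meta.json above; the fourth is a hypothetical newer block.

```python
from collections import defaultdict
from typing import List, Tuple

HOUR_MS = 3600 * 1000
Block = Tuple[int, int]  # (minTime, maxTime) in milliseconds


def plan_compaction(blocks: List[Block], parent_ms: int) -> List[List[Block]]:
    """Group blocks by the aligned parent window they fall into and return
    only the groups that completely cover their window."""
    groups = defaultdict(list)
    for mint, maxt in blocks:
        groups[mint - mint % parent_ms].append((mint, maxt))
    plans = []
    for _start, members in sorted(groups.items()):
        covered = sum(maxt - mint for mint, maxt in members)
        if len(members) > 1 and covered == parent_ms:
            plans.append(sorted(members))
    return plans


blocks = [
    (1536472800000, 1536480000000),
    (1536480000000, 1536487200000),
    (1536487200000, 1536494400000),
    (1536494400000, 1536501600000),  # hypothetical newer block, not yet mergeable
]
for plan in plan_compaction(blocks, 6 * HOUR_MS):
    print("merge", plan)
```

Under this rule, a new flush can complete a window and trigger a merge, which in turn can complete a larger window, so several compactions may follow a single flush.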
We can see that, after compaction, most of the memory changes state from Cached to Free, which means potentially valuable data has been evicted from it. I wonder whether posix_fadvise() or some other technique for minimizing cache eviction is used here, or whether it is because the cache was freed of blocks destroyed during compaction.
Crash recovery
Crash recovery takes time, and for good reason. With an incoming stream of one million samples per second, I had to wait about 25 minutes for recovery to complete, even on an SSD drive.
level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."
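The startup time can be read directly off the log. The sketch below parses the ts= field of the "Starting TSDB" and "TSDB started" lines shown above; the helper name is mine, and the parser simply truncates the nanosecond timestamps to microseconds.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Extract the ts= field and parse it with microsecond precision."""
    ts = next(field[3:] for field in line.split() if field.startswith("ts="))
    # Keep "YYYY-MM-DDTHH:MM:SS.ffffff" (26 characters); drop extra digits and "Z".
    return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")


start = 'level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."'
done = 'level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"'

elapsed = (parse_ts(done) - parse_ts(start)).total_seconds()
print(f"TSDB startup took {elapsed / 60:.1f} minutes")
```

For this particular log the gap is about 27 minutes, in line with the recovery time mentioned above.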
The main problem with the recovery process is its high memory consumption. Although under normal conditions the server can run stably with the same amount of memory, after a crash it may never recover because of OOM. The only solution I found was to disable scraping, bring the server up, let it recover, and then restart it with scraping enabled.
Warm-up
Another behavior to keep in mind during warm-up is the ratio of low performance to high resource consumption right after start. During some, but not all, starts I observed heavy load on CPU and memory.
The gaps in the memory usage graph indicate that Prometheus cannot perform all scrapes right from the start, and some data is lost.
I have not determined the exact causes of the high CPU and memory load. I suspect it is due to new time series being created in the head block at a high rate.
CPU load spikes
In addition to compactions, which create a fairly high I/O load, I noticed significant spikes in CPU load every two minutes. The spikes last longer when the ingestion rate is higher and appear to be caused by the Go garbage collector, with at least some cores fully loaded.
These spikes are not insignificant. It appears that when they occur, the internal /metrics endpoint of Prometheus becomes unavailable, which causes gaps in the data during those same periods.
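Gaps like these are easy to detect programmatically once you have the sample timestamps of a series. Here is a minimal sketch; the function name, the tolerance factor, and the sample timestamps are all hypothetical.

```python
def find_gaps(timestamps, interval, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are further apart
    than tolerance * interval seconds."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > tolerance * interval:
            gaps.append((prev, cur))
    return gaps


# Hypothetical scrape timestamps in seconds, nominally one per second.
seen = [0, 1, 2, 3, 7, 8, 9, 12]
print(find_gaps(seen, interval=1))
```

Comparing the detected gaps with the times of the CPU spikes is one way to check whether the two line up.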
You can also see the Prometheus exporter hitting a one-second timeout.
We can see that this correlates with garbage collection (GC).
Conclusion
The TSDB in Prometheus 2 is fast, able to handle millions of time series and, at the same time, hundreds of thousands of samples per second on fairly modest hardware. CPU and disk I/O utilization is also impressive. My example showed up to 200,000 metrics per second per core used.
To plan for growth, you need to provision enough memory, and it has to be real memory. The memory usage I observed was about 5 GB per 100,000 samples per second of incoming stream, which together with the operating system cache came to about 8 GB of occupied memory.
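This rule of thumb can be turned into a quick sizing helper. The sketch below is a linear extrapolation of the figures observed in this test: roughly 5 GB of process memory per 100,000 samples per second, plus an operating system cache allowance of 3 GB per 100,000 samples per second. The cache allowance is my assumption, taken as the difference between the roughly 8 GB total and the 5 GB process figure. Real requirements also depend on series count, labels, and query load, so treat the output as a starting point only.

```python
GB_PER_100K_SAMPLES = 5.0        # process memory observed in this test
CACHE_GB_PER_100K_SAMPLES = 3.0  # ASSUMPTION: about 8 GB total minus 5 GB process memory


def process_memory_gb(samples_per_sec: float) -> float:
    """Estimated Prometheus process memory for a given ingestion rate."""
    return samples_per_sec / 100_000 * GB_PER_100K_SAMPLES


def total_memory_gb(samples_per_sec: float) -> float:
    """Estimated memory including the assumed OS cache allowance."""
    per_100k = GB_PER_100K_SAMPLES + CACHE_GB_PER_100K_SAMPLES
    return samples_per_sec / 100_000 * per_100k


rate = 380_000  # ingestion rate of the test described in this article
print(f"process: {process_memory_gb(rate):.1f} GB, total: {total_memory_gb(rate):.1f} GB")
```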
Of course, there is still a lot of work to be done to tame the CPU and disk I/O spikes. This is not surprising given how young the Prometheus 2 TSDB is compared with InnoDB, TokuDB, RocksDB, and WiredTiger, all of which had similar problems early in their life cycles.
Source: www.habr.com