Analysis of the TSDB in Prometheus 2


The time series database (TSDB) in Prometheus 2 is an excellent example of an engineering solution that offers major improvements over the v2 storage in Prometheus 1 in terms of data ingestion speed, query execution, and resource efficiency. We were implementing Prometheus 2 in Percona Monitoring and Management (PMM), and I had the chance to study the performance of the Prometheus 2 TSDB. In this article I will discuss the results of those observations.

Typical Prometheus workload

For those used to dealing with general-purpose databases, the typical Prometheus workload is quite interesting. The data ingestion rate tends to be stable: the services you monitor usually send roughly the same number of metrics, and the infrastructure changes relatively slowly.
Queries can come from a variety of sources. Some of them, such as alerts, also tend toward a stable and predictable rate. Others, such as user requests, can cause bursts, although this is not the case for most of the workload.

Load test

During testing, I focused on the ability to ingest data. I deployed Prometheus 2.3.2 compiled with Go 1.10.1 (as part of PMM 1.14) on a Linode service using this StackScript. To generate a realistic load, I used this StackScript to launch several MySQL nodes with a real workload (the Sysbench TPC-C test), each of which emulated 10 Linux/MySQL nodes.
All of the following tests were run on a Linode server with eight virtual cores and 32 GB of memory, running 20 load simulations that monitored two hundred MySQL instances. Or, in Prometheus terms: 800 targets, 440 scrapes per second, 380 thousand samples per second, and 1.7 million active time series.
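As a rough sanity check on these figures, the per-target averages can be derived with a few lines of arithmetic. The sketch below uses only the values quoted above; the variable names are illustrative, and the derived averages assume the load is spread evenly across targets, which the test setup does not guarantee.

```python
# Figures quoted in the text above.
mysql_instances = 200
targets = 800
scrapes_per_second = 440
samples_per_second = 380_000
active_series = 1_700_000

# Derived averages (assumption: load is spread evenly across all targets).
targets_per_instance = targets / mysql_instances
avg_scrape_interval_s = targets / scrapes_per_second
samples_per_scrape = samples_per_second / scrapes_per_second
series_per_target = active_series / targets

print(f"targets per MySQL instance: {targets_per_instance:.0f}")
print(f"implied average scrape interval: {avg_scrape_interval_s:.2f} s")
print(f"samples per scrape: {samples_per_scrape:.0f}")
print(f"active series per target: {series_per_target:.0f}")
```

In other words, each scrape returns several hundred samples on average, and each target is scraped roughly every two seconds.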

Design

The usual approach of traditional databases, including the one used by Prometheus 1.x, is to have a memory limit. If that limit is not enough to handle the load, you get high latency and some requests fail. Memory usage in Prometheus 2 is configurable through the storage.tsdb.min-block-duration key, which determines how long samples are kept in memory before being flushed to disk (the default is 2 hours). The amount of memory required depends on the number of time series, the number of labels, and the scrape frequency, in addition to the raw ingest rate. In terms of disk space, Prometheus aims to use 3 bytes per record (sample). Memory requirements, on the other hand, are much higher.
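To put the 3 bytes per sample figure in perspective, here is a minimal back-of-the-envelope sketch. The function name and parameters are hypothetical, and the estimate covers sample data only; it ignores the index, the WAL, and temporary space used while blocks are being rewritten.

```python
def disk_bytes(samples_per_second, retention_seconds, bytes_per_sample=3):
    """Estimate on-disk sample data size (assumption: a flat bytes-per-sample cost)."""
    return samples_per_second * retention_seconds * bytes_per_sample

DAY_SECONDS = 24 * 60 * 60
TWO_HOURS_SECONDS = 2 * 60 * 60

# 380 thousand samples per second is the ingest rate of the test described above.
per_day = disk_bytes(380_000, DAY_SECONDS)
per_block = disk_bytes(380_000, TWO_HOURS_SECONDS)

print(f"{per_day / 1e9:.1f} GB of samples per day")
print(f"{per_block / 1e9:.1f} GB of samples per 2-hour block")
```

At the test's ingest rate this comes to a little under 100 GB of sample data per day.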

Although it is possible to configure the block size, tuning it manually is not recommended, so you are forced to give Prometheus as much memory as it needs for your workload.
If there is not enough memory to support the incoming stream of metrics, Prometheus will run out of memory or the OOM killer will get to it.
Adding swap to delay the crash when Prometheus runs out of memory does not really help, because using it causes explosive memory consumption. I think this has to do with Go, its garbage collector, and the way it deals with swap.
Another interesting approach is to have the head block flushed to disk at a fixed time, instead of counting from the start of the process.


As you can see in the graph, flushes to disk happen every two hours. If you change the min-block-duration parameter to one hour, these flushes will happen every hour, starting after half an hour.
If you want to use this and other graphs in your own Prometheus installation, you can use this dashboard. It was built for PMM but, with minor modifications, it fits any Prometheus installation.
We have an active block, called the head block, that is kept in memory; blocks with older data are accessed via mmap(). This removes the need to configure a cache separately, but it also means you need to leave enough room for the operating system cache if you want to query data older than what the head block can hold.
It also means that Prometheus's virtual memory consumption will look quite high, which is nothing to worry about.
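To illustrate what access through mmap() means in practice, here is a minimal sketch, not Prometheus code. The helper name and the demo file contents are hypothetical. The file is mapped into the process's address space, and the pages that are actually touched are served from the operating system's page cache rather than from an application-level cache.

```python
import mmap
import os
import tempfile

def read_range_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` from a file mapped read-only."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; only the pages we touch are loaded.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]

# Demo with a small stand-in for an on-disk block file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"chunk-0|chunk-1|chunk-2|")
    demo_path = tmp.name

demo_bytes = read_range_via_mmap(demo_path, 8, 7)
os.remove(demo_path)
print(demo_bytes)
```

The mapping counts toward the process's virtual memory regardless of how much of it is resident, which is why the virtual size looks large.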


Another interesting design point is the use of a WAL (write-ahead log). As you can see in the storage documentation, Prometheus uses a WAL to avoid losing data in a crash. The specific mechanisms that guarantee data durability are, unfortunately, not well documented. Prometheus 2.3.2 flushes the WAL to disk every 10 seconds, and this setting is not user configurable.
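The general idea behind a write-ahead log can be sketched as follows. This is a toy illustration under stated assumptions, not the Prometheus WAL format: the record layout (length plus CRC32 header), the file name, and all function names are hypothetical. It shows why a checksum per record lets recovery detect a torn write and truncate the log at the last valid record.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical record header: payload length and CRC32 of the payload.
HEADER = struct.Struct("<II")

def append_record(f, payload):
    """Append one length-prefixed, checksummed record to an open log file."""
    f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    f.write(payload)

def replay(path):
    """Return (records, valid_bytes), stopping at the first torn or corrupt record."""
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or corrupted record: everything after it is untrusted
        records.append(payload)
        pos = start + length
    return records, pos

def recover(path):
    """Replay the log and truncate it after the last valid record."""
    records, valid_bytes = replay(path)
    with open(path, "r+b") as f:
        f.truncate(valid_bytes)
    return records

wal_path = os.path.join(tempfile.mkdtemp(), "wal-000001")
with open(wal_path, "wb") as f:
    append_record(f, b"series=1 t=1000 v=0.5")
    append_record(f, b"series=2 t=1000 v=7")
    f.flush()
    os.fsync(f.fileno())  # records are only durable once they reach the disk
    # Simulate a crash mid-write: the header promises 32 bytes, only 4 arrive.
    f.write(HEADER.pack(32, 0) + b"torn")

recovered = recover(wal_path)
print(recovered)
```

Anything written between two flushes to disk can be lost in a crash, which is why the flush interval matters for durability.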

Compaction

The Prometheus TSDB is designed like an LSM (Log Structured Merge) store: the head block is periodically flushed to disk, while a compaction mechanism merges multiple blocks together so that queries do not have to scan too many blocks. Here you can see the number of blocks I observed on the test system after a day of load.


If you want to learn more about the storage, you can examine the meta.json file, which has information about the available blocks and how they came to be.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}
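The fields in this file are easy to interpret programmatically. The sketch below is an illustration rather than an official tool: it parses a trimmed copy of the meta.json shown above (the "sources" list and "version" field are omitted for brevity) and derives the time span of the block and of its parents. The function names are hypothetical; timestamps are in milliseconds since the Unix epoch.

```python
import json

META_JSON = """
{
  "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
  "minTime": 1536472800000,
  "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {
    "level": 2,
    "parents": [
      {"ulid": "01CPYRY9MS465Y5ETM3SXFBV7X", "minTime": 1536472800000, "maxTime": 1536480000000},
      {"ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ", "minTime": 1536480000000, "maxTime": 1536487200000},
      {"ulid": "01CPZ6NR4Q3PDP3E57HEH760XS", "minTime": 1536487200000, "maxTime": 1536494400000}
    ]
  }
}
"""

def hours(block):
    """Time span covered by a block, in hours (timestamps are milliseconds)."""
    return (block["maxTime"] - block["minTime"]) / 3_600_000

def summarize(meta):
    parents = meta["compaction"]["parents"]
    # Parents are contiguous when each one starts where the previous one ends.
    contiguous = all(a["maxTime"] == b["minTime"] for a, b in zip(parents, parents[1:]))
    span_seconds = (meta["maxTime"] - meta["minTime"]) / 1000
    stats = meta["stats"]
    return {
        "block_hours": hours(meta),
        "parent_hours": [hours(p) for p in parents],
        "parents_contiguous": contiguous,
        "samples_per_second": stats["numSamples"] / span_seconds,
        "samples_per_chunk": stats["numSamples"] / stats["numChunks"],
    }

summary = summarize(json.loads(META_JSON))
for key, value in summary.items():
    print(key, value)
```

For this block the summary shows three adjacent two-hour parents merged into one six-hour block at compaction level 2.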

Compactions in Prometheus are tied to the moment the head block is flushed to disk. Several such operations may be performed at that point.


It appears that compactions are not throttled in any way and can cause large disk I/O spikes while they run.


They cause CPU load spikes as well.


Of course, this has a rather negative effect on system speed, and it also poses a serious challenge for LSM stores: how do you run compactions so as to support high request rates without causing too much overhead?
Memory usage during compaction also looks quite interesting.


We can see how, after compaction, most of the memory changes state from Cached to Free: this means that potentially valuable information has been evicted from the cache. I am curious whether fadvise() or some other minimization technique is used here, or whether this happens because the cache was freed of blocks that were destroyed during compaction.

Crash recovery

Recovery after a crash takes time, and for good reason. With an incoming stream of a million samples per second, I had to wait about 25 minutes for recovery to complete, even on an SSD drive.

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."
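The recovery time can be read directly off such a log. As a small sketch (the helper names are hypothetical, and the two log lines are copied from the output above), the following measures the interval between the "Starting TSDB ..." and "TSDB started" messages:

```python
import re
from datetime import datetime, timedelta

LOG = '''\
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
'''

TS_RE = re.compile(r'ts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z')
MSG_RE = re.compile(r'msg="([^"]*)"')

def parse_ts(line):
    """Parse the ts= field; the fraction is handled separately because it has up to 9 digits."""
    m = TS_RE.search(line)
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = float("0." + m.group(2)) if m.group(2) else 0.0
    return base + timedelta(seconds=fraction)

def find_event(log, message):
    """Return the timestamp of the first line whose msg field equals `message`."""
    for line in log.splitlines():
        m = MSG_RE.search(line)
        if m and m.group(1) == message:
            return parse_ts(line)
    raise ValueError(f"message not found: {message!r}")

def recovery_duration(log):
    return find_event(log, "TSDB started") - find_event(log, "Starting TSDB ...")

duration = recovery_duration(LOG)
print(f"TSDB recovery took {duration.total_seconds() / 60:.1f} minutes")
```

For this particular run the gap is a little over 27 minutes, almost all of it spent before the WAL corruption message is logged.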

The main problem with the recovery process is high memory consumption. Even though the server can run stably with the same amount of memory under normal conditions, after a crash it may fail to recover because of OOM. The only solution I found was to disable data collection, bring the server up, let it recover, and then restart it with collection enabled.

Warm-up

Another behavior to keep in mind during warm-up is the combination of low performance and high resource consumption right after start. During some, but not all, starts I observed a serious load on CPU and memory.


The gaps in memory usage show that Prometheus cannot set up all the scrapes from the start, and some information is lost.
I have not figured out the exact reasons for the high CPU and memory load. I suspect it is caused by the creation of new time series in the head block at a high rate.

CPU load spikes

In addition to compactions, which create a fairly high I/O load, I noticed serious spikes in CPU load every two minutes. The bursts last longer when the ingest rate is high, and they appear to be caused by Go's garbage collector, with at least some cores fully loaded.


These spikes are not insignificant. It appears that when they occur, Prometheus's internal endpoint and metrics become unavailable, causing data gaps during those same time intervals.


You can also see that the Prometheus exporter stalls for one second.


We can see a correlation with garbage collection (GC).


Conclusion

The TSDB in Prometheus 2 is fast, able to handle millions of time series and, at the same time, thousands of samples per second on fairly modest hardware. CPU and disk I/O utilization is also impressive. My example showed up to 200,000 metrics per second per core used.

To plan for expansion, you need to budget for a sufficient amount of memory, and it has to be real memory. The memory usage I observed was about 5 GB per 100,000 samples per second of incoming stream, which, together with the operating system cache, came to about 8 GB of occupied memory.

Of course, there is still a lot of work to be done to tame the CPU and disk I/O spikes, and this is not surprising considering how young the Prometheus 2 TSDB is compared to InnoDB, TokuDB, RocksDB, and WiredTiger, all of which had similar problems early in their life cycle.

Source: www.habr.com
