Analyzing the TSDB in Prometheus 2

The time series database (TSDB) in Prometheus 2 is an excellent example of engineering. It offers a major improvement over the original Prometheus 1 storage engine in ingestion speed, query performance, and resource efficiency. We were adopting Prometheus 2 in Percona Monitoring and Management (PMM), and this gave me a chance to look into how the Prometheus 2 TSDB performs. In this article I describe what I observed.

A typical Prometheus workload

For those used to general-purpose databases, the typical Prometheus workload is quite interesting. The ingest rate tends to be stable: the services you monitor usually send roughly the same number of metrics all the time, and the infrastructure changes slowly.
Queries can come from several sources. Some of them, such as alerting, also tend to be stable and predictable. Others, such as users exploring data, can be bursty, although they rarely make up most of the load.

The benchmark

In my testing I focused on ingestion. I ran Prometheus 2.3.2 built with Go 1.10.1 (as part of PMM 1.14) on Linode, deployed with this StackScript. For the most realistic load generation, I used this StackScript to launch several MySQL nodes running a real workload (the Sysbench TPC-C test), each of them emulating 10 Linux/MySQL nodes.
All of the observations below were made on a Linode server with eight cores and 32 GB of memory, running 20 load generators that simulate the monitoring of two hundred MySQL instances. In Prometheus terms, that is 800 targets, 440 scrapes per second, 380 thousand samples ingested per second, and 1.7 million active time series.
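
To make those figures easier to relate to each other, here is a small sketch of the averages they imply. It assumes load is spread evenly across targets and series, which a real PMM setup (with different scrape intervals per endpoint) will not match exactly; the function name is purely illustrative.

```python
# Figures quoted for the benchmark setup.
TARGETS = 800
SCRAPES_PER_SEC = 440
SAMPLES_PER_SEC = 380_000
ACTIVE_SERIES = 1_700_000


def derived_rates(targets, scrapes_per_sec, samples_per_sec, active_series):
    """Averages implied by the ingest figures, assuming an even spread."""
    return {
        # Samples returned by a single scrape, on average.
        "samples_per_scrape": samples_per_sec / scrapes_per_sec,
        # Seconds between two scrapes of the same target, on average.
        "target_interval_s": targets / scrapes_per_sec,
        # Seconds between two samples of the same active series, on average.
        "series_interval_s": active_series / samples_per_sec,
    }


if __name__ == "__main__":
    rates = derived_rates(TARGETS, SCRAPES_PER_SEC, SAMPLES_PER_SEC, ACTIVE_SERIES)
    for name, value in rates.items():
        print(f"{name}: {value:.2f}")
```

With these inputs it prints about 864 samples per scrape, a scrape of each target roughly every 1.8 s, and a new sample for each series roughly every 4.5 s.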

Design

The usual approach of traditional databases, including the one used by Prometheus 1.x, is to cap the amount of memory. If that memory is not enough to handle the load, you get high latencies and some queries fail. Prometheus 2 instead ties memory usage to the storage.tsdb.min-block-duration setting, which determines how long samples are kept in memory before they are flushed to disk (the default is 2 hours). How much memory this requires depends on the number of time series, the labels, and the scrape frequency, in addition to the raw ingest rate. On disk, Prometheus aims to use about 3 bytes per sample. Memory requirements, on the other hand, are much higher.
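
The two numbers in that paragraph lend themselves to quick arithmetic. The sketch below estimates daily disk growth from the ~3 bytes per sample target and counts how many samples accumulate during one min-block-duration window. It deliberately does not estimate RAM, since that depends on series count and labels as well; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def disk_bytes_per_day(samples_per_sec, bytes_per_sample=3.0):
    """Approximate on-disk growth per day, using the ~3 bytes/sample target."""
    return samples_per_sec * bytes_per_sample * SECONDS_PER_DAY


def head_samples(samples_per_sec, min_block_duration_s=2 * 3600):
    """Samples ingested during one min-block-duration window (default 2h)."""
    return samples_per_sec * min_block_duration_s


if __name__ == "__main__":
    rate = 380_000  # samples/s from the benchmark
    print(f"disk: {disk_bytes_per_day(rate) / 1e9:.1f} GB/day")
    print(f"head window: {head_samples(rate) / 1e9:.2f} billion samples")
```

At the benchmark's 380,000 samples per second this comes to roughly 98.5 GB of disk per day and about 2.74 billion samples per 2-hour window.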

Although the block size can be configured, tuning it by hand is discouraged, so you are left having to give Prometheus as much memory as your workload needs.
If there is not enough memory to keep up with the incoming stream of metrics, Prometheus will crash with an out-of-memory error or be killed by the OOM killer.
Adding swap as a safety net for when Prometheus runs out of memory does not help, because once swap is in use, memory usage explodes. I suspect this has to do with Go, its garbage collector, and the way it interacts with swapping.
Another interesting design choice is that head block flushes to disk are aligned to fixed clock times rather than measured from the moment the process started.

As you can see from the graph, flushes to disk happen every two hours, on the clock. If you change min-block-duration to one hour, the flushes happen every hour, at 30 minutes past the hour.
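
The on-the-clock pattern can be reproduced with a simple model. This sketch assumes that block ranges are aligned to multiples of the block duration since the Unix epoch, and that a block is written out once the head extends half a range past that block's end. Those are modeling assumptions that happen to reproduce the behavior described above, not a transcription of the TSDB code.

```python
from datetime import datetime, timezone

HOUR_MS = 3600 * 1000


def block_start(ts_ms, range_ms):
    """Start of the epoch-aligned block range that contains ts_ms."""
    return (ts_ms // range_ms) * range_ms


def flush_time(ts_ms, range_ms):
    """Assumed flush moment for the block containing ts_ms.

    Model: the block [start, start + range) is written to disk once the
    head reaches half a range past its end, i.e. start + 1.5 * range.
    """
    return block_start(ts_ms, range_ms) + range_ms + range_ms // 2


def to_utc(ts_ms):
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


if __name__ == "__main__":
    example_ms = 1536472800000 + 25 * 60 * 1000  # 2018-09-09 06:25 UTC
    for hours in (2, 1):
        flush = to_utc(flush_time(example_ms, hours * HOUR_MS))
        print(f"{hours}h blocks: block holding 06:25 is flushed at {flush:%H:%M} UTC")
```

Under this model, 2-hour blocks start at even UTC hours and are flushed on odd hours, every two hours, while 1-hour blocks are flushed at 30 minutes past each hour.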
If you want to see this and other graphs for your own Prometheus installation, you can use this dashboard. It was built for PMM, but with small changes it works with any Prometheus installation.
The active block, called the head block, is kept in memory, while blocks holding older data are accessed through mmap(). This removes the need to configure a cache separately, but it also means you must leave enough memory for the operating system cache if you want to query data older than what fits in the head block.
It also means that Prometheus's virtual memory usage will look very high, which is nothing to worry about.
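
For readers less familiar with memory-mapped access, here is a minimal illustration using Python's standard mmap module. Prometheus is written in Go and its chunk layout is not modeled here; the point is only that the whole file is mapped into the address space (inflating virtual memory), while the kernel loads just the pages that are touched, serving them from the OS page cache when they are already resident. The file contents are a hypothetical stand-in for a block's data.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Read a byte range from a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; only touched pages are faulted in.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical stand-in for an on-disk block file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"header" + b"\x00" * 4096 + b"sample-data")
        path = tmp.name
    try:
        print(read_via_mmap(path, 4102, 11))  # b'sample-data'
    finally:
        os.remove(path)
```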

Another interesting design point is the use of a WAL (write-ahead log). As the storage documentation explains, Prometheus uses the WAL to protect against data loss in a crash. Unfortunately, the exact durability guarantees are not clearly documented. Prometheus 2.3.2 flushes the WAL to disk every 10 seconds, and this interval is not user-configurable.
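
To show what a write-ahead log with checksummed records buys you, here is a toy version: each record is framed as length, CRC32, payload, and replay stops at the first torn or corrupted record, so everything after that point would be truncated. The framing is hypothetical and not Prometheus's on-disk WAL format; it also uses zlib's standard CRC32 rather than whichever CRC32 variant Prometheus uses.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of the payload


def encode_record(payload: bytes) -> bytes:
    """Frame a payload as [length][crc32][payload]."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes):
    """Return the valid records and the offset where replay stopped.

    Replay stops at the first incomplete or corrupted record; a real
    recovery would truncate the log at the returned offset.
    """
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(log) or zlib.crc32(log[start:end]) != crc:
            break  # torn write or checksum mismatch
        records.append(log[start:end])
        pos = end
    return records, pos


if __name__ == "__main__":
    good = encode_record(b"series=1 t=1000 v=1.5") + encode_record(b"series=1 t=2000 v=1.7")
    torn = good + encode_record(b"series=1 t=3000 v=1.9")[:-4]  # crash mid-write
    records, valid_bytes = replay(torn)
    print(len(records), valid_bytes == len(good))  # 2 True
```

This is the same kind of check behind the "WAL corruption detected; truncating" message in the recovery log later in this article.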

Compactions

The Prometheus TSDB is designed like an LSM (Log-Structured Merge) store: the head block is flushed to disk periodically, while compactions merge several blocks together so that queries do not have to scan too many blocks. Here you can see the number of blocks I observed on the test system after a day of load.
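
The merge step can be sketched as grouping adjacent blocks into a larger, epoch-aligned time window. This example assumes 2-hour source blocks merged into 6-hour windows, which is consistent with the level-2 block in the meta.json example built from three 2-hour parents. Prometheus's real planner has more rules than this; the function is a hypothetical illustration of the idea only.

```python
from collections import defaultdict

HOUR_MS = 3600 * 1000


def plan_compactions(blocks, target_range_ms=6 * HOUR_MS):
    """Find epoch-aligned windows fully covered by contiguous blocks.

    blocks is a list of (min_time, max_time) pairs in milliseconds.
    Each returned window is a candidate to merge into one larger block.
    """
    windows = defaultdict(list)
    for mint, maxt in blocks:
        windows[(mint // target_range_ms) * target_range_ms].append((mint, maxt))
    plans = []
    for start, members in sorted(windows.items()):
        members.sort()
        end = start + target_range_ms
        contiguous = all(a[1] == b[0] for a, b in zip(members, members[1:]))
        covers = members[0][0] == start and members[-1][1] == end
        if len(members) > 1 and contiguous and covers:
            plans.append(((start, end), members))
    return plans


if __name__ == "__main__":
    base = 1536472800000  # minTime of the meta.json example
    two_hour = [(base + i * 2 * HOUR_MS, base + (i + 1) * 2 * HOUR_MS) for i in range(4)]
    for (start, end), members in plan_compactions(two_hour):
        print(f"merge {len(members)} blocks into [{start}, {end})")
```

With four consecutive 2-hour blocks, only the first three fill a complete 6-hour window, so the sketch plans a single merge into [1536472800000, 1536494400000) and leaves the fourth block alone until its window is complete.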

If you want more detail about the storage, look at the meta.json file, which describes the blocks you have and how they came about.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}
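
A small script can summarize such a file. The sketch below parses an abridged copy of the meta.json shown above (the "sources" list is omitted), reports the block's span, compaction level, and average samples per series, and checks that the parent blocks tile the block's time range without gaps. In a Prometheus data directory each block lives in a directory named after its ULID with a meta.json inside; the function name here is hypothetical.

```python
import json

EXAMPLE = """{
  "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
  "minTime": 1536472800000, "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {"level": 2, "parents": [
    {"ulid": "01CPYRY9MS465Y5ETM3SXFBV7X", "minTime": 1536472800000, "maxTime": 1536480000000},
    {"ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ", "minTime": 1536480000000, "maxTime": 1536487200000},
    {"ulid": "01CPZ6NR4Q3PDP3E57HEH760XS", "minTime": 1536487200000, "maxTime": 1536494400000}]},
  "version": 1
}"""


def describe_block(meta):
    """Summarize a block's meta.json: span, level, size, and parent layout."""
    parents = meta["compaction"].get("parents", [])
    # Parents should tile [minTime, maxTime) with no gaps or overlaps.
    contiguous = (
        bool(parents)
        and parents[0]["minTime"] == meta["minTime"]
        and parents[-1]["maxTime"] == meta["maxTime"]
        and all(a["maxTime"] == b["minTime"] for a, b in zip(parents, parents[1:]))
    )
    return {
        "ulid": meta["ulid"],
        "hours": (meta["maxTime"] - meta["minTime"]) / 3_600_000,
        "level": meta["compaction"]["level"],
        "parents": len(parents),
        "parents_contiguous": contiguous,
        "samples_per_series": meta["stats"]["numSamples"] / meta["stats"]["numSeries"],
    }


if __name__ == "__main__":
    for key, value in describe_block(json.loads(EXAMPLE)).items():
        print(f"{key}: {value}")
```

For the example block this reports a 6-hour, level-2 block built from three contiguous 2-hour parents, with roughly 4,955 samples per series.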

Compactions in Prometheus are triggered when the head block is flushed to disk, and several compactions may run at that point.

Compactions do not seem to be throttled in any way, and they can cause large disk I/O spikes while they run.

CPU load spikes as well:

This, of course, can hurt overall system performance. It also touches on one of the hardest problems for LSM storage engines: how to run compactions so that query performance stays high without causing too much overhead.
Memory usage during compaction is also quite interesting.

We can see that after compaction a lot of memory moves from Cached to Free, which means potentially valuable data has been evicted from memory. I wonder whether fadvise() or some other minimization technique is used here, or whether the cache was simply freed because the blocks it belonged to were removed by compaction.
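
For reference, this is what advising the kernel to drop a file's cached pages looks like through Python's os.posix_fadvise. Whether Prometheus issues any such call during compaction is exactly the open question above, and this sketch does not answer it; it only shows the mechanism. The call is advisory and available on Linux and some other Unix systems, so the helper returns False where it is missing.

```python
import os


def drop_from_page_cache(path):
    """Advise the kernel that a file's cached pages are no longer needed.

    Returns False on platforms where os.posix_fadvise is unavailable.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 cover the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```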

Crash recovery

Crash recovery takes time, and understandably so. With an incoming stream of about a million samples per second, I had to wait roughly 25 minutes for recovery to finish, even on SSD storage.

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."
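
The log itself records how long the replay took: the "Starting TSDB ..." and "TSDB started" lines bracket it. Here is a small sketch that extracts the ts= field from those two logfmt lines and computes the difference. Go prints nanosecond timestamps, while Python's datetime keeps only microseconds, so the fraction is truncated; the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

TS = re.compile(r"ts=(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)\.(\d+)Z")


def parse_ts(line):
    """Extract the ts= field of a logfmt line as a UTC datetime."""
    m = TS.search(line)
    if m is None:
        raise ValueError(f"no ts= field in: {line!r}")
    base, frac = m.groups()
    micros = int(frac[:6].ljust(6, "0"))  # truncate nanoseconds to microseconds
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def recovery_seconds(lines):
    """Seconds between the 'Starting TSDB ...' and 'TSDB started' messages."""
    start = next(parse_ts(line) for line in lines if 'msg="Starting TSDB ..."' in line)
    done = next(parse_ts(line) for line in lines if 'msg="TSDB started"' in line)
    return (done - start).total_seconds()


if __name__ == "__main__":
    log = [
        'level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."',
        'level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"',
    ]
    print(f"{recovery_seconds(log) / 60:.1f} minutes")
```

Applied to the two lines from the log above, it reports about 27.1 minutes.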

The main problem with recovery is its high memory usage. Even though the server may run stably with a given amount of memory under normal operation, it may be unable to recover from a crash with that same memory because of OOM. The only solution I found was to disable scraping, bring the server up, let it finish recovery, and then restart it with scraping enabled.

Warmup

Another behavior to keep in mind is warmup: right after startup, performance is lower relative to resource usage. On some starts, though not all, I observed a heavy load on both CPU and memory.

The gaps in the memory usage graph show that Prometheus cannot perform all of its configured scrapes at first, so some data is lost.
I have not investigated exactly what causes the high CPU and memory load. I suspect it happens when new time series are created in the head block at a high rate.

CPU load spikes

Besides compactions, which generate quite heavy disk I/O, I noticed significant CPU load spikes roughly every two minutes. The spikes last longer when the ingest rate is higher, and they appear to be caused by the Go garbage collector, with at least some cores fully saturated.

These spikes are not merely cosmetic. It appears that while they happen, Prometheus's internal /metrics endpoint becomes unresponsive, which produces data gaps during exactly those intervals.
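
Gaps like these can be found mechanically once you have the sample timestamps of a series. This sketch flags any pair of consecutive samples that are further apart than 1.5 times the expected scrape interval. The 1.5 tolerance is an arbitrary assumption, and in practice the timestamps would come from a range query or an export; the function is hypothetical.

```python
def find_gaps(timestamps_ms, interval_ms, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are further
    apart than tolerance * interval_ms, i.e. scrapes that never landed."""
    ts = sorted(timestamps_ms)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > tolerance * interval_ms]


if __name__ == "__main__":
    # Hypothetical 1 s scrape interval with scrapes missing at 5000 and 6000.
    samples = [0, 1000, 2000, 3000, 4000, 7000, 8000]
    print(find_gaps(samples, 1000))  # [(4000, 7000)]
```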

You can also see the Prometheus exporter hitting a one-second timeout.

We can see that this correlates with garbage collection (GC).

Conclusion

The Prometheus 2 TSDB is fast: it can handle millions of time series and, at the same time, hundreds of thousands of samples per second on reasonable hardware. Its CPU and disk I/O efficiency is also impressive. In my setup I saw up to 200,000 metrics per second per CPU core used.

For capacity planning, make sure you provision plenty of memory, and it has to be real RAM. The memory usage I observed was about 5 GB per 100,000 samples per second of incoming stream, which together with the operating system cache came to about 8 GB of occupied memory.
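
Those two ratios (about 5 GB of RAM per 100,000 samples/s, and up to about 200,000 samples/s per core) can be turned into a rough sizing helper. This is a linear extrapolation from a single benchmark, so treat it as a starting point only: the core count is a best case because the per-core figure was a peak, and OS page cache for querying older blocks comes on top of the RAM estimate. The function name is hypothetical.

```python
import math

GB_PER_100K_SAMPLES = 5.0   # observed: ~5 GB RAM per 100,000 samples/s
SAMPLES_PER_CORE = 200_000  # observed peak: ~200,000 samples/s per core


def plan_capacity(samples_per_sec):
    """Rough (RAM in GB, best-case core count) for a given ingest rate."""
    ram_gb = samples_per_sec * GB_PER_100K_SAMPLES / 100_000
    min_cores = math.ceil(samples_per_sec / SAMPLES_PER_CORE)
    return ram_gb, min_cores


if __name__ == "__main__":
    ram, cores = plan_capacity(380_000)  # the benchmark's ingest rate
    print(f"~{ram:.0f} GB RAM plus OS cache, at least {cores} cores (best case)")
```

For the benchmark's 380,000 samples per second this suggests about 19 GB of RAM before OS cache, which fits within the 32 GB test machine.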

Of course, there is still work to be done to tame the CPU and disk I/O spikes. That is not surprising given how young the Prometheus 2 TSDB is: InnoDB, TokuDB, RocksDB, and WiredTiger all had similar problems early in their life cycles.

Source: www.habr.com
