Analyzing the TSDB in Prometheus 2

The time series database (TSDB) in Prometheus 2 is a fine piece of engineering. Compared with the v2 storage in Prometheus 1, it offers major improvements in ingestion speed, query execution and resource efficiency. We have been adopting Prometheus 2 in Percona Monitoring and Management (PMM), and this gave me a chance to look into the performance of the Prometheus 2 TSDB. In this article I describe what I observed.

Typical Prometheus Workload

For those used to working with general-purpose databases, the typical Prometheus workload is quite interesting. The ingest rate tends to be very stable. The services you monitor usually send roughly the same number of metrics all the time, and infrastructure changes relatively slowly.
Queries for the data can come from many sources. Some of them, such as alerting, are also very stable and predictable. Others, such as users exploring the data, can be bursty, though such queries rarely make up most of the load.

Load Test

During testing I focused on ingestion capacity. I deployed Prometheus 2.3.2, built with Go 1.10.1 (as part of PMM 1.14), on Linode using this script: StackScript. To generate the most realistic load possible, I used this StackScript to launch several MySQL nodes running a real workload (Sysbench TPC-C Test). Each node emulated 10 Linux/MySQL nodes.
All of the following tests ran on a Linode server with eight virtual cores and 32 GB of memory. The server ran 20 load simulations that together monitored two hundred MySQL instances. In Prometheus terms, that is 800 targets, 440 scrapes per second, 380 thousand samples ingested per second and 1.7 million active time series.
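
As a quick sanity check on these figures, the short Python sketch below derives two averages from the quoted totals. It assumes, purely for illustration, that every target is scraped at the same interval. Real configurations may mix scrape intervals, so treat the results as rough averages only.

```python
# Aggregate figures quoted for the benchmark.
TARGETS = 800
SCRAPES_PER_SEC = 440
SAMPLES_PER_SEC = 380_000


def derived_averages(targets: int, scrapes_per_sec: float, samples_per_sec: float) -> dict:
    """Averages implied by the totals, assuming one uniform scrape interval (an assumption)."""
    return {
        # Each target is scraped once per interval: interval = targets / scrapes per second.
        "avg_scrape_interval_s": targets / scrapes_per_sec,
        # Samples ingested per individual scrape.
        "avg_samples_per_scrape": samples_per_sec / scrapes_per_sec,
    }


for name, value in derived_averages(TARGETS, SCRAPES_PER_SEC, SAMPLES_PER_SEC).items():
    print(f"{name}: {value:.1f}")
```

Under that assumption, each target is scraped roughly every 1.8 seconds, and each scrape yields roughly 860 samples.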

Design

The usual approach in traditional databases, including the one Prometheus 1.x used, is to cap the amount of memory. If that amount is not enough to handle the load, latencies become high and some requests fail. Prometheus 2 works differently. Its memory usage is governed by the storage.tsdb.min-block-duration setting, which determines how long samples are kept in memory before they are flushed to disk (the default is 2 hours). How much memory is needed depends on the number of time series, the number of labels and the scrape frequency, in addition to the raw ingest rate. On disk, Prometheus uses about 3 bytes per sample. Memory requirements, however, are much higher.
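
The 3-bytes-per-sample figure turns a rough disk estimate into simple arithmetic: multiply the ingest rate, the retention time and the bytes per sample. In the sketch below, the 15-day retention is a hypothetical value chosen for illustration. The result covers disk only, not the much larger memory footprint.

```python
BYTES_PER_SAMPLE = 3  # approximate on-disk cost per sample quoted above
SECONDS_PER_DAY = 24 * 3600


def disk_bytes(samples_per_sec: float, retention_days: float,
               bytes_per_sample: float = BYTES_PER_SAMPLE) -> float:
    """Rough on-disk size of the TSDB for a steady ingest rate and retention."""
    return samples_per_sec * retention_days * SECONDS_PER_DAY * bytes_per_sample


# The benchmark's ingest rate, kept for a hypothetical 15 days.
size = disk_bytes(380_000, 15)
print(f"{size / 2**30:,.0f} GiB")
```

At 380K samples per second, 15 days comes to about 1,376 GiB (roughly 1.3 TiB).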

Although the head block size can be configured, tuning it by hand is not recommended. In practice, you have to give Prometheus as much memory as your workload requires.
If there is not enough memory to sustain the incoming stream of metrics, Prometheus will crash with an out-of-memory error or be killed by the OOM killer.
Adding swap as a safety net for when Prometheus runs out of memory does not really help, because using swap causes memory consumption to explode. I suspect this is related to Go, its garbage collector and the way it interacts with swap.
Another interesting design choice is that head block flushes are aligned to specific times rather than counted from the moment the process started.

As you can see from the graph, flushes to disk happen every two hours. If you change storage.tsdb.min-block-duration to one hour, these flushes happen every hour, at 30 minutes past the hour.
If you want this and other graphs for your own Prometheus installation, you can use this dashboard. It was designed for PMM, but with small changes it works with any Prometheus installation.
The active block, called the head block, is kept in memory, while blocks containing older data are accessed through mmap(). This removes the need to configure a separate cache. It also means you must leave enough room for the operating system cache if you want to query data older than the head block can hold.
For the same reason, Prometheus's virtual memory consumption will look very high. This is nothing to worry about.
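
The clock-aligned flush schedule can be modeled in a few lines of Python. The sketch below makes two assumptions:

- Block boundaries fall on multiples of the block duration, counted from the Unix epoch (UTC).
- A finished block is written out half a block duration after its end boundary.

These assumptions were chosen because they reproduce the behavior described above: flushes every two hours for 2-hour blocks, and at half past each hour for 1-hour blocks. This is a model, not code taken from Prometheus.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def flush_times(start: datetime, block: timedelta, count: int) -> list[datetime]:
    """Next `count` head-block flush times after `start` under the assumed model."""
    width = block.total_seconds()
    elapsed = (start - EPOCH).total_seconds()
    # Most recent aligned block boundary at or before `start`.
    boundary = EPOCH + timedelta(seconds=(elapsed // width) * width)
    # Assumption: the block ending at `boundary` is flushed half a block later.
    t = boundary + block / 2
    if t <= start:
        t += block
    return [t + i * block for i in range(count)]


start = datetime(2018, 9, 13, 13, 38, tzinfo=timezone.utc)
for block in (timedelta(hours=2), timedelta(hours=1)):
    print(block, [t.strftime("%H:%M") for t in flush_times(start, block, 3)])
```

Starting at 13:38 UTC, the model predicts:

- For 2-hour blocks: flushes at 15:00, 17:00 and 19:00.
- For 1-hour blocks: flushes at 14:30, 15:30 and 16:30.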

Another interesting design point is the use of a WAL (write-ahead log). As you can see from the storage documentation, Prometheus uses the WAL to avoid losing data in a crash. Unfortunately, the exact durability guarantees are not well documented. Prometheus 2.3.2 flushes the WAL to disk every 10 seconds, and this interval is not user-configurable.
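
Because the WAL is flushed every 10 seconds, the worst-case window of unflushed samples follows directly from the ingest rate. The sketch below assumes that everything written since the last flush is lost, for example on a power failure. Since the exact durability guarantees are not documented, treat this as an upper-bound illustration rather than a description of Prometheus's actual behavior.

```python
WAL_FLUSH_INTERVAL_S = 10.0  # fixed interval in Prometheus 2.3.2, per the text


def samples_at_risk(samples_per_sec: float,
                    flush_interval_s: float = WAL_FLUSH_INTERVAL_S) -> float:
    """Worst-case number of samples not yet flushed when a crash hits just before
    the next periodic WAL flush (assumption: all of them are lost)."""
    return samples_per_sec * flush_interval_s


print(f"{samples_at_risk(380_000):,.0f} samples")  # benchmark ingest rate
```

At the benchmark rate of 380K samples per second, up to 3.8 million samples fall inside that window.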

Compactions

The Prometheus TSDB is designed much like an LSM (Log-Structured Merge) store. The head block is periodically flushed to disk, while a compaction process merges several blocks together so that queries do not have to scan too many blocks. Here is the number of blocks I observed on the test system after a day of load.

If you want to learn more about the store, you can examine the meta.json file. It contains information about the available blocks and how they came to be.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}
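
The meta.json fields can be turned into a few useful figures. The Python sketch below works on a trimmed copy of the file above, with the sources list omitted. It does three things:

- Checks that the parent blocks tile the compacted block's time range without gaps.
- Derives the block span in hours.
- Derives the average ingest rate the block implies.

The helper name summarize is made up for this example.

```python
import json

# Trimmed copy of the meta.json above ("sources" omitted).
META = """
{
  "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
  "minTime": 1536472800000,
  "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {
    "level": 2,
    "parents": [
      {"ulid": "01CPYRY9MS465Y5ETM3SXFBV7X", "minTime": 1536472800000, "maxTime": 1536480000000},
      {"ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ", "minTime": 1536480000000, "maxTime": 1536487200000},
      {"ulid": "01CPZ6NR4Q3PDP3E57HEH760XS", "minTime": 1536487200000, "maxTime": 1536494400000}
    ]
  }
}
"""

HOUR_MS = 3_600_000


def summarize(meta: dict) -> dict:
    """Derive human-readable figures from a block's meta.json (times are in ms)."""
    span_ms = meta["maxTime"] - meta["minTime"]
    stats = meta["stats"]
    parents = sorted(meta["compaction"]["parents"], key=lambda p: p["minTime"])
    # The parents should cover the block's range exactly, with no gaps or overlaps.
    contiguous = (
        parents[0]["minTime"] == meta["minTime"]
        and parents[-1]["maxTime"] == meta["maxTime"]
        and all(a["maxTime"] == b["minTime"] for a, b in zip(parents, parents[1:]))
    )
    return {
        "span_hours": span_ms / HOUR_MS,
        "parent_count": len(parents),
        "contiguous_parents": contiguous,
        "samples_per_series": stats["numSamples"] / stats["numSeries"],
        "samples_per_sec": stats["numSamples"] / (span_ms / 1000),
    }


print(summarize(json.loads(META)))
```

For this block:

- The span is 6 hours.
- The three 2-hour parents are contiguous.
- The implied average ingest rate is about 384,000 samples per second, in line with the roughly 380K samples per second quoted for the benchmark.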

Compactions in Prometheus are triggered when the head block is flushed to disk, and several compactions may run at that point.
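
To make the merging step concrete, here is a simplified Python sketch of how aligned blocks could be grouped into compaction candidates. It assumes that each larger block range is three times the previous one (2 h, 6 h, 18 h, ...). This matches the meta.json example, where three 2-hour blocks became one 6-hour block. The sketch is an illustration only, not Prometheus's actual compaction planner, and the function names are hypothetical.

```python
HOUR_MS = 3_600_000
Block = tuple[int, int]  # (minTime, maxTime) in milliseconds


def block_ranges(min_block_ms: int, steps: int, factor: int = 3) -> list[int]:
    """Block sizes growing geometrically from the minimum block duration."""
    return [min_block_ms * factor**i for i in range(steps)]


def plan_merges(blocks: list[Block], window_ms: int) -> list[list[Block]]:
    """Group blocks by the aligned window they fall in and return the groups
    that fill their window exactly; those are the merge candidates."""
    groups: dict[int, list[Block]] = {}
    for mint, maxt in sorted(blocks):
        start = mint - mint % window_ms  # aligned window containing mint
        if maxt <= start + window_ms:  # ignore blocks that cross a window edge
            groups.setdefault(start, []).append((mint, maxt))
    return [
        members
        for _, members in sorted(groups.items())
        if len(members) > 1 and sum(hi - lo for lo, hi in members) == window_ms
    ]


# The three 2-hour parent blocks from meta.json, plus a newer block on its own.
blocks = [
    (1536472800000, 1536480000000),
    (1536480000000, 1536487200000),
    (1536487200000, 1536494400000),
    (1536494400000, 1536501600000),
]
ranges = block_ranges(2 * HOUR_MS, 3)
print([r // HOUR_MS for r in ranges])
print(plan_merges(blocks, ranges[1]))
```

Here the three older blocks fill one aligned 6-hour window and are proposed for merging. The newest block waits until its own window is full.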

Compactions do not appear to be throttled in any way, and they can cause large disk I/O spikes while they run.

They also cause spikes in CPU load.

Of course, this hurts system performance. It also highlights one of the hardest problems for LSM storage: how to run compactions so that high query rates are sustained without adding too much overhead.
Memory usage during compaction is also interesting.

After compaction, much of the memory moves from Cached to Free, which means potentially valuable data has been evicted from memory. I wonder whether fadvise() or some other technique for limiting cache eviction is used here. Alternatively, the cache may have been freed simply because the cached blocks were destroyed by the compaction.

Crash Recovery

Crash recovery takes time, though the time it takes is reasonable. With an incoming stream of about a million samples per second, recovery took about 25 minutes on an SSD drive.

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13OURCE)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."

The main problem with the recovery process is very high memory consumption. A server can run stably with a given amount of memory under normal conditions and still fail to recover from a crash because of OOM. The only solution I found was a three-step workaround:
1. Disable data collection.
2. Start the server and let it complete recovery.
3. Restart it with collection enabled.

Warmup

Another behavior to keep in mind is warmup: lower performance relative to resource usage immediately after startup. During some startups, but not all, I saw significantly higher CPU and memory load.

The gaps in memory usage show that Prometheus cannot perform all of the configured scrapes right from the start, so some data is lost.
I have not profiled exactly what causes the high CPU and memory load. I suspect it comes from new time series being created in the head block at a high rate.

CPU Load Spikes

Besides compactions, which generate heavy disk I/O, I also noticed large CPU load spikes roughly every two minutes. The spikes last longer at higher ingest rates. They appear to be caused by the Go garbage collector, since at least some cores become fully saturated.

These spikes are not merely cosmetic. When they occur, the internal Prometheus /metrics endpoint becomes unresponsive, which leaves gaps in the data for exactly those periods.

You can also see the Prometheus exporter hitting its one-second timeout.

We can see that this correlates with garbage collection (GC).

Conclusion

The TSDB in Prometheus 2 is fast. On fairly modest hardware it can handle millions of time series while ingesting hundreds of thousands of samples per second. Its CPU and disk I/O usage are also impressive: in my tests it reached up to 200K metrics per second per CPU core used.

For capacity planning, make sure you have plenty of memory available, and it must be real RAM. I observed about 5 GB of memory per 100K samples per second of ingest, plus additional memory for the operating system cache.
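
The observed ratio can be turned into a one-line sizing rule. The sketch below scales 5 GB per 100K samples per second linearly with the ingest rate. That scaling is an assumption, since the ratio was measured on a single setup. The operating system cache allowance is an explicit, hypothetical parameter, because it depends on how much older data you query.

```python
GB_PER_100K_SAMPLES_PER_SEC = 5.0  # ratio observed in these tests


def ram_estimate_gb(samples_per_sec: float, os_cache_gb: float = 0.0) -> float:
    """Rough RAM sizing: scale the observed ratio linearly with ingest rate,
    then add whatever OS-cache headroom you decide to reserve."""
    return samples_per_sec / 100_000 * GB_PER_100K_SAMPLES_PER_SEC + os_cache_gb


print(f"{ram_estimate_gb(380_000):.1f} GB before OS cache")  # benchmark ingest rate
```

For the benchmark's 380K samples per second, this gives about 19 GB before OS cache, which fits within the 32 GB test server.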

Of course, a lot of work remains to tame the CPU and disk I/O spikes. This is not surprising given how young the Prometheus 2 TSDB is. InnoDB, TokuDB, RocksDB and WiredTiger all had similar problems early in their life cycles.

Source: www.habr.com
