The time series database (TSDB) in Prometheus 2 is a great example of an engineering solution that offers major improvements over the v2 storage in Prometheus 1 in terms of data ingest speed, query execution, and resource efficiency. We were rolling out Prometheus 2 in Percona Monitoring and Management (PMM), and I had the opportunity to look into the performance of the Prometheus 2 TSDB. In this article I describe the results of those observations.
Typical Prometheus workload
For those used to working with general-purpose databases, the typical Prometheus workload is quite interesting. The ingest rate tends to be stable: the services you monitor usually send roughly the same number of metrics, and infrastructure changes relatively slowly.
Queries for data can come from different sources. Some of them, such as alerting, also tend to be stable and predictable. Others, such as user queries, can cause bursts, although this is not the case for most of the workload.
Load test
During testing I focused on the ability to ingest data. I deployed Prometheus 2.3.2 compiled with Go 1.10.1 (as part of PMM 1.14) on Linode using a deployment script.
All of these tests were run on a Linode server with eight cores and 32 GB of memory, running 20 load simulations that monitored 200 MySQL instances. Or, in Prometheus terms: 800 targets, 440 scrapes per second, 380 thousand samples per second, and 1.7 million active time series.
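As a quick sanity check, the workload figures above can be related to each other with simple arithmetic. The sketch below uses only the numbers quoted in this section and assumes that the scrape and sample rates are both per second; the variable names are illustrative.

```python
# Workload figures quoted above (both rates assumed to be per second).
targets = 800
scrapes_per_sec = 440
samples_per_sec = 380_000
active_series = 1_700_000

# Average number of samples returned by a single scrape.
samples_per_scrape = samples_per_sec / scrapes_per_sec
# Average number of active time series exposed by one target.
series_per_target = active_series / targets
# Average time between two scrapes of the same target.
avg_scrape_interval_s = targets / scrapes_per_sec

print(f"samples per scrape: {samples_per_scrape:.0f}")
print(f"series per target: {series_per_target:.0f}")
print(f"average scrape interval: {avg_scrape_interval_s:.2f} s")
```

These are averages only; individual targets may be scraped at different intervals and expose very different numbers of series.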
Design
The usual approach of traditional databases, including the one used by Prometheus 1.x, is to limit the amount of memory. In Prometheus 2, memory usage is instead governed by storage.tsdb.min-block-duration, which determines how long samples are kept in memory before being flushed to disk (the default is 2 hours). The amount of memory required depends on the number of time series, the number of labels, and the scrape frequency, in addition to the raw ingest rate. In terms of disk space, Prometheus aims to use about 3 bytes per sample. Memory requirements, on the other hand, are considerably higher.
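To make the disk figure concrete, here is a minimal estimate of on-disk growth. It assumes the roughly 3 bytes per sample quoted above and the ingest rate of the test described earlier; the function and constant names are made up for illustration, and index or WAL overhead is ignored.

```python
BYTES_PER_SAMPLE = 3        # approximate on-disk figure quoted in the text
SAMPLES_PER_SEC = 380_000   # ingest rate of the test described earlier

def disk_bytes(samples_per_sec, seconds, bytes_per_sample=BYTES_PER_SAMPLE):
    """Estimate bytes written for a constant ingest rate over a time window."""
    return samples_per_sec * seconds * bytes_per_sample

per_day = disk_bytes(SAMPLES_PER_SEC, 24 * 3600)
per_30_days = disk_bytes(SAMPLES_PER_SEC, 30 * 24 * 3600)

print(f"{per_day / 1e9:.1f} GB per day")
print(f"{per_30_days / 1e12:.2f} TB per 30 days")
```

The same function can be reused with your own ingest rate and retention period to get a first-order disk budget.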
Although the head block size can be configured, tuning it manually is not recommended, so you are left with giving Prometheus as much memory as it needs for your workload.
If there is not enough memory to sustain the incoming stream of metrics, Prometheus will crash with an out-of-memory error or be killed by the OOM killer.
Adding swap as a safety net for when Prometheus runs out of memory does not really help, because using swap causes memory usage to explode. I suspect this has to do with Go, its garbage collector, and how it interacts with swapping.
Another interesting design choice is to align head block flushes to specific wall-clock times, rather than counting from the moment the process started.
As you can see from the graph, flushes to disk happen every two hours. If you change min-block-duration to one hour, these flushes will happen every hour, starting at half past the hour.
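The timing described above can be sketched as follows. This is a rough model, not Prometheus code: it assumes blocks are aligned to multiples of the block duration since the Unix epoch, and that a block is written out once the head spans one and a half block durations, which is consistent with the half-past-the-hour timing mentioned for one-hour blocks. The function name next_flush is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def next_flush(now, block):
    """Return the next flush time after `now` under the assumptions above."""
    half = block / 2
    # Block k covers [k*block, (k+1)*block) since the epoch and is assumed
    # to be written out at (k+1)*block + block/2.
    k_plus_1 = (now - EPOCH - half) // block + 1
    return EPOCH + k_plus_1 * block + half

now = datetime(2018, 9, 13, 13, 38, 14, tzinfo=timezone.utc)
flush_1h = next_flush(now, timedelta(hours=1))
flush_2h = next_flush(now, timedelta(hours=2))
print(flush_1h.isoformat())
print(flush_2h.isoformat())
```

Because the boundaries are derived from the epoch rather than from the start time, restarting the process does not shift the flush schedule in this model.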
If you want to see this graph and others for your own Prometheus installation, a dashboard is available for that.
The active block, called the head block, is kept in memory; blocks with older data are accessed through mmap(). This removes the need to configure a separate cache, but it also means you have to leave enough room for the operating system cache if you want to query data older than what the head block can hold.
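The following minimal sketch illustrates what memory-mapped access means in practice. It is not how Prometheus reads its blocks; it simply maps a small temporary file (with made-up content standing in for a block) and reads a slice, so that the operating system page cache, rather than an application-level cache, serves the data.

```python
import mmap
import os
import tempfile

# Write a small stand-in "block" file (hypothetical content, not a real TSDB chunk).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"sample-data-" * 1000)
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only; nothing is copied into an application cache.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        mapped_size = len(m)
        # Slicing touches only the pages backing this range; the kernel serves
        # them from its page cache or reads them from disk on demand.
        first = m[:11]

os.remove(path)
print(mapped_size, first)
```

The mapped size counts toward the virtual memory of the process even though only the touched pages occupy physical memory.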
This also means that the virtual memory Prometheus consumes will look very high, which is nothing to worry about.
Another interesting design choice is the use of a WAL (write-ahead log). As the storage documentation shows, Prometheus uses the WAL to avoid losing data in a crash. The exact durability guarantees are, unfortunately, not well documented. Prometheus 2.3.2 flushes the WAL to disk every ten seconds, and this setting is not user-configurable.
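To illustrate the general idea of a write-ahead log with checksums, here is a toy sketch. The record layout (length, CRC32, payload) and the function names are invented for this example and do not reflect the actual Prometheus WAL format; the point is only that replay stops at the first record whose checksum does not match and the log is truncated there, similar in spirit to the "WAL corruption detected; truncating" message in the startup log later in this article.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (toy format)

def append(log, payload):
    """Append one checksummed record to the in-memory log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log):
    """Return (records, valid_length); stop at the first damaged record."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupted record: drop it and everything after it
        records.append(bytes(payload))
        pos = start + length
    return records, pos

wal = bytearray()
for sample in (b"cpu 0.5", b"mem 42", b"disk 7"):
    append(wal, sample)

wal[-1] ^= 0xFF                     # simulate a damaged final record
records, valid = replay(bytes(wal))
wal = wal[:valid]                   # truncate the log to the last good record
print(records, valid)
```

Anything written after the last successful flush can be lost in this scheme, which is why the flush interval matters for durability.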
Compactions
Prometheus TSDB is designed like an LSM (log-structured merge) store: the head block is flushed to disk periodically, while a compaction mechanism merges multiple blocks together so that queries do not have to scan too many blocks. Here you can see the number of blocks I observed on the test system after a day of load.
If you want to learn more about the storage, you can inspect the meta.json file, which contains information about the existing blocks and how they came about.
{
"ulid": "01CPZDPD1D9R019JS87TPV5MPE",
"minTime": 1536472800000,
"maxTime": 1536494400000,
"stats": {
"numSamples": 8292128378,
"numSeries": 1673622,
"numChunks": 69528220
},
"compaction": {
"level": 2,
"sources": [
"01CPYRY9MS465Y5ETM3SXFBV7X",
"01CPYZT0WRJ1JB1P0DP80VY5KJ",
"01CPZ6NR4Q3PDP3E57HEH760XS"
],
"parents": [
{
"ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
"minTime": 1536472800000,
"maxTime": 1536480000000
},
{
"ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
"minTime": 1536480000000,
"maxTime": 1536487200000
},
{
"ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
"minTime": 1536487200000,
"maxTime": 1536494400000
}
]
},
"version": 1
}
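The fields in meta.json are easier to interpret once the millisecond timestamps are decoded. The sketch below copies the numeric fields from the file shown above (ULIDs omitted) and derives the time span of the block, the span of each parent, and a couple of averages; the helper name utc is illustrative.

```python
import json
from datetime import datetime, timezone

# Numeric fields copied from the meta.json shown above (ULIDs omitted).
meta = json.loads("""
{
  "minTime": 1536472800000,
  "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {"level": 2, "parents": [
    {"minTime": 1536472800000, "maxTime": 1536480000000},
    {"minTime": 1536480000000, "maxTime": 1536487200000},
    {"minTime": 1536487200000, "maxTime": 1536494400000}
  ]}
}
""")

def utc(ms):
    """Convert a Unix timestamp in milliseconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()

span_s = (meta["maxTime"] - meta["minTime"]) / 1000
stats = meta["stats"]
parent_hours = [(p["maxTime"] - p["minTime"]) / 3_600_000
                for p in meta["compaction"]["parents"]]
samples_per_sec = stats["numSamples"] / span_s
samples_per_chunk = stats["numSamples"] / stats["numChunks"]

print("block covers", utc(meta["minTime"]), "to", utc(meta["maxTime"]))
print("span (hours):", span_s / 3600)
print("parent spans (hours):", parent_hours)
print("samples per second:", round(samples_per_sec))
print("samples per chunk:", round(samples_per_chunk, 1))
```

For this file, the block covers six hours and was compacted from three consecutive two-hour parents, and the average works out to roughly 384 thousand samples per second, in line with the ingest rate reported for the test.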
Compactions in Prometheus are tied to the moment the head block is flushed to disk. Several such operations may run at that point.
Compactions do not appear to be throttled in any way and can cause large disk I/O spikes while they run, along with spikes in CPU load.
This, of course, has a negative effect on system performance, and it also raises the hard question for LSM storage: how do you run compactions so that high query rates are sustained without causing too much overhead?
Memory usage during compaction is also interesting.
We can see that after compaction a large share of memory changes state from Cached to Free, which means potentially valuable data has been evicted. I wonder whether fadvise() or some other technique for minimizing cache washout is used here, or whether this happens because the cached blocks were destroyed during compaction.
Crash recovery
Recovery after a crash takes time, and for good reason. With an ingest rate of about a million samples per second, I had to wait about 25 minutes for recovery to complete, even on an SSD.
level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."
The main problem with the recovery process is its high memory consumption. Although the server can run stably with a given amount of memory under normal conditions, after a crash it may never recover because of OOM. The only solution I found was to disable scraping, bring the server up, let it recover, and then restart it with scraping enabled.
Warm-up
Another behavior to keep in mind is warm-up: a combination of lower performance and higher resource consumption right after start. During some, but not all, starts I saw heavy CPU and memory load.
Gaps in the memory usage graph show that Prometheus cannot perform all of its scrapes right after start, so some data is lost.
I have not worked out exactly what causes the high CPU and memory load. I suspect it is due to new time series being created in the head block at a high rate.
CPU load spikes
In addition to compactions, which produce fairly high I/O load, I noticed significant CPU load spikes every two minutes. The spikes last longer when the ingest rate is higher and appear to be caused by the Go garbage collector; at least some cores are fully saturated.
These spikes are not cosmetic. It appears that when they occur, Prometheus's internal endpoint and its metrics become unavailable, which causes data gaps over the same intervals.
You can also see the Prometheus exporter hitting a one-second timeout.
We can see the correlation with garbage collection (GC).
Conclusion
TSDB in Prometheus 2 is fast: it can handle millions of time series and, at the same time, hundreds of thousands of samples per second on fairly modest hardware. CPU and disk I/O usage are also impressive. My example showed up to 200,000 metrics per second per CPU core used.
For capacity planning, you need to make sure enough memory is available, and it has to be real memory. The amount I observed was about 5 GB per 100,000 samples per second of incoming stream, with room for the operating system cache needed on top of that.
Of course, there is still a lot of work to do to tame the CPU and disk I/O spikes, which is not surprising given how young the Prometheus 2 TSDB is compared with InnoDB, TokuDB, RocksDB, and WiredTiger, all of which had similar problems early in their life cycles.
Source: www.habr.com