Analysis of TSDB in Prometheus 2

The time series database (TSDB) in Prometheus 2 is an excellent piece of engineering. It offers a major improvement over the v2 storage in Prometheus 1 in terms of ingestion speed, query performance, and resource efficiency. We have been adopting Prometheus 2 in Percona Monitoring and Management (PMM), and I had the chance to look into how the Prometheus 2 TSDB performs. In this article I describe the results of that analysis.

Typical Prometheus Workload

For those used to general-purpose databases, the Prometheus workload is quite interesting. The ingest rate tends to be stable: the services you monitor usually send roughly the same number of metrics all the time, and the infrastructure changes relatively slowly.
Queries for the data can come from various sources. Some of them, such as alerting, also tend to be stable and predictable. Others, such as users exploring data, can be bursty, although they rarely make up most of the load.

Benchmark

In my assessment I focused on ingestion capacity. I deployed Prometheus 2.3.2 built with Go 1.10.1 (as part of PMM 1.14) on Linode using a StackScript. To generate the most realistic load possible, I used another StackScript to launch several MySQL nodes running a real workload (Sysbench TPC-C Test), each of which emulated 10 Linux/MySQL nodes.
All of the observations below were made on a Linode server with eight cores and 32 GB of memory, running 20 load generators that simulate monitoring two hundred MySQL instances. Or, in Prometheus terms: 800 targets, 440 scrapes per second, 380 thousand samples ingested per second, and 1.7 million active time series.
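These totals can be cross-checked with a little arithmetic. The following Python sketch uses only the four figures quoted above and assumes the load is spread evenly, so the derived values (samples per scrape, average scrape interval per target, average gap between samples of one series) are rough averages, not measurements.

```python
# Back-of-the-envelope checks on the benchmark totals quoted in the text.
TARGETS = 800
SCRAPES_PER_SEC = 440
SAMPLES_PER_SEC = 380_000
ACTIVE_SERIES = 1_700_000


def benchmark_ratios(targets, scrapes_per_sec, samples_per_sec, active_series):
    """Return averages implied by the totals, assuming a uniform load."""
    return {
        # Samples returned by one scrape, on average.
        "samples_per_scrape": samples_per_sec / scrapes_per_sec,
        # Seconds between two scrapes of the same target, on average.
        "scrape_interval_s": targets / scrapes_per_sec,
        # Seconds between two samples of the same series, on average.
        "series_sample_interval_s": active_series / samples_per_sec,
    }


if __name__ == "__main__":
    ratios = benchmark_ratios(TARGETS, SCRAPES_PER_SEC, SAMPLES_PER_SEC, ACTIVE_SERIES)
    for name, value in ratios.items():
        print(f"{name}: {value:.2f}")
```

With the benchmark figures this gives roughly 864 samples per scrape, one scrape per target about every 1.8 s, and a new sample for each series about every 4.5 s. Individual targets may use different scrape intervals, so only the averages are meaningful.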

Design

The usual approach in traditional databases, including the one Prometheus 1.x used, is a memory limit. If the limit is not enough to handle the load, you get high latencies and some requests fail. Memory usage in Prometheus 2 is controlled through the storage.tsdb.min-block-duration key, which determines how long samples are kept in memory before being flushed to disk (the default is 2 hours). How much memory is needed depends on the number of time series, the number of labels, and the scrape frequency, in addition to the raw ingest rate. On disk, Prometheus aims to use about 3 bytes per sample. Memory requirements, on the other hand, are much higher.
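To put the 3-bytes-per-sample figure in perspective, here is a minimal disk-sizing sketch. It assumes a steady ingest rate and ignores the index, the WAL, and per-block overhead, so real usage will be somewhat higher; the 15-day retention in the example is only an illustrative input.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_SAMPLE = 3  # approximate on-disk cost per sample quoted in the text


def disk_bytes(samples_per_s, retention_days, bytes_per_sample=BYTES_PER_SAMPLE):
    """Estimated on-disk size of sample data for a steady ingest rate.

    Ignores the index, the WAL and block overhead, so real usage is higher.
    """
    return samples_per_s * SECONDS_PER_DAY * retention_days * bytes_per_sample


if __name__ == "__main__":
    print(disk_bytes(380_000, 1) / 1e9, "GB per day at the benchmark rate")
    print(disk_bytes(380_000, 15) / 1e12, "TB for a hypothetical 15-day retention")
```

At the benchmark's 380,000 samples per second this comes to about 98.5 GB per day, or roughly 1.48 TB for 15 days. Memory is not modelled here, since it depends on series, label, and scrape counts rather than on bytes per sample.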

Although the block size can be configured, tuning it by hand is discouraged, so you are left with giving Prometheus as much memory as your workload requires.
If there is not enough memory to handle the incoming metrics, Prometheus will crash with an out-of-memory error or the OOM killer will terminate it.
Adding swap to postpone the crash when Prometheus runs out of memory does not really help, because once swap is used, memory consumption explodes. I suspect this has to do with Go, its garbage collector, and the way it interacts with swap.
Another interesting design choice is aligning head block flushes to fixed points in time rather than counting from process start.

As you can see on the graph, flushes to disk happen every two hours, on the hour. If you change the min-block-duration parameter to one hour, these flushes will happen every hour, at half past the hour.
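The alignment can be sketched as follows. The sketch assumes (this rule is not stated in the text) that blocks cover wall-clock-aligned ranges [k·d, (k+1)·d) for block duration d, and that a block is persisted once the head spans 1.5·d, i.e. at its end time plus d/2. Under that assumption, 2-hour blocks are flushed every two hours on the hour and 1-hour blocks at half past each hour, which matches the behaviour described above.

```python
HOUR_MS = 3_600_000


def steady_state_flush_times(window_start_ms, window_end_ms, block_ms):
    """Head-block flush times (ms since epoch) falling inside [start, end).

    Assumption: the block [e - block_ms, e) is written at e + block_ms / 2,
    where every block boundary e is a multiple of block_ms.
    """
    flushes = []
    first_k = window_start_ms // block_ms - 1
    last_k = window_end_ms // block_ms
    for k in range(first_k, last_k + 1):
        block_end = k * block_ms
        flush_at = block_end + block_ms // 2
        if window_start_ms <= flush_at < window_end_ms:
            flushes.append(flush_at)
    return flushes


if __name__ == "__main__":
    for hours in (2, 1):
        times = steady_state_flush_times(0, 6 * HOUR_MS, hours * HOUR_MS)
        print(f"{hours}h blocks -> flush at hours", [t / HOUR_MS for t in times])
```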
If you want to see this and other graphs for your own Prometheus installation, you can use this dashboard. It was designed for PMM but, with minor adjustments, works with any Prometheus installation.
The active block, called the head block, is kept in memory; blocks with older data are accessed through mmap(). This removes the need to configure a cache separately, but it also means you must leave enough memory for the operating system's page cache if you want to query data older than what the head block holds.
It also means that the virtual memory reported for Prometheus will look very high, which is nothing to worry about.
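The mmap() access pattern can be illustrated with Python's mmap module standing in for the Go code Prometheus actually uses; the file name, contents, and offsets below are hypothetical. Reading a range of a mapped file goes through the kernel page cache, which is why older blocks need free RAM for the OS cache rather than an application-level cache, and why mapped files inflate the virtual memory figure.

```python
import mmap
import os
import tempfile


def read_range(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a read-only mmap.

    Nothing is copied into an application cache: the kernel faults the touched
    pages in via the page cache on first access.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file (mapping an empty file raises ValueError).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical stand-in for an on-disk block file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"older block data")
        name = tmp.name
    try:
        print(read_range(name, 6, 5))  # b'block'
    finally:
        os.remove(name)
```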

Another interesting design point is the WAL (write-ahead log). As you can see in the storage documentation, Prometheus uses the WAL to protect against data loss in a crash. Unfortunately, the exact durability guarantees are not clearly documented. Prometheus 2.3.2 flushes the WAL to disk every 10 seconds, and this interval is not configurable.
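Because the guarantees are undocumented, a pessimistic bound is useful. This sketch assumes the worst case, a crash that loses everything accepted since the last WAL flush, and a steady ingest rate; whether a given kind of crash really loses that data is exactly what the documentation does not spell out.

```python
WAL_FLUSH_INTERVAL_S = 10  # Prometheus 2.3.2 value from the text; not configurable there


def max_unflushed_samples(samples_per_s, flush_interval_s=WAL_FLUSH_INTERVAL_S):
    """Upper bound on samples accepted since the last WAL flush.

    Assumes a steady ingest rate and that a crash loses all unflushed samples.
    """
    return samples_per_s * flush_interval_s


if __name__ == "__main__":
    print(max_unflushed_samples(380_000))  # 3800000
```

At the benchmark's ingest rate that is up to 3.8 million samples at risk between two flushes.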

Compactions

Prometheus TSDB is designed much like an LSM (Log-Structured Merge) store: the head block is periodically flushed to disk, while compactions merge several blocks together so that queries do not have to scan too many blocks. Here is the number of blocks I observed on the test system after a day of load.
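The merge step of such a compaction can be shown with a toy model. Real Prometheus blocks store compressed chunks plus an index, not Python tuples; the (series, timestamp_ms, value) layout and the sort order by series and then time are simplifying assumptions for illustration.

```python
import heapq


def merge_blocks(*blocks):
    """Merge blocks of (series, timestamp_ms, value) samples into one sorted block.

    Each input must already be sorted by (series, timestamp_ms). Duplicate
    samples are kept as-is; a real compactor would also deduplicate them.
    """
    return list(heapq.merge(*blocks, key=lambda sample: (sample[0], sample[1])))


if __name__ == "__main__":
    # Three hypothetical 2-hour blocks merged into one 6-hour block.
    block_a = [("up", 0, 1.0), ("up", 3_600_000, 1.0)]
    block_b = [("up", 7_200_000, 1.0)]
    block_c = [("cpu", 14_400_000, 0.5), ("up", 14_400_000, 0.0)]
    for sample in merge_blocks(block_a, block_b, block_c):
        print(sample)
```

heapq.merge streams the inputs lazily, so the cost is dominated by reading and writing every sample once, which is why compactions show up as I/O spikes.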

If you want more details about the store, you can look at the meta.json file, which contains information about the available blocks and how they came about.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}
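A few useful facts can be derived from such a file. The sketch below parses a trimmed copy of the meta.json above (the "sources" list is dropped) and checks that the three parents tile the compacted block's time range; the summary fields are my own choice, not something Prometheus reports.

```python
import json

# Trimmed copy of the meta.json shown above.
META_JSON = """
{
  "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
  "minTime": 1536472800000,
  "maxTime": 1536494400000,
  "stats": {"numSamples": 8292128378, "numSeries": 1673622, "numChunks": 69528220},
  "compaction": {
    "level": 2,
    "parents": [
      {"ulid": "01CPYRY9MS465Y5ETM3SXFBV7X", "minTime": 1536472800000, "maxTime": 1536480000000},
      {"ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ", "minTime": 1536480000000, "maxTime": 1536487200000},
      {"ulid": "01CPZ6NR4Q3PDP3E57HEH760XS", "minTime": 1536487200000, "maxTime": 1536494400000}
    ]
  },
  "version": 1
}
"""

HOUR_MS = 3_600_000


def summarize(meta):
    """Derive human-readable facts from the contents of a block's meta.json."""
    stats = meta["stats"]
    parents = sorted(meta["compaction"]["parents"], key=lambda p: p["minTime"])
    # The parents of a compacted block should cover its time range without gaps.
    contiguous = (
        parents[0]["minTime"] == meta["minTime"]
        and parents[-1]["maxTime"] == meta["maxTime"]
        and all(a["maxTime"] == b["minTime"] for a, b in zip(parents, parents[1:]))
    )
    return {
        "span_hours": (meta["maxTime"] - meta["minTime"]) / HOUR_MS,
        "parent_span_hours": [(p["maxTime"] - p["minTime"]) / HOUR_MS for p in parents],
        "parents_contiguous": contiguous,
        "samples_per_series": stats["numSamples"] / stats["numSeries"],
        "samples_per_chunk": stats["numSamples"] / stats["numChunks"],
    }


if __name__ == "__main__":
    for key, value in summarize(json.loads(META_JSON)).items():
        print(key, value)
```

For this block it reports a 6-hour span built from three contiguous 2-hour parents, about 4,955 samples per series (roughly one sample every 4.4 s), and about 119 samples per chunk.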

Compactions in Prometheus are triggered when the head block is flushed to disk, and several of them may run at that point.

Compactions do not appear to be throttled in any way and can cause large disk I/O spikes while they run.

They also cause CPU load spikes.

Naturally, this hurts system performance, and it raises one of the central questions for any LSM store: how do you run compactions so that query performance stays high without incurring too much overhead?
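Prometheus 2.3.2 does not throttle compactions, but a common mitigation in other LSM engines is to cap compaction write bandwidth; RocksDB, for instance, ships a rate limiter for flushes and compactions. The sketch below is a hypothetical token-bucket limiter, not a Prometheus feature: a compactor would call acquire() before each write. The fake clock exists only to make the demo deterministic.

```python
import time


class TokenBucket:
    """Token-bucket limiter on bytes per second, allowing an initial burst.

    Tokens may go negative ("debt"); the caller then sleeps just long enough
    for the refill to pay the debt back, so the long-run rate stays at
    rate_bytes_per_s without a retry loop.
    """

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, nbytes):
        """Account for writing nbytes; sleep if over budget. Returns seconds slept."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        delay = -self.tokens / self.rate
        self.sleep(delay)
        return delay


class FakeClock:
    """Deterministic clock for the demo: sleeping just advances time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


def simulated_write_seconds(total_bytes, chunk_bytes, rate, burst):
    """Simulated time to write total_bytes in chunk_bytes pieces under the limiter."""
    clock = FakeClock()
    bucket = TokenBucket(rate, burst, clock=clock.now, sleep=clock.sleep)
    for _ in range(total_bytes // chunk_bytes):
        bucket.acquire(chunk_bytes)
    return clock.t


if __name__ == "__main__":
    MB = 1_000_000
    print(simulated_write_seconds(100 * MB, MB, rate=10 * MB, burst=10 * MB))
```

Writing 100 MB in 1 MB pieces at 10 MB/s with a 10 MB burst takes about 9 s of simulated time: the first 10 MB pass immediately and the remaining 90 MB are paced. A lower cap smooths the I/O spikes but makes each compaction last longer, leaving more blocks for queries to scan in the meantime.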
Memory usage during compaction also looks interesting.

We can see that after a compaction a lot of memory moves from Cached to Free, which means potentially valuable data has been evicted from the cache. I wonder whether fadvise() or some other technique for limiting this eviction is in use, or whether it happens simply because the cached blocks were destroyed by the compaction.
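For reference, this is what such a cache hint looks like from user space, using Python's os.posix_fadvise (the Python binding of the POSIX fadvise call) and a hypothetical file. POSIX_FADV_DONTNEED tells the kernel it may drop the file's clean cached pages, which would show up as memory moving from Cached to Free. Whether Prometheus issues hints like this is exactly the open question above; the sketch does not answer it.

```python
import os
import tempfile


def drop_cached_pages(path):
    """Hint that the kernel may evict this file's pages from the page cache.

    Returns False where os.posix_fadvise is unavailable (e.g. Windows, macOS).
    Dirty pages may not be dropped until they have been written back.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 cover the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    # Hypothetical file standing in for a block that was just compacted.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
        name = tmp.name
    print("hint issued:", drop_cached_pages(name))
    os.remove(name)
```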

Crash Recovery

Crash recovery takes time, though for understandable reasons. With an ingest rate of about a million samples per second, I had to wait around 25 minutes for recovery to complete on SSD storage.

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13OURCE)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."

The main problem with the recovery process is its very high memory usage. Even though under normal conditions the server runs stably with the same amount of memory, after a crash it may never recover because of OOM. The only solution I found was to disable scraping, start the server, let it complete recovery, and then restart it with scraping re-enabled.

Warmup

Another behavior to keep in mind is the need for warmup: right after startup, Prometheus delivers lower performance relative to the resources it consumes. On some, but not all, starts I observed significantly higher CPU and memory usage.

The gaps in the memory usage graph show that Prometheus cannot perform all of the configured scrapes at first, so some data is lost.
I have not profiled what exactly causes this extra CPU and memory consumption. I suspect it comes from new time series entries being created in the head block at a high rate.

CPU Usage Spikes

Besides compactions, which generate heavy disk I/O, I also observed significant CPU spikes about every two minutes. The bursts last longer at higher ingest rates and appear to be caused by the Go garbage collector; during them, some cores are completely saturated.
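To check a periodicity like "about every two minutes" against your own CPU data (for example, values exported from the dashboard), one simple approach is to detect upward threshold crossings and take the median gap between them. The threshold and the synthetic series below are made up for illustration.

```python
import statistics


def spike_starts(samples, threshold):
    """Timestamps at which the value rises above threshold.

    samples is a time-ordered list of (timestamp_s, cpu_utilization) pairs.
    """
    starts = []
    above = False
    for ts, value in samples:
        now_above = value > threshold
        if now_above and not above:
            starts.append(ts)
        above = now_above
    return starts


def typical_period(starts):
    """Median gap between consecutive spike starts, or None with fewer than two."""
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return statistics.median(gaps) if gaps else None


if __name__ == "__main__":
    # Synthetic data: one sample every 10 s, a 20 s burst every 120 s.
    cpu = [(t, 0.9 if t % 120 < 20 else 0.3) for t in range(0, 600, 10)]
    starts = spike_starts(cpu, threshold=0.5)
    print(starts, typical_period(starts))
```

The median is used instead of the mean so that a single missed or extra spike does not skew the estimate much.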

These spikes are not merely cosmetic. It appears that while they occur, Prometheus's internal /metrics endpoint becomes unresponsive, which produces data gaps during exactly those intervals.

You can also see the Prometheus exporter hitting a one-second timeout.

We can observe that this correlates with garbage collection (GC).
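A correlation seen on graphs can be quantified with a Pearson coefficient. The sketch assumes two series already aligned on the same timestamps, for example GC pause time per interval and scrape duration per interval; which metrics to use is left open, and the numbers in the demo are invented. A high coefficient supports, but does not prove, a causal link.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences.

    Raises ValueError on mismatched lengths and ZeroDivisionError on constant input.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented per-interval values, aligned on the same timestamps.
    gc_pause_s = [0.01, 0.02, 0.50, 0.01, 0.45, 0.02]
    scrape_duration_s = [0.05, 0.06, 1.00, 0.05, 1.00, 0.07]
    print(round(pearson(gc_pause_s, scrape_duration_s), 3))
```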

Conclusion

The Prometheus 2 TSDB is fast: it can handle millions of time series and, at the same time, hundreds of thousands of samples per second on fairly modest hardware. Its CPU and disk I/O efficiency are also impressive. My test showed up to 200,000 metrics per second per CPU core used.

For capacity planning you need to provide plenty of memory, and it has to be real RAM. The memory usage I observed was about 5 GB per 100,000 samples per second of ingest, which together with the operating system cache came to about 8 GB.
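The two ratios from this benchmark can be turned into a rough sizing helper. This is a sketch under strong assumptions: that memory scales linearly with ingest rate (in practice it also depends on series and label counts), that the ~200,000 samples per second per core seen here carries over to other hardware, and that the OS-cache allowance is a number you choose for your query pattern.

```python
import math

# Ratios observed in this single benchmark; treat them as rough guides.
GB_PER_100K_SAMPLES_PER_S = 5.0
SAMPLES_PER_S_PER_CORE = 200_000


def estimate_resources(samples_per_s, os_cache_gb):
    """Rough RAM (GB) and CPU core estimate for a given ingest rate.

    os_cache_gb is a user-chosen page-cache allowance for querying blocks
    older than the head block; it is an input, not derived here.
    """
    head_ram_gb = samples_per_s / 100_000 * GB_PER_100K_SAMPLES_PER_S
    cores = math.ceil(samples_per_s / SAMPLES_PER_S_PER_CORE)
    return {"ram_gb": head_ram_gb + os_cache_gb, "cores": cores}


if __name__ == "__main__":
    # Benchmark ingest rate with a hypothetical 3 GB OS-cache allowance.
    print(estimate_resources(380_000, os_cache_gb=3.0))
```

For the benchmark rate and a 3 GB cache allowance this suggests about 22 GB of RAM and 2 busy cores, comfortably inside the 32 GB, eight-core test server.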

Of course, work remains to be done to eliminate the CPU and disk I/O spikes. That is not surprising given how young the Prometheus 2 TSDB is: InnoDB, TokuDB, RocksDB, and WiredTiger all had similar problems early in their lives.

Source: www.habr.com
