TSDB Analysis in Prometheus 2

TSDB Analysis in Prometheus 2

Lub sij hawm series database (TSDB) hauv Prometheus 2 yog ib qho piv txwv zoo heev ntawm kev daws teeb meem engineering uas muaj kev txhim kho loj dua v2 cia hauv Prometheus 1 hais txog cov ntaub ntawv nrawm nrawm, kev nug ua tiav, thiab kev siv peev txheej. Peb tab tom siv Prometheus 2 hauv Percona Kev Saib Xyuas thiab Kev Tswj Xyuas (PMM) thiab kuv muaj lub sijhawm los nkag siab txog kev ua tau zoo ntawm Prometheus 2 TSDB. Hauv tsab xov xwm no kuv yuav tham txog cov txiaj ntsig ntawm cov kev soj ntsuam no.

Nruab Nrab Prometheus Workload

Rau cov neeg siv los cuam tshuam nrog cov ntsiab lus dav dav, qhov kev ua haujlwm Prometheus yog qhov nthuav heev. Tus nqi ntawm cov ntaub ntawv khaws cia zoo li ruaj khov: feem ntau cov kev pabcuam uas koj saib xyuas xa tawm kwv yees li tib tus lej ntawm kev ntsuas, thiab cov txheej txheem hloov pauv qeeb qeeb.
Kev thov cov ntaub ntawv tuaj yeem los ntawm ntau qhov chaw. Ib txhia ntawm lawv, xws li kev ceeb toom, kuj siv zog rau qhov ruaj khov thiab kwv yees tus nqi. Lwm tus, xws li cov neeg siv thov, tuaj yeem ua rau tawg, txawm hais tias qhov no tsis yog rau feem ntau ntawm cov haujlwm.

Load test

Thaum lub sij hawm sim, kuv tsom ntsoov rau lub peev xwm los sau cov ntaub ntawv. Kuv xa Prometheus 2.3.2 tso ua ke nrog Go 1.10.1 (raws li ib feem ntawm PMM 1.14) ntawm Linode kev pabcuam siv tsab ntawv no: StackScript. Rau qhov tseeb tshaj plaws load tiam, siv qhov no StackScript Kuv tau tshaj tawm ntau lub MySQL nodes nrog lub load tiag tiag (Sysbench TPC-C Test), txhua tus uas emulated 10 Linux / MySQL nodes.
Tag nrho cov kev ntsuam xyuas hauv qab no tau ua tiav ntawm Linode server nrog yim virtual cores thiab 32 GB ntawm kev nco, khiav 20 load simulations xyuas ob puas MySQL piv txwv. Los yog, hauv Prometheus cov ntsiab lus, 800 lub hom phiaj, 440 scrapes ib ob, 380 txhiab cov ntaub ntawv ib ob, thiab 1,7 lab lub sijhawm ua haujlwm.

tsim

Txoj hauv kev ib txwm muaj ntawm cov ntaub ntawv ib txwm siv, suav nrog ib qho siv los ntawm Prometheus 1.x, yog rau nco txwv. Yog tias nws tsis txaus los tuav lub load, koj yuav muaj kev latencies siab thiab qee qhov kev thov yuav ua tsis tiav. Kev siv nco hauv Prometheus 2 yog configurable ntawm tus yuam sij storage.tsdb.min-block-duration, uas txiav txim siab ntev npaum li cas cov ntaub ntawv yuav khaws cia hauv lub cim xeeb ua ntej ntws mus rau disk (default yog 2 teev). Lub cim xeeb xav tau yuav nyob ntawm seb muaj pes tsawg lub sijhawm, cov ntawv sau, thiab cov khoom seem ntxiv rau cov kwj nkag. Hais txog qhov chaw disk, Prometheus lub hom phiaj siv 3 bytes ib cov ntaub ntawv (piv txwv). Ntawm qhov tod tes, kev xav nco tau ntau dua.

Txawm hais tias nws tuaj yeem teeb tsa qhov loj me, nws tsis pom zoo kom teeb tsa nws manually, yog li koj raug yuam kom muab Prometheus nco ntau npaum li nws xav tau rau koj cov haujlwm.
Yog tias tsis muaj lub cim xeeb txaus los txhawb cov kwj ntawm kev ntsuas, Prometheus yuav poob ntawm lub cim xeeb lossis OOM killer yuav tau txais rau nws.
Ntxiv kev sib pauv kom ncua kev sib tsoo thaum Prometheus khiav tawm ntawm lub cim xeeb tsis pab tiag tiag, vim tias siv cov haujlwm no ua rau lub cim xeeb tawg. Kuv xav tias nws yog ib yam dab tsi ua nrog Go, nws cov khoom khib nyiab thiab txoj kev nws cuam tshuam nrog kev sib pauv.
Lwm qhov kev nthuav qhia yog los teeb tsa lub taub hau thaiv kom ntws mus rau disk ntawm ib lub sijhawm, es tsis txhob suav txij thaum pib ntawm txoj haujlwm.

TSDB Analysis in Prometheus 2

Raws li koj tuaj yeem pom los ntawm daim duab, flushes rau disk tshwm sim txhua ob teev. Yog tias koj hloov qhov min-block-ntev parameter rau ib teev, ces cov kev rov pib dua no yuav tshwm sim txhua teev, pib tom qab ib nrab teev.
Yog tias koj xav siv qhov no thiab lwm cov duab hauv koj qhov kev teeb tsa Prometheus, koj tuaj yeem siv qhov no dashboard. Nws tau tsim los rau PMM tab sis, nrog kev hloov kho me me, haum rau txhua qhov kev teeb tsa Prometheus.
Peb muaj ib qho active block hu ua head block uas yog khaws cia rau hauv lub cim xeeb; blocks nrog cov ntaub ntawv qub muaj nyob ntawm mmap(). Qhov no tshem tawm qhov yuav tsum tau teeb tsa lub cache cais, tab sis kuj txhais tau hais tias koj yuav tsum tau tawm qhov chaw txaus rau lub operating system cache yog tias koj xav nug cov ntaub ntawv qub dua li qhov twg lub taub hau thaiv tuaj yeem haum.
Qhov no kuj txhais tau hais tias Prometheus virtual nco noj yuav zoo heev, uas tsis yog ib yam dab tsi yuav txhawj txog.

TSDB Analysis in Prometheus 2

Lwm qhov nthuav dav tsim yog siv WAL (sau ua ntej lub cav). Raws li koj tuaj yeem pom los ntawm cov ntaub ntawv khaws cia, Prometheus siv WAL kom tsis txhob muaj kev sib tsoo. Cov txheej txheem tshwj xeeb rau kev lees paub cov ntaub ntawv muaj sia nyob yog, hmoov tsis, tsis muaj ntaub ntawv zoo. Prometheus version 2.3.2 flushes WAL rau disk txhua 10 vib nas this thiab qhov kev xaiv no tsis tuaj yeem siv tau.

Compactions

Prometheus TSDB yog tsim los zoo li lub khw LSM (Log Structured Merge) lub khw: lub taub hau thaiv yog flushed ib ntus mus rau disk, thaum lub compaction mechanism muab ntau blocks ua ke kom tsis txhob scan ntau dhau lawm blocks thaum queries. Ntawm no koj tuaj yeem pom cov naj npawb ntawm cov blocks uas kuv tau pom ntawm qhov kev sim ntsuas tom qab ib hnub ntawm kev thauj khoom.

TSDB Analysis in Prometheus 2

Yog tias koj xav kawm ntxiv txog lub khw, koj tuaj yeem tshawb xyuas cov ntaub ntawv meta.json, uas muaj cov ntaub ntawv hais txog cov blocks muaj thiab lawv tuaj ua li cas.

{
       "ulid": "01CPZDPD1D9R019JS87TPV5MPE",
       "minTime": 1536472800000,
       "maxTime": 1536494400000,
       "stats": {
               "numSamples": 8292128378,
               "numSeries": 1673622,
               "numChunks": 69528220
       },
       "compaction": {
               "level": 2,
               "sources": [
                       "01CPYRY9MS465Y5ETM3SXFBV7X",
                       "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                       "01CPZ6NR4Q3PDP3E57HEH760XS"
               ],
               "parents": [
                       {
                               "ulid": "01CPYRY9MS465Y5ETM3SXFBV7X",
                               "minTime": 1536472800000,
                               "maxTime": 1536480000000
                       },
                       {
                               "ulid": "01CPYZT0WRJ1JB1P0DP80VY5KJ",
                               "minTime": 1536480000000,
                               "maxTime": 1536487200000
                       },
                       {
                               "ulid": "01CPZ6NR4Q3PDP3E57HEH760XS",
                               "minTime": 1536487200000,
                               "maxTime": 1536494400000
                       }
               ]
       },
       "version": 1
}

Compactions nyob rau hauv Prometheus yog khi rau lub sij hawm lub taub hau block yog flushed rau disk. Lub sijhawm no, ntau yam haujlwm zoo li no tuaj yeem ua tiav.

TSDB Analysis in Prometheus 2

Nws zoo nkaus li tias compactions tsis txwv nyob rau hauv txhua txoj kev thiab tuaj yeem ua rau loj disk I / O spikes thaum ua tiav.

TSDB Analysis in Prometheus 2

CPU load spikes

TSDB Analysis in Prometheus 2

Tau kawg, qhov no muaj qhov cuam tshuam tsis zoo rau kev ceev ntawm lub kaw lus, thiab tseem ua rau muaj kev sib tw loj rau LSM cia: yuav ua li cas kom compaction los txhawb tus nqi thov siab yam tsis ua rau ntau dhau?
Kev siv lub cim xeeb hauv cov txheej txheem compaction kuj zoo li nthuav heev.

TSDB Analysis in Prometheus 2

Peb tuaj yeem pom yuav ua li cas, tom qab compaction, feem ntau ntawm lub cim xeeb hloov lub xeev los ntawm Cached rau Dawb: qhov no txhais tau tias cov ntaub ntawv tseem ceeb tau raug tshem tawm ntawm qhov ntawd. Xav paub yog tias nws tau siv ntawm no fadvice() los yog ib co lwm yam minimization txheej txheem, los yog nws yog vim hais tias lub cache twb freed los ntawm blocks puas thaum lub sij hawm compaction?

Kev kho mob tom qab ua tsis tiav

Kev rov qab los ntawm kev ua tsis tiav yuav siv sijhawm, thiab yog vim li cas. Rau cov kwj tuaj ntawm ib lab cov ntaub ntawv ib ob, kuv yuav tsum tau tos txog 25 feeb thaum qhov kev rov qab tau ua los ntawm SSD tsav.

level=info ts=2018-09-13T13:38:14.09650965Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=v2.3.2, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-09-13T13:38:14.096599879Z caller=main.go:223 build_context="(go=go1.10.1, user=Jenkins, date=20180725-08:58:13OURCE)"
level=info ts=2018-09-13T13:38:14.096624109Z caller=main.go:224 host_details="(Linux 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 1bee9e9b78cf (none))"
level=info ts=2018-09-13T13:38:14.096641396Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-09-13T13:38:14.097715256Z caller=web.go:415 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-09-13T13:38:14.097400393Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-09-13T13:38:14.098718401Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536530400000 maxt=1536537600000 ulid=01CQ0FW3ME8Q5W2AN5F9CB7R0R
level=info ts=2018-09-13T13:38:14.100315658Z caller=web.go:467 component=web msg="router prefix" prefix=/prometheus
level=info ts=2018-09-13T13:38:14.101793727Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ78486TNX5QZTBF049PQHSM
level=info ts=2018-09-13T13:38:14.102267346Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536732000000 ulid=01CQ78DE7HSQK0C0F5AZ46YGF0
level=info ts=2018-09-13T13:38:14.102660295Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536782400000 ulid=01CQ7SAT4RM21Y0PT5GNSS146Q
level=info ts=2018-09-13T13:38:14.103075885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SV8WJ3C2W5S3RTAHC2GHB
level=error ts=2018-09-13T14:05:18.208469169Z caller=wal.go:275 component=tsdb msg="WAL corruption detected; truncating" err="unexpected CRC32 checksum d0465484, want 0" file=/opt/prometheus/data/.prom2-data/wal/007357 pos=15504363
level=info ts=2018-09-13T14:05:19.471459777Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-09-13T14:05:19.471604598Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499156711Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus.yml
level=info ts=2018-09-13T14:05:19.499228186Z caller=main.go:502 msg="Server is ready to receive web requests."

Qhov teeb meem tseem ceeb ntawm cov txheej txheem rov qab yog siab nco noj. Txawm tias muaj tseeb hais tias nyob rau hauv ib txwm muaj xwm txheej tus neeg rau zaub mov tuaj yeem ua haujlwm ruaj khov nrog tib lub cim xeeb, yog tias nws poob nws yuav tsis zoo vim OOM. Qhov kev daws teeb meem nkaus xwb uas kuv pom tau yog los lov tes taw cov ntaub ntawv sau, nqa cov neeg rau zaub mov, cia nws rov qab thiab rov pib nrog kev sau ua haujlwm.

Ua kom sov

Lwm tus cwj pwm kom nco ntsoov thaum sov so yog kev sib raug zoo ntawm kev ua haujlwm qis thiab kev siv nyiaj ntau tom qab pib. Thaum qee qhov, tab sis tsis yog txhua qhov pib, kuv tau pom qhov hnyav ntawm CPU thiab nco.

TSDB Analysis in Prometheus 2

TSDB Analysis in Prometheus 2

Qhov khoob ntawm kev siv lub cim xeeb qhia tias Prometheus tsis tuaj yeem teeb tsa txhua qhov kev sau los ntawm qhov pib, thiab qee cov ntaub ntawv ploj.
Kuv tsis tau xam qhov tseeb yog vim li cas rau lub siab CPU thiab nco load. Kuv xav tias qhov no yog vim yog kev tsim lub sijhawm tshiab nyob rau hauv lub taub hau thaiv nrog ntau zaus.

CPU loads nce

Ntxiv rau qhov compactions, uas tsim ib tug ncaj siab I / O load, kuv pom loj spikes nyob rau hauv CPU load txhua ob feeb. Qhov tawg tau ntev dua thaum cov khoom nkag siab siab thiab zoo li tshwm sim los ntawm Go tus neeg khaws khib nyiab, nrog tsawg kawg qee cov cores tau ntim tag nrho.

TSDB Analysis in Prometheus 2

TSDB Analysis in Prometheus 2

Cov jumps no tsis yog qhov tsis tseem ceeb. Nws zoo nkaus li tias thaum cov no tshwm sim, Prometheus qhov chaw nkag sab hauv thiab cov ntsuas ntsuas tsis muaj, ua rau cov ntaub ntawv sib txawv nyob rau tib lub sijhawm.

TSDB Analysis in Prometheus 2

Koj tuaj yeem pom tias Prometheus exporter kaw rau ib pliag.

TSDB Analysis in Prometheus 2

Peb tuaj yeem pom muaj kev sib raug zoo nrog kev khaws khib nyiab (GC).

TSDB Analysis in Prometheus 2

xaus

TSDB nyob rau hauv Prometheus 2 yog ceev, muaj peev xwm tuav tau ntau lab lub sij hawm series thiab tib lub sij hawm phav phav cov ntaub ntawv nyob rau hauv ib tug thib ob siv cov khoom siv me me. Kev siv CPU thiab disk I / O yog qhov tseem ceeb. Kuv qhov piv txwv tau pom txog 200 metrics ib ob rau ib tus tub ntxhais siv.

Txhawm rau npaj kev nthuav dav, koj yuav tsum nco ntsoov txog qhov txaus ntawm lub cim xeeb, thiab qhov no yuav tsum yog lub cim xeeb tiag tiag. Tus nqi ntawm lub cim xeeb siv uas kuv pom yog hais txog 5 GB ib 100 cov ntaub ntawv ib ob ntawm cov kwj nkag, uas ua ke nrog lub operating system cache muab txog 000 GB ntawm lub cim xeeb nyob.

Tau kawg, tseem muaj ntau txoj haujlwm yuav tsum tau ua kom tswj tau CPU thiab disk I / O spikes, thiab qhov no tsis yog qhov xav tsis thoob xav tias yuav ua li cas cov tub ntxhais hluas TSDB Prometheus 2 piv rau InnoDB, TokuDB, RocksDB, WiredTiger, tab sis lawv txhua tus muaj qhov zoo sib xws. teeb meem thaum ntxov ntawm lawv lub neej.

Tau qhov twg los: www.hab.com

Ntxiv ib saib