Tricks for processing metrics in Kapacitor

These days hardly anyone asks why service metrics need to be collected. The next logical step is to set up alerting on the collected metrics so that any deviations in the data are reported to whichever channels suit you (email, Slack, Telegram). At the online hotel booking service Ostrovok.ru, all metrics from our services flow into InfluxDB and are displayed in Grafana, where basic alerting is also configured. For tasks like "compute something and compare against it," we use Kapacitor.

Kapacitor is the part of the TICK stack that can process metrics from InfluxDB. It can join several measurements together (join), compute something useful from the resulting data, write the result back to InfluxDB, and send an alert to Slack/Telegram/email.

The whole stack has good, detailed documentation, but there are always useful things that the manuals do not state explicitly. In this article I decided to collect a number of such useful, non-obvious tips (the basic TICKscript syntax is described in the Kapacitor documentation) and show how they can be applied, using the solution to one of our problems as an example.

Let's go!

float & int, calculation errors

A thoroughly standard problem, solved with casts:

var alert_float = 5.0
var alert_int = 10
data|eval(lambda: float("value") > alert_float OR float("value") < float(alert_int))

Using default()

If a tag/field is not populated, calculation errors will occur:

|default()
        .tag('status', 'empty')
        .field('value', 0)

fill in join (inner vs outer)

By default, join drops points where there is no data (inner).
With fill('null'), an outer join is performed, after which you need to call default() and fill in the empty values:

var data = res1
    |join(res2)
        .as('res1', 'res2')
        .fill('null')
    |default()
        .field('res1.value', 0.0)
        .field('res2.value', 100.0)

There is still a nuance here. In the example above, if one of the series (res1 or res2) is empty, the resulting series (data) will be empty as well. There are several tickets on this topic on GitHub (1633, 1871, 6967); we are waiting for fixes and suffering a little.
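The difference between the inner join and the outer join with defaults can be sketched outside Kapacitor. The Python below only illustrates the semantics described above and is not Kapacitor's implementation; the function name join_series and the sample values are hypothetical.

```python
def join_series(res1, res2, fill_null=False, defaults=None):
    """Join two {timestamp: value} series on timestamp.

    fill_null=False mimics the default inner join: only timestamps present
    in both series survive. fill_null=True mimics .fill('null') followed by
    |default(): a missing side is replaced with the given default value.
    """
    defaults = defaults or {}
    if fill_null:
        timestamps = sorted(set(res1) | set(res2))
    else:
        timestamps = sorted(set(res1) & set(res2))
    joined = []
    for ts in timestamps:
        joined.append({
            "time": ts,
            "res1.value": res1.get(ts, defaults.get("res1.value")),
            "res2.value": res2.get(ts, defaults.get("res2.value")),
        })
    return joined


# Hypothetical sample data: the series overlap only at timestamp 2.
res1 = {1: 10.0, 2: 20.0}
res2 = {2: 90.0, 3: 80.0}

inner = join_series(res1, res2)
outer = join_series(res1, res2, fill_null=True,
                    defaults={"res1.value": 0.0, "res2.value": 100.0})
print(inner)
print(outer)
```

Note that this sketch does not reproduce the nuance above: in Kapacitor, an entirely empty series still yields an empty result.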

Using conditions in calculations (if in lambda)

|eval(lambda: if("value" > 0, TRUE, FALSE))

The last five minutes from the pipeline for a period

For example, you need to compare the values of the last five minutes with the previous week. You can fetch two batches of data with two separate batch queries, or extract part of the data from a larger period:

 |where(lambda: duration((unixNano(now()) - unixNano("time"))/1000, 1u) < 5m)
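The arithmetic in that lambda is easy to misread, so here is the same check spelled out as a Python sketch: the nanosecond difference is divided by 1000 to get microseconds, turned into a duration, and compared with the window. The helper names unix_nano and is_recent are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_nano(dt):
    """Nanoseconds since the Unix epoch (Python only keeps microseconds)."""
    return ((dt - EPOCH) // timedelta(microseconds=1)) * 1000


def is_recent(point_time, now, window=timedelta(minutes=5)):
    """Mirror of: duration((unixNano(now()) - unixNano("time")) / 1000, 1u) < 5m"""
    diff_ns = unix_nano(now) - unix_nano(point_time)
    age = timedelta(microseconds=diff_ns // 1000)  # ns / 1000 -> microseconds (1u)
    return age < window


# Hypothetical points that are 1, 4, 5 and 30 minutes old.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
points = [now - timedelta(minutes=m) for m in (1, 4, 5, 30)]
recent = [p for p in points if is_recent(p, now)]
print(len(recent))
```

The comparison is strict, so a point that is exactly five minutes old is excluded.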

An alternative for the last five minutes is to use a BarrierNode, which cuts off data earlier than the specified time:

|barrier()
        .period(5m)

Examples of using Go templates in the message

The templates follow the format of the text/template package. Below are a few frequently encountered tasks.

if-else

We tidy things up and avoid triggering people with text unnecessarily:

|alert()
    ...
    .message(
        '{{ if eq .Level "OK" }}It is ok now{{ else }}Chief, everything is broken{{end}}'
    )

Two decimal places in the message

Improving the readability of the message:

|alert()
    ...
    .message(
        'now value is {{ index .Fields "value" | printf "%0.2f" }}'
    )

Expanding variables in the message

We include more information in the message to answer the question "Why is it yelling?"

var warnAlert = 10
  |alert()
    ...
    .message(
       'Today value less than '+string(warnAlert)+'%'
    )

Unique alert identifier

This is essential when the data contains more than one group; otherwise only a single alert will be generated:

|alert()
      ...
      .id('{{ index .Tags "myname" }}/{{ index .Tags "myfield" }}')

Custom handlers

The long list of handlers includes exec, which lets you run your own script with the passed parameters (stdin). Pure creativity!

One of our custom handlers is a small Python script for sending notifications to Slack.
At first, we wanted to send an image from a Grafana instance protected by authorization in the message. Next, to write OK in the thread of the previous alert from the same group rather than as a separate message. A little later, to add the most frequent error over the last X minutes to the message.

A separate topic is communication with other services and any actions initiated by an alert (only if your monitoring works well enough).
An example of a handler description, where slack_handler.py is our self-written script:

topic: slack_graph
id: slack_graph.alert
match: level() != INFO AND changed() == TRUE
kind: exec
options:
  prog: /sbin/slack_handler.py
  args: ["-c", "CHANNELID", "--graph", "--search"]

How to debug?

Option with log output

|log()
      .level('ERROR')
      .prefix('something')

View (cli): kapacitor -url host-or-ip:9092 logs lvl=error

Option with httpOut

Shows the data in the current pipeline:

|httpOut('something')

View (GET): host-or-ip:9092/kapacitor/v1/tasks/task_name/something

Execution graph

  • Each task returns an execution tree with useful numbers in graphviz format.
  • Take the dot block.
  • Paste it into a viewer and enjoy.

Where else can you step on a rake?

timestamp in influxdb on write-back

For example, we set up an alert on the sum of requests per hour (groupBy(1h)) and want to record the alert that occurred in influxdb (to nicely show the fact of the problem on the graph in grafana).

influxDBOut() writes the time value from the alert into the timestamp; accordingly, the point on the graph will be recorded earlier/later than the alert arrived.

When accuracy is required, we work around this problem by calling a custom handler that writes the data to influxdb with the current timestamp.
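As a rough sketch of the core of such a handler, the Python below builds an InfluxDB line protocol record stamped with the current time instead of the alert's time. It is an illustration under assumptions, not our handler: the helper names are hypothetical, the measurement name is assumed to need no escaping, and actually sending the line (for example to the InfluxDB 1.x /write endpoint) is left out.

```python
import time


def escape_key(value):
    """Escape commas, equals signs and spaces, as line protocol requires for tags."""
    text = str(value)
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return "{}i".format(value)
    if isinstance(value, float):
        return repr(value)
    return '"{}"'.format(str(value).replace("\\", "\\\\").replace('"', '\\"'))


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line; without an explicit timestamp the current time is used."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    tag_part = "".join(
        ",{}={}".format(escape_key(k), escape_key(v)) for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        "{}={}".format(escape_key(k), format_field(v)) for k, v in sorted(fields.items())
    )
    return "{}{} {} {}".format(measurement, tag_part, field_part, timestamp_ns)


line = to_line_protocol("alerts", {"alertName": "requests.rate"}, {"value": 42.5})
print(line)
```

Because the timestamp is taken at the moment the handler runs, the point lands on the graph where the alert actually arrived.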

docker, build and deploy

On startup, kapacitor can load tasks, templates and handlers from the directory specified in the config in the [load] block.

To create a task correctly, you need the following:

  1. File name: expanded into the script's id/name
  2. Type: stream/batch
  3. dbrp: a keyword indicating which database + retention policy the script works in (dbrp "supplier"."autogen")

If some batch task lacks the dbrp line, the whole service will refuse to start and will honestly say so in the log.

In chronograf, on the contrary, this line must not be present; it is not accepted through the interface and produces an error.

A hack for building the container: the Dockerfile exits with -1 if there are lines matching //.+dbrp, which lets you immediately understand the reason for the failure when the build is assembled.
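The same check can be sketched as a small script run during the image build. This is an illustration of the idea rather than our actual Dockerfile step; the function name find_commented_dbrp and the sample scripts are hypothetical, and only the //.+dbrp pattern comes from the text above.

```python
import re

# The pattern from the hack above: a comment marker, at least one character, then dbrp.
COMMENTED_DBRP = re.compile(r"//.+dbrp")


def find_commented_dbrp(script_text):
    """Return the 1-based numbers of the lines that match //.+dbrp."""
    return [
        number
        for number, line in enumerate(script_text.splitlines(), start=1)
        if COMMENTED_DBRP.search(line)
    ]


# Hypothetical task scripts: one with an active dbrp line, one with it commented out.
good_script = 'dbrp "supplier"."autogen"\nvar every = 1m\n'
bad_script = '// dbrp "supplier"."autogen"\nvar every = 1m\n'

print(find_commented_dbrp(good_script))
print(find_commented_dbrp(bad_script))
```

In a build step, a non-empty result for any task file would be turned into a non-zero exit code so that the build stops with a clear reason.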

One-to-many join

Example task: you need to take the 95th percentile of the service's response time over a week and compare each minute of the last 10 with this value.

You cannot do a one-to-many join: last/mean/median over a group of points turn the node into a stream, and the error "cannot add child mismatched edges: batch -> stream" is returned.

The result of a batch is not substituted as a variable in a lambda expression either.

One option is to save the needed numbers from the first batch to a file via a udf and load this file via sideload.

What did we solve with this?

We have about 100 hotel suppliers, each of which can have several connections; let's call this a channel. There are roughly 300 of these channels, and each of them can go down. Of all the recorded metrics, we will monitor the error rate (requests and errors).

Why not grafana?

Error alerts configured in Grafana have several drawbacks. Some are critical, while others can be overlooked, depending on the situation.

Grafana cannot combine calculations between measurements with alerting, but we need a rate: (requests - errors) / requests.
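For concreteness, here is that rate as a tiny Python sketch, expressed as a percentage. The function name success_rate is hypothetical, and treating a period without requests as 100% is an assumption made only for this example.

```python
def success_rate(requests, errors):
    """Percentage of requests that did not fail: 100 * (requests - errors) / requests."""
    if requests <= 0:
        # Assumption for this sketch: no traffic is treated as "nothing failed".
        return 100.0
    return 100.0 * (requests - errors) / requests


print(success_rate(200, 10))
print(success_rate(0, 0))
```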

The errors look alarming on their own:

[Figure: graph of errors]

And less alarming when viewed alongside successful requests:

[Figure: graph of errors together with successful requests]

OK, we can precompute the rate in the service before it reaches grafana, and in some cases this will work. But not in ours, because each channel has its own ratio that is considered "normal," while alerts work on static values (we look for them by eye and change them if alerts fire too often).

These are examples of "normal" for different channels:

[Figures: graphs of the "normal" ratio for different channels]

Let us set aside the previous point and assume that the "normal" picture is similar for all suppliers. Now everything is fine and we can get by with alerts in grafana?
We can, but we really do not want to, because we would have to choose one of the options:
a) make many separate graphs for each channel (and painfully maintain them)
b) leave one graph with all channels (and get lost in the colorful lines and custom alerts)


How did we do it?

Again, the documentation has a good starting example (Calculating rates across joined series), which you can look at or take as a basis for similar problems.

What we ended up doing:

  • join two series over several hours, grouped by channel;
  • fill in the series by group if there was no data;
  • compare the median of the last 10 minutes with the previous data;
  • yell if we find something;
  • write the calculated rates and the alerts that occurred to influxdb;
  • send a useful message to slack.
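The comparison step from the list above can be sketched in Python as follows. This is a simplified illustration for a single channel, not the TICKscript itself: the function name check_channel and the sample rates are hypothetical, the series is assumed to hold one rate per minute, and the two thresholds stand in for a warning level and a lower reset level.

```python
from statistics import median


def check_channel(rates, recent_count=10, warn_alert=15.0, warn_reset=5.0,
                  previous_level="OK"):
    """Compare the median of the last points with the median of the whole period.

    Returns (level, previous_median, recent_median). Between the two
    thresholds the previous level is kept, so the alert does not flap.
    """
    today = median(rates[-recent_count:])
    prev = median(rates)
    drop = prev - today
    if drop > warn_alert:
        level = "WARNING"
    elif drop < warn_reset:
        level = "OK"
    else:
        level = previous_level
    return level, prev, today


# Hypothetical per-minute success rates for one channel, in percent.
healthy = [98.0] * 60
degraded = [98.0] * 50 + [60.0] * 10

print(check_channel(healthy))
print(check_channel(degraded))
```

Comparing against the channel's own median is what removes the need for a hand-picked static threshold per channel.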

In my opinion, we managed to achieve everything we wanted to get in the end (and even a little more with custom handlers) as elegantly as possible.

On github.com you can see a code example and a minimal diagram (graphviz) of the resulting script.

An example of the final code:

dbrp "supplier"."autogen"
var name = 'requests.rate'
var grafana_dash = 'pczpmYZWU/mydashboard'
var grafana_panel = '26'
var period = 8h
var todayPeriod = 10m
var every = 1m
var warnAlert = 15
var warnReset = 5
var reqQuery = 'SELECT sum("count") AS value FROM "supplier"."autogen"."requests"'
var errQuery = 'SELECT sum("count") AS value FROM "supplier"."autogen"."errors"'

var prevErr = batch
    |query(errQuery)
        .period(period)
        .every(every)
        .groupBy(1m, 'channel', 'supplier')

var prevReq = batch
    |query(reqQuery)
        .period(period)
        .every(every)
        .groupBy(1m, 'channel', 'supplier')

var rates = prevReq
    |join(prevErr)
        .as('req', 'err')
        .tolerance(1m)
        .fill('null')
    // fill the values with zeros if there were none
    |default()
        .field('err.value', 0.0)
        .field('req.value', 0.0)
    // if in lambda: compute the rate only if there were errors
    |eval(lambda: if("err.value" > 0, 100.0 * (float("req.value") - float("err.value")) / float("req.value"), 100.0))
        .as('rate')

// write the computed values to influx
rates
    |influxDBOut()
        .quiet()
        .create()
        .database('kapacitor')
        .retentionPolicy('autogen')
        .measurement('rates')

// select the data for the last 10 minutes, compute the median
var todayRate = rates
    |where(lambda: duration((unixNano(now()) - unixNano("time")) / 1000, 1u) < todayPeriod)
    |median('rate')
        .as('median')

var prevRate = rates
    |median('rate')
        .as('median')

var joined = todayRate
    |join(prevRate)
        .as('today', 'prev')
    |httpOut('join')

var trigger = joined
    |alert()
        .warn(lambda: ("prev.median" - "today.median") > warnAlert)
        .warnReset(lambda: ("prev.median" - "today.median") < warnReset)
        .flapping(0.25, 0.5)
        .stateChangesOnly()
        // build a link to the grafana dashboard graph into the message
        .message(
            '{{ .Level }}: {{ index .Tags "channel" }} err/req ratio ({{ index .Tags "supplier" }})
{{ if eq .Level "OK" }}It is ok now{{ else }}
'+string(todayPeriod)+' median is {{ index .Fields "today.median" | printf "%0.2f" }}%, by previous '+string(period)+' is {{ index .Fields "prev.median" | printf "%0.2f" }}%{{ end }}
http://grafana.ostrovok.in/d/'+string(grafana_dash)+
'?var-supplier={{ index .Tags "supplier" }}&var-channel={{ index .Tags "channel" }}&panelId='+string(grafana_panel)+'&fullscreen&tz=UTC%2B03%3A00'
        )
        .id('{{ index .Tags "name" }}/{{ index .Tags "channel" }}')
        .levelTag('level')
        .messageField('message')
        .durationField('duration')
        .topic('slack_graph')

// duplicate "today.median" as "value", and also write the remaining alert fields to influx (keep)
trigger
    |eval(lambda: "today.median")
        .as('value')
        .keep()
    |influxDBOut()
        .quiet()
        .create()
        .database('kapacitor')
        .retentionPolicy('autogen')
        .measurement('alerts')
        .tag('alertName', name)

So what is the result?

Kapacitor is excellent at monitoring and alerting with lots of groupings, performing additional calculations on already recorded metrics, carrying out custom actions and running scripts (udf).

The barrier to entry is not very high: give it a try if grafana or other tools do not fully satisfy your wishes.

Source: www.habr.com
