Tricks for processing metrics in Kapacitor

Most likely, nobody today asks why service metrics need to be collected. The next logical step is to set up alerting on the collected metrics, so that any deviation in the data is reported through the channels that suit you (mail, Slack, Telegram). At the online hotel booking service Ostrovok.ru, all metrics from our services go into InfluxDB and are displayed in Grafana, and basic alerting is configured there too. For tasks like "calculate something and compare it with something else," we use Kapacitor.

Kapacitor is the part of the TICK stack that can process metrics from InfluxDB. It can join several measurements together (join), calculate something useful from the received data, write the result back to InfluxDB, and send alerts to Slack/Telegram/mail.

The whole stack has good, detailed documentation, but there are always useful things that the manuals do not spell out. In this article I decided to collect a number of such useful, non-obvious tips (the basic TICKscript syntax is covered in the documentation) and to show how they can be applied, using the solution to one of our own problems as the example.

Let's go!

float & int, calculation errors

A completely standard problem, solved with explicit casts:

var alert_float = 5.0
var alert_int = 10
data|eval(lambda: float("value") > alert_float OR float("value") < float(alert_int))

Using default()

If a tag or field is not filled in, calculation errors will occur, so set a default for it:

|default()
        .tag('status', 'empty')
        .field('value', 0)

fill in join (inner vs outer)

By default, join drops the points for which there is no data (inner join).
With fill('null'), an outer join is performed, after which you need to apply default() and fill in the empty values:

var data = res1
    |join(res2)
        .as('res1', 'res2')
        .fill('null')
    |default()
        .field('res1.value', 0.0)
        .field('res2.value', 100.0)

There is still a nuance here. In the example above, if one of the series (res1 or res2) is empty, the resulting series (data) will also be empty. There are several tickets about this on GitHub (1633, 1871, 6967); we are waiting for fixes and suffering a little.
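To make the inner/outer distinction concrete, here is a small plain-Python model of the join semantics described above. It is only a sketch: `join_series` and the sample timestamps are hypothetical, this is not how Kapacitor implements join, and it does not reproduce the empty-series caveat mentioned above.

```python
def join_series(res1, res2, fill_null=False, defaults=None):
    """Join two {timestamp: value} series on their timestamps.

    fill_null=False keeps only timestamps present in both series (inner join).
    fill_null=True keeps every timestamp (outer join) and fills the missing
    side from `defaults`, which plays the role of default() in the script.
    """
    defaults = defaults or {}
    if fill_null:
        times = sorted(set(res1) | set(res2))
    else:
        times = sorted(set(res1) & set(res2))
    joined = []
    for t in times:
        joined.append({
            "time": t,
            "res1.value": res1.get(t, defaults.get("res1.value")),
            "res2.value": res2.get(t, defaults.get("res2.value")),
        })
    return joined


# Hypothetical sample data: only timestamp 2 exists in both series.
res1 = {1: 10.0, 2: 20.0}
res2 = {2: 90.0, 3: 80.0}

inner = join_series(res1, res2)
outer = join_series(res1, res2, fill_null=True,
                    defaults={"res1.value": 0.0, "res2.value": 100.0})
```

The inner join keeps a single point, while the outer join keeps all three and substitutes 0.0 and 100.0 where one side had no data.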

Using conditions in calculations (if in lambda)

|eval(lambda: if("value" > 0, TRUE, FALSE))

The last five minutes from the pipeline for a period

For example, you need to compare the values of the last five minutes with the previous week. You can fetch the two sets of data in two separate batch queries, or extract part of the data from the larger period:

 |where(lambda: duration((unixNano(now()) - unixNano("time"))/1000, 1u) < 5m)
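As we read it, the lambda above subtracts the point's nanosecond timestamp from the current one, divides by 1000 to get microseconds, turns that number into a duration with the 1u unit, and keeps the point only if it is younger than five minutes. The same filter in plain Python, as a sketch with hypothetical names and sample points:

```python
from datetime import datetime, timedelta, timezone


def last_window(points, now, window=timedelta(minutes=5)):
    """Keep only the points whose age (now - time) is strictly less than window."""
    return [p for p in points if now - p["time"] < window]


# Hypothetical fixed "now" so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
points = [
    {"time": now - timedelta(minutes=30), "value": 1},
    {"time": now - timedelta(minutes=5), "value": 2},  # exactly 5m old: dropped
    {"time": now - timedelta(minutes=4), "value": 3},
]
recent = last_window(points, now)
```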

An alternative for the last five minutes is to use BarrierNode, which cuts off data before the specified time:

|barrier()
        .period(5m)

Examples of using Go templates in message

Templates follow the format of the text/template package. Below are some puzzles that come up often.

if-else

We put things in order and do not bother people with unnecessary text:

|alert()
    ...
    .message(
        '{{ if eq .Level "OK" }}It is ok now{{ else }}Chief, everything is broken{{end}}'
    )

Two digits after the decimal point in message

Improving the readability of the message:

|alert()
    ...
    .message(
        'now value is {{ index .Fields "value" | printf "%0.2f" }}'
    )

Expanding variables in message

We show more information in the message to answer the question "Why is it yelling?"

var warnAlert = 10
|alert()
    ...
    .message(
        'Today value less than ' + string(warnAlert) + '%'
    )

Unique alert identifier

This is a must when there is more than one group in the data; otherwise only one alert will be generated:

|alert()
      ...
      .id('{{ index .Tags "myname" }}/{{ index .Tags "myfield" }}')

Custom handlers

The large list of handlers includes exec, which lets you run your own script with the alert data passed in on stdin: creativity and nothing more!

One of our custom handlers is a small Python script for sending notifications to Slack.
At first we wanted to send an authorization-protected Grafana image in the message. Then, to write OK in the thread of the previous alert from the same group rather than as a separate message. A little later, to add to the message the most common error over the last X minutes.

A separate topic is communication with other services and any actions triggered by an alert (only if your monitoring works well enough).
An example of a handler description, where slack_handler.py is our own script:

topic: slack_graph
id: slack_graph.alert
match: level() != INFO AND changed() == TRUE
kind: exec
options:
  prog: /sbin/slack_handler.py
  args: ["-c", "CHANNELID", "--graph", "--search"]

How to debug?

Option with log output

|log()
      .level('ERROR')
      .prefix("something")

Watch (cli): kapacitor -url host-or-ip:9092 logs lvl=error

Option with httpOut

Shows the data at the current point in the pipeline:

|httpOut('something')

Watch (get): host-or-ip:9092/kapacitor/v1/tasks/task_name/something

Execution graph

  • Each task returns an execution tree with useful numbers in graphviz format.
  • Take the dot block.
  • Paste it into a viewer and enjoy.

Where else can you step on a rake?

timestamp in influxdb on writeback

For example, we set up an alert on the sum of requests per hour (groupBy(1h)) and want to record the triggered alert in influxdb (to show the fact of the problem on a graph in grafana).

influxDBOut() writes the time value from the alert into the timestamp; as a result, the point on the chart is recorded earlier or later than the alert actually arrived.

When accuracy is required, we work around this problem by calling a custom handler that writes the data to influxdb with the current timestamp.
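A minimal sketch of the part of such a handler that stamps the point with the current time, assuming the data is written as InfluxDB line protocol. The function name and sample values are hypothetical, escaping of special characters is deliberately omitted, and the actual HTTP write is left out.

```python
import time


def to_line(measurement, tags, fields, ts_ns=None):
    """Build one line-protocol record; defaults to the current time in nanoseconds."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Tags and fields are sorted only to make the output deterministic.
    tag_part = "".join(",{}={}".format(k, v) for k, v in sorted(tags.items()))
    field_part = ",".join(
        "{}={}".format(k, float(v)) for k, v in sorted(fields.items())
    )
    return "{}{} {} {}".format(measurement, tag_part, field_part, ts_ns)


# A fixed timestamp is passed here only to keep the example reproducible.
line = to_line("alerts", {"alertName": "requests.rate"}, {"value": 87.5},
               ts_ns=1700000000000000000)
```

When `ts_ns` is omitted, the point is stamped with the moment the handler runs rather than with the time carried by the alert.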

docker, build and deployment

At startup, kapacitor can load tasks, templates and handlers from the directory specified in the [load] block of the config.

To create a task correctly, you need the following:

  1. File name: it is expanded into the id/name of the script
  2. Type: stream/batch
  3. dbrp: the keyword that indicates which database + retention policy the script runs against (dbrp "supplier"."autogen")

If some batch task does not contain a line with dbrp, the whole service will refuse to start and will honestly say so in the log.

In chronograf, on the contrary, this line must not be present; it is not accepted through the interface and produces an error.

A hack for building the container: the Dockerfile exits with -1 if there are lines matching //.+dbrp, which lets you see the reason for the failure right away while the build is being assembled.
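The Dockerfile step itself is not shown, so here is the same check sketched in Python under that caveat. It uses the regular expression quoted above; the function name and the sample scripts are hypothetical.

```python
import re

# Same pattern as in the build hack: a comment followed by "dbrp" on one line.
COMMENTED_DBRP = re.compile(r"//.+dbrp")


def commented_dbrp_lines(script):
    """Return the 1-based numbers of lines where dbrp appears inside a comment."""
    return [
        number
        for number, line in enumerate(script.splitlines(), start=1)
        if COMMENTED_DBRP.search(line)
    ]


good = 'dbrp "supplier"."autogen"\nvar every = 1m\n'
bad = '// dbrp "supplier"."autogen"\nvar every = 1m\n'

good_hits = commented_dbrp_lines(good)
bad_hits = commented_dbrp_lines(bad)
```

A build script could fail with a non-zero exit code whenever the returned list is not empty.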

join one to many

Example task: you need to take the 95th percentile of the service's response time over a week and compare each minute of the last 10 with that value.

You cannot do a one-to-many join: last/mean/median over a group of points turns the node into a stream, and the error "cannot add child mismatched edges: batch -> stream" is returned.

The result of a batch, used as a variable in a lambda expression, is not substituted either.

There is an option to save the needed numbers from the first batch to a file via udf and load this file via sideload.

What did we solve with this?

We have about 100 hotel suppliers, each of which can have several connections; let's call a connection a channel. There are about 300 of these channels, and any of them can fail. Of all the recorded metrics, we will monitor the error rate (requests and errors).

Why not grafana?

Error alerts configured in Grafana have several drawbacks. Some are critical, some you can turn a blind eye to, depending on the situation.

Grafana cannot calculate across measurements and alert on the result, but we need the rate (requests-errors)/requests.
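As a quick worked example of that rate, here is the arithmetic in Python, expressed as a percentage the way the final script does it. The function name and the numbers are hypothetical, and the sketch does not handle the case where requests is zero while errors is positive.

```python
def success_rate(requests, errors):
    """Percentage of requests that did not fail: 100 * (requests - errors) / requests."""
    if errors <= 0:
        # No errors in the window: treat the channel as fully healthy.
        return 100.0
    return 100.0 * (float(requests) - float(errors)) / float(requests)


busy_channel = success_rate(200, 30)   # 170 of 200 requests succeeded
quiet_channel = success_rate(50, 0)    # no errors at all
```

Thirty errors say little on their own; against 200 requests they mean a rate of 85%.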

On their own, the errors look alarming:

[Figure: graph of errors]

And they look much less alarming next to the successful requests:

[Figure: graph of errors together with successful requests]

Okay, we could pre-calculate the rate in the service before it reaches grafana, and in some cases that would work. But not in ours, because each channel has its own ratio that counts as "normal", while the alerts work on static values (we find them by eye and change them when alerts fire too often).

These are examples of "normal" for different channels:

[Figure: two graphs of the rate for different channels]

Let's ignore the previous point and assume that the "normal" picture is similar for all suppliers. Is everything fine now, and can we get by with alerts in grafana?
We can, but we really do not want to, because we would have to choose one of two options:
a) make a lot of graphs, one per channel (and painfully maintain them)
b) keep one graph with all the channels (and get lost in the colorful lines and custom alerts)

How did we do it?

Again, there is a good starting example in the documentation (Calculating rates across joined series); you can peek at it or take it as a basis for similar problems.

What we did in the end:

  • join two series over a few hours, grouping by channel;
  • fill the series by group if there was no data;
  • compare the median of the last 10 minutes with the previous data;
  • yell if we find something;
  • write the calculated rates and the alerts that fired to influxdb;
  • send a useful message to slack.

In my opinion, we managed to achieve everything we wanted in the end (and even a little more with custom handlers) as neatly as possible.

The example code and a minimal diagram (graphviz) of the resulting script are available on github.com.

An example of the resulting code:
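The comparison step in the list above can be sketched in plain Python as follows. This only models the logic: the function name and sample rates are hypothetical, the thresholds 15 and 5 are taken from the script's warnAlert and warnReset, and the reset behaviour reflects our understanding that a WARNING stays active until the reset condition becomes true.

```python
from statistics import median


def alert_level(prev_rates, today_rates, current="OK", warn_alert=15, warn_reset=5):
    """Compare the recent median rate with the median over the longer period."""
    drop = median(prev_rates) - median(today_rates)
    if drop > warn_alert:
        return "WARNING"
    # Hysteresis: an active WARNING is cleared only once the drop is below warn_reset.
    if current == "WARNING" and not drop < warn_reset:
        return "WARNING"
    return "OK"


prev = [98.0, 97.0, 99.0, 98.0]                              # median 98.0
fired = alert_level(prev, [80.0, 82.0, 79.0])                # drop of 18
held = alert_level(prev, [90.0, 91.0, 89.0], current="WARNING")     # drop of 8
cleared = alert_level(prev, [95.0, 96.0, 94.0], current="WARNING")  # drop of 3
```

The gap between the two thresholds keeps the alert from flipping back and forth when the rate hovers around a single limit.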

dbrp "supplier"."autogen"
var name = 'requests.rate'
var grafana_dash = 'pczpmYZWU/mydashboard'
var grafana_panel = '26'
var period = 8h
var todayPeriod = 10m
var every = 1m
var warnAlert = 15
var warnReset = 5
var reqQuery = 'SELECT sum("count") AS value FROM "supplier"."autogen"."requests"'
var errQuery = 'SELECT sum("count") AS value FROM "supplier"."autogen"."errors"'

var prevErr = batch
    |query(errQuery)
        .period(period)
        .every(every)
        .groupBy(1m, 'channel', 'supplier')

var prevReq = batch
    |query(reqQuery)
        .period(period)
        .every(every)
        .groupBy(1m, 'channel', 'supplier')

var rates = prevReq
    |join(prevErr)
        .as('req', 'err')
        .tolerance(1m)
        .fill('null')
    // fill missing values with zeros
    |default()
        .field('err.value', 0.0)
        .field('req.value', 0.0)
    // if in lambda: calculate the rate only if there were errors
    |eval(lambda: if("err.value" > 0, 100.0 * (float("req.value") - float("err.value")) / float("req.value"), 100.0))
        .as('rate')

// write the calculated values to influx
rates
    |influxDBOut()
        .quiet()
        .create()
        .database('kapacitor')
        .retentionPolicy('autogen')
        .measurement('rates')

// select the data for the last 10 minutes and calculate the median
var todayRate = rates
    |where(lambda: duration((unixNano(now()) - unixNano("time")) / 1000, 1u) < todayPeriod)
    |median('rate')
        .as('median')

var prevRate = rates
    |median('rate')
        .as('median')

var joined = todayRate
    |join(prevRate)
        .as('today', 'prev')
    |httpOut('join')

var trigger = joined
    |alert()
        .warn(lambda: ("prev.median" - "today.median") > warnAlert)
        .warnReset(lambda: ("prev.median" - "today.median") < warnReset)
        .flapping(0.25, 0.5)
        .stateChangesOnly()
        // build a link to the grafana dashboard graph in the message
        .message(
            '{{ .Level }}: {{ index .Tags "channel" }} err/req ratio ({{ index .Tags "supplier" }})
{{ if eq .Level "OK" }}It is ok now{{ else }}
'+string(todayPeriod)+' median is {{ index .Fields "today.median" | printf "%0.2f" }}%, by previous '+string(period)+' is {{ index .Fields "prev.median" | printf "%0.2f" }}%{{ end }}
http://grafana.ostrovok.in/d/'+string(grafana_dash)+
'?var-supplier={{ index .Tags "supplier" }}&var-channel={{ index .Tags "channel" }}&panelId='+string(grafana_panel)+'&fullscreen&tz=UTC%2B03%3A00'
        )
        .id('{{ index .Tags "name" }}/{{ index .Tags "channel" }}')
        .levelTag('level')
        .messageField('message')
        .durationField('duration')
        .topic('slack_graph')

// duplicate "today.median" as "value", and also write the remaining alert fields to influx (keep)
trigger
    |eval(lambda: "today.median")
        .as('value')
        .keep()
    |influxDBOut()
        .quiet()
        .create()
        .database('kapacitor')
        .retentionPolicy('autogen')
        .measurement('alerts')
        .tag('alertName', name)

What's the conclusion?

Kapacitor is great at monitoring alerts across a bunch of groupings, at performing additional calculations on already recorded metrics, and at performing custom actions and running scripts (udf).

The barrier to entry is not very high: try it if grafana or other tools do not fully cover your needs.

source: www.habr.com
