We're building the world's most convenient* interface for viewing logs

If you have ever used web interfaces for viewing logs, you have probably noticed that, as a rule, these interfaces are cumbersome and (often) not very convenient or responsive. Some you can get used to, some are outright terrible, but it seems to me that the root of all the problems is that we approach the task of viewing logs the wrong way: we try to build a web interface where a CLI (command-line interface) works better. I personally am very comfortable working with tail, grep, awk and the like, so for me the ideal interface for working with logs would be something similar to tail and grep, but one that can also read logs coming from many servers. That is, of course, read them from ClickHouse!

*in the opinion of habr user ROCK

Meet logscli

I haven't come up with a name for my interface, and, to be honest, it exists only as a prototype, but if you want to see the source code right away, you're welcome: https://github.com/YuriyNasretdinov/logscli (350 lines of select Go code).

Features

My goal was to make an interface that would feel familiar to anyone used to tail/grep, that is, to support the following:

  1. View all logs, without filtering.
  2. Keep only lines containing a fixed substring (grep's -F flag).
  3. Keep only lines matching a regular expression (grep's -E flag).
  4. By default, output goes in reverse chronological order, since the most recent logs are usually of interest first.
  5. Show context around each line (grep's -A, -B and -C options, which print N lines after, before, and around each matching line, respectively).
  6. View incoming logs in real time, with or without filtering (essentially tail -f | grep).
  7. The interface must be compatible with less, head, tail and others: by default, results should be returned with no limit on their number; lines are printed as a stream for as long as the user is interested in receiving them; a SIGPIPE signal should silently interrupt log streaming, just as it does for tail, grep and other UNIX utilities.

Implementation

I'll assume you already know how to deliver logs to ClickHouse. If not, I recommend trying lsd and kittenhouse, as well as this article about log delivery.

First you need to decide on the table schema. Since you usually want to receive logs sorted by time, it makes sense to store them that way. If there are many log categories and they all have the same structure, you can make the log category the first column of the primary key. This lets you have one table instead of several, which is a big plus when inserting into ClickHouse (on servers with hard drives, it is recommended to insert data no more than ~1 time per second for the whole server).

That is, we need roughly the following table schema:

CREATE TABLE logs(
    category LowCardinality(String), -- log category (optional)
    time DateTime, -- event time
    millis UInt16, -- milliseconds (could also be microseconds, etc.): worth storing if there are many events, to make them easier to tell apart
    ..., -- your own fields, e.g. server name, log level, and so on
    message String -- message text
) ENGINE=MergeTree()
ORDER BY (category, time, millis)

Unfortunately, I couldn't immediately find any open sources of realistic logs that I could simply download, so I took reviews of Amazon products up to 2015 as an example instead. Of course, their structure is not quite the same as that of text logs, but for illustration purposes this doesn't matter.

Instructions for loading Amazon reviews into ClickHouse

Let's create the table:

CREATE TABLE amazon(
   review_date Date,
   time DateTime DEFAULT toDateTime(toUInt32(review_date) * 86400 + rand() % 86400),
   millis UInt16 DEFAULT rand() % 1000,
   marketplace LowCardinality(String),
   customer_id Int64,
   review_id String,
   product_id LowCardinality(String),
   product_parent Int64,
   product_title String,
   product_category LowCardinality(String),
   star_rating UInt8,
   helpful_votes UInt32,
   total_votes UInt32,
   vine FixedString(1),
   verified_purchase FixedString(1),
   review_headline String,
   review_body String
)
ENGINE=MergeTree()
ORDER BY (time, millis)
SETTINGS index_granularity=8192

The Amazon dataset has a date for each review but no exact time, so let's fill that in with random values.

You don't have to download all the tsv files; you can limit yourself to the first ~10-20 and still get a fairly large dataset that won't fit into 16 GB of RAM. To load the TSV files I used the following command:

for i in *.tsv; do
    echo "$i";
    tail -n +2 "$i" | pv |
    clickhouse-client --input_format_allow_errors_ratio 0.5 --query='INSERT INTO amazon(marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date) FORMAT TabSeparated'
done

On a Standard Persistent Disk (which is an HDD) in Google Cloud with a size of 1000 GB (I chose this size mainly to get somewhat higher speed, although an SSD of the required size might have been cheaper), the loading speed was about ~75 MB/sec on 4 cores.

  • I should disclose that I work at Google, but I used a personal account, and this article has nothing to do with my work at the company.

I'll give all examples using this particular dataset, since that's what I had on hand.

Showing data scan progress

Since in ClickHouse we will be doing a full scan of the log table, and this operation can take a significant amount of time and may produce no results for a long while if few matches are found, it is useful to be able to show the progress of the query until the first result rows arrive. For this, the HTTP interface has a parameter that makes ClickHouse send progress in HTTP headers: send_progress_in_http_headers=1. Unfortunately, the standard Go library cannot read headers as they arrive, but ClickHouse supports the HTTP 1.0 interface (not to be confused with 1.1!), so you can open a raw TCP connection to ClickHouse, send GET /?query=... HTTP/1.0\n\n, and receive the response headers and body without any escaping or encoding, so in this case we don't even need the standard library.

Streaming logs from ClickHouse

ClickHouse has had an optimization for queries with ORDER BY for quite a while (since 2019?), so a query like

SELECT time, millis, message
FROM logs
WHERE message LIKE '%something%'
ORDER BY time DESC, millis DESC

will immediately start returning rows that have the substring "something" in their message, without waiting for the scan to finish.

It would also be very convenient if ClickHouse itself cancelled the query when the connection is closed, but this is not the default behavior. Automatic query cancellation can be enabled with the option cancel_http_readonly_queries_on_client_close=1.

Handling SIGPIPE correctly in Go

When you run, say, the command some_cmd | head -n 10, how exactly does some_cmd stop executing once head has printed 10 lines? The answer is simple: when head exits, the pipe is closed, and the stdout of some_cmd starts pointing, figuratively speaking, "to nowhere". When some_cmd tries to write to the closed pipe, it receives a SIGPIPE signal, which terminates the program by default.

In Go this also happens by default, but the SIGPIPE signal handler also prints "signal: SIGPIPE" or a similar message at the end, and to get rid of this message we just need to handle SIGPIPE ourselves the way we want, that is, simply exit silently:

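// Requires the os, os/signal and syscall packages.
// signal.Notify does not block when sending, so the channel must be
// buffered or the signal could be dropped.
ch := make(chan os.Signal, 1)
signal.Notify(ch, syscall.SIGPIPE)
go func() {
    <-ch       // wait for the first SIGPIPE
    os.Exit(0) // exit silently, as tail and grep do
}()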

Showing message context

You often want to see the context in which some error occurred (for example, which request caused a panic, or which related problems were visible before the crash), and in grep this is done with the -A, -B and -C options, which show the given number of lines after, before, and around the message, respectively.

Unfortunately, I haven't found a simple way to do the same in ClickHouse, so to show context, an additional query like the following is sent for each result row (the details depend on the sort order and on whether context is shown before or after):

SELECT time,millis,review_body FROM amazon
WHERE (time = 'EVENT_TIME' AND millis < EVENT_MILLIS) OR (time < 'EVENT_TIME')
ORDER BY time DESC, millis DESC
LIMIT NUMBER_OF_CONTEXT_LINES
SETTINGS max_threads=1

Since the query is sent almost immediately after ClickHouse returns the corresponding row, the data is already in the cache, so the query generally runs quite fast and uses little CPU (a query typically takes about ~6 ms on my virtual machine).
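The "before" and "after" variants of the context query differ only in the comparison operator and the sort direction. The sketch below shows one way to generate both; contextQuery is a hypothetical helper, and it assumes the timestamp string comes straight from ClickHouse (so it is trusted and needs no escaping).

```go
package main

import "fmt"

// contextQuery builds the follow-up query that fetches up to n rows next to
// the event at (t, millis). With older=true it selects rows strictly before
// the event, newest first; with older=false it selects rows strictly after
// the event, oldest first.
func contextQuery(t string, millis, n int, older bool) string {
	cmp, dir := ">", "ASC"
	if older {
		cmp, dir = "<", "DESC"
	}
	return fmt.Sprintf(
		"SELECT time,millis,review_body FROM amazon "+
			"WHERE (time = '%[1]s' AND millis %[2]s %[3]d) OR (time %[2]s '%[1]s') "+
			"ORDER BY time %[4]s, millis %[4]s LIMIT %[5]d SETTINGS max_threads=1",
		t, cmp, millis, dir, n)
}

func main() {
	fmt.Println(contextQuery("2015-08-31 12:00:00", 250, 3, true))
	fmt.Println(contextQuery("2015-08-31 12:00:00", 250, 3, false))
}
```

Sorting the "after" rows in ascending order keeps the rows closest to the event within the LIMIT; the caller can then reverse them if the main output is newest-first.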

Showing new messages in real time

To show incoming messages in (almost) real time, we simply run the query once every few seconds, remembering the last timestamp we encountered before.
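The polling loop can be sketched as follows. This is an illustration under assumptions, not the tool's code: Row and follow are hypothetical names, the fetch callback stands in for a ClickHouse query that returns rows strictly newer than the given (time, millis) pair in ascending order, and the loop is bounded by a round count so the example terminates.

```go
package main

import (
	"fmt"
	"time"
)

// Row is one log line (hypothetical type).
type Row struct {
	Time    string // "YYYY-MM-DD hh:mm:ss", which compares correctly as a string
	Millis  int
	Message string
}

// follow calls fetch once per interval, passes every returned row to emit,
// and remembers the position of the last row so that the next call only asks
// for newer ones.
func follow(fetch func(afterTime string, afterMillis int) []Row, emit func(Row), interval time.Duration, rounds int) {
	lastTime, lastMillis := "", -1
	for i := 0; i < rounds; i++ {
		for _, r := range fetch(lastTime, lastMillis) {
			emit(r)
			lastTime, lastMillis = r.Time, r.Millis
		}
		time.Sleep(interval)
	}
}

func main() {
	// In-memory stand-in for the logs table.
	table := []Row{
		{"2015-08-31 12:00:00", 10, "first"},
		{"2015-08-31 12:00:00", 20, "second"},
		{"2015-08-31 12:00:01", 5, "third"},
	}
	fetch := func(t string, m int) []Row {
		var out []Row
		for _, r := range table {
			if r.Time > t || (r.Time == t && r.Millis > m) {
				out = append(out, r)
			}
		}
		return out
	}
	follow(fetch, func(r Row) { fmt.Println(r.Time, r.Millis, r.Message) }, 10*time.Millisecond, 2)
}
```

One caveat of this simple scheme: if two rows share the same (time, millis) pair and arrive in different polling rounds, the later one is skipped, which is one more reason to store sub-second precision.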

Example commands

What do typical logscli commands look like in practice?

If you downloaded the Amazon dataset I mentioned at the beginning of the article, you can run the following commands:

# Show lines containing the word walmart
$ logscli -F 'walmart' | less

# Show the 10 most recent lines containing "terrible"
$ logscli -F terrible -limit 10

# The same without -limit:
$ logscli -F terrible | head -n 10

# Show all lines matching /times [0-9]/ that were written for vine and have a high rating
$ logscli -E 'times [0-9]' -where="vine='Y' AND star_rating>4" | less

# Show all lines with the word "panic" and 3 lines of context around each
$ logscli -F 'panic' -C 3 | less

# Continuously show new lines containing "5-star"
$ logscli -F '5-star' -tailf
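
The flags used above map naturally onto Go's standard flag package. The following is a sketch of how such a command line might be parsed; Options and parseArgs are hypothetical names, only the flags that appear in the examples are covered, and the real tool may define them differently.

```go
package main

import (
	"flag"
	"fmt"
)

// Options collects the command-line switches seen in the examples
// (hypothetical type, not taken from logscli).
type Options struct {
	Fixed   string // -F: keep lines containing this substring
	Regexp  string // -E: keep lines matching this regular expression
	Where   string // -where: extra SQL condition
	Limit   int    // -limit: maximum number of lines, 0 means unlimited
	Context int    // -C: lines of context around each match
	Follow  bool   // -tailf: keep printing new lines as they arrive
}

// parseArgs parses args (without the program name) into Options.
func parseArgs(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("logscli", flag.ContinueOnError)
	fs.StringVar(&o.Fixed, "F", "", "keep lines containing this substring")
	fs.StringVar(&o.Regexp, "E", "", "keep lines matching this regular expression")
	fs.StringVar(&o.Where, "where", "", "extra SQL condition")
	fs.IntVar(&o.Limit, "limit", 0, "maximum number of lines (0 = unlimited)")
	fs.IntVar(&o.Context, "C", 0, "lines of context around each match")
	fs.BoolVar(&o.Follow, "tailf", false, "keep printing new lines as they arrive")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseArgs([]string{"-F", "terrible", "-limit", "10"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", o)
}
```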

Links

The utility's code (without documentation) is available on GitHub at https://github.com/YuriyNasretdinov/logscli. I'd be glad to hear your thoughts on my idea of a console interface for viewing logs based on ClickHouse.

source: www.habr.com
