We're building the world's most convenient* interface for viewing logs

If you have ever used web interfaces for viewing logs, you have probably noticed how cumbersome they usually are and how often they are neither convenient nor responsive. Some you can get used to, others are simply awful, but I believe all these problems come from approaching log viewing the wrong way: we try to build a web interface where a CLI (command line interface) works better. Personally, I am very comfortable working with tail, grep, awk and the like, so for me the ideal interface for working with logs would be something similar to tail and grep that can also read logs coming from many servers. That means, of course, reading them from ClickHouse!

* according to the personal opinion of Habr user youROCK

Meet logscli

I didn't come up with a name for my interface, and to be honest it is more of a prototype, but if you want to see the source code right away, you're welcome: https://github.com/YuriyNasretdinov/logscli (350 lines of hand-picked Go code).

Capabilities

My goal was to build an interface that would feel familiar to people used to tail/grep, which means supporting the following:

  1. View all logs without any filtering.
  2. Keep only lines containing a fixed substring (the -F flag in grep).
  3. Keep only lines matching a regular expression (the -E flag in grep).
  4. By default, output is in reverse chronological order, since the most recent logs are usually of interest first.
  5. Show context next to each line (the -A, -B and -C options in grep, which print N lines after, before, and around each matching line, respectively).
  6. View incoming logs in real time, with or without filtering (essentially tail -f | grep).
  7. The interface must be compatible with less, head, tail and the like: by default, results are returned without any limit on their number; lines are printed as a stream for as long as the user wants to receive them; and a SIGPIPE signal must silently stop the log streaming, just as it does for tail, grep and other UNIX utilities.

Implementation

I will assume that you already have some way of shipping logs to ClickHouse. If not, I recommend trying lsd and kittenhouse, as well as this article about log delivery.

First, you need to decide on the database schema. Since logs are usually sorted by time, it seems logical to store them that way. If you have many log categories and they all have the same structure, you can make the log category the first column of the primary key. This lets you have one table instead of several, which is a big plus when inserting into ClickHouse (on servers with hard drives, it is recommended to insert data no more than about once per second for the whole server).

That is, we need roughly the following table schema:

CREATE TABLE logs(
    category LowCardinality(String), -- log category (optional)
    time DateTime, -- event time
    millis UInt16, -- milliseconds (or microseconds, etc.): worth storing if there are many events, to make it easier to tell events apart
    ..., -- your own fields, e.g. server name, log level, and so on
    message String -- message text
) ENGINE=MergeTree()
ORDER BY (category, time, millis)

Unfortunately, I couldn't immediately find any openly available collections of realistic logs to download, so as an example I used Amazon product reviews up to 2015 instead. Of course, their structure is not quite the same as that of text logs, but for illustration purposes this does not matter.

Instructions for loading Amazon reviews into ClickHouse

Let's create the table:

CREATE TABLE amazon(
   review_date Date,
   time DateTime DEFAULT toDateTime(toUInt32(review_date) * 86400 + rand() % 86400),
   millis UInt16 DEFAULT rand() % 1000,
   marketplace LowCardinality(String),
   customer_id Int64,
   review_id String,
   product_id LowCardinality(String),
   product_parent Int64,
   product_title String,
   product_category LowCardinality(String),
   star_rating UInt8,
   helpful_votes UInt32,
   total_votes UInt32,
   vine FixedString(1),
   verified_purchase FixedString(1),
   review_headline String,
   review_body String
)
ENGINE=MergeTree()
ORDER BY (time, millis)
SETTINGS index_granularity=8192

The Amazon data contains only the date of each review, not the exact time, so we fill in the time with random values.

You don't need to download all the TSV files; the first 10-20 are enough to get a dataset large enough that it does not fit into 16 GB of RAM. To load the TSV files I used the following command:

for i in *.tsv; do
    echo $i;
    tail -n +2 $i | pv |
    clickhouse-client --input_format_allow_errors_ratio 0.5 --query='INSERT INTO amazon(marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date) FORMAT TabSeparated'
done

On a standard persistent disk (which is an HDD) in Google Cloud* with a size of 1000 GB (I chose this size mainly to get somewhat higher throughput, although an SSD of the necessary size might well have been cheaper), the loading speed was roughly 75 MB/s on 4 cores.

  • I should note that I work at Google, but I used a personal account, and this article has nothing to do with my work at the company.

All the illustrations will be made with this dataset, since it is all I had at hand.

Showing data scan progress

Since we will be doing full scans of the logs table in ClickHouse, and this operation can take a significant amount of time and may return nothing for a long while if there are few matches, it is advisable to show the query's progress until the first result rows arrive. For this, the HTTP interface has a parameter that makes ClickHouse report progress in HTTP headers: send_progress_in_http_headers=1. Unfortunately, the standard Go library cannot read headers as they arrive, but ClickHouse also supports the HTTP 1.0 interface (not to be confused with 1.1!), so you can open a raw TCP connection to ClickHouse, send GET /?query=... HTTP/1.0\n\n and read the response headers and body without any escaping or special encoding. In this case we don't even need the standard library.

Streaming logs from ClickHouse

ClickHouse has had an optimization for ORDER BY queries for quite a while now (since 2019?), so a query like

SELECT time, millis, message
FROM logs
WHERE message LIKE '%something%'
ORDER BY time DESC, millis DESC

will immediately start returning rows whose message contains the substring "something", without waiting for the scan to finish.

It would also be very convenient if ClickHouse cancelled the query as soon as the connection to it is closed, but this is not the default behaviour. Automatic query cancellation can be enabled with the option cancel_http_readonly_queries_on_client_close=1.

Handling SIGPIPE correctly in Go

When you run, say, some_cmd | head -n 10, how exactly does some_cmd stop once head has read 10 lines? The answer is simple: when head exits, the pipe is closed, and the stdout of some_cmd now points, loosely speaking, "nowhere". When some_cmd tries to write to the closed pipe, it receives a SIGPIPE signal, which by default silently terminates the program.

In Go this also happens by default, but the SIGPIPE handler additionally prints "signal: SIGPIPE" or a similar message at the end. To get rid of this message, you just need to handle SIGPIPE yourself the way you want, that is, exit silently:

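ch := make(chan os.Signal, 1) // signal.Notify never blocks, so the channel needs a buffer
signal.Notify(ch, syscall.SIGPIPE)
go func() {
    <-ch       // the reader of our stdout has gone away
    os.Exit(0) // exit silently instead of printing a signal message
}()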

Showing message context

Often you want to see the context in which an error occurred (for example, which request caused a panic, or what related problems were visible before the crash). In grep, this is what the -A, -B and -C options are for: they show the specified number of lines after, before, and around the message, respectively.

Unfortunately, I haven't found a simple way to do the same in ClickHouse, so to show context, an additional query like the following is sent for every row of the result (the details depend on the sort order and on whether the context is shown before or after the row):

SELECT time,millis,review_body FROM amazon
WHERE (time = 'EVENT_TIME' AND millis < EVENT_MILLIS) OR (time < 'EVENT_TIME')
ORDER BY time DESC, millis DESC
LIMIT CONTEXT_LINE_COUNT
SETTINGS max_threads=1

Since each such query is sent right after ClickHouse returns the corresponding row, the data it needs is already in the cache, so the query generally runs quite fast and uses little CPU (typically about 6 ms on my virtual machine).

Showing new messages in real time

To show incoming messages in (almost) real time, we simply run the query every few seconds, remembering the last timestamp we have already seen.

Example commands

What do typical logscli commands look like?

If you have downloaded the Amazon dataset I mentioned at the beginning of the article, you can run the following commands:

# Show lines containing the word walmart
$ logscli -F 'walmart' | less

# Show the 10 most recent lines containing "terrible"
$ logscli -F terrible -limit 10

# The same without -limit:
$ logscli -F terrible | head -n 10

# Show all lines matching /times [0-9]/ that were written for vine and have a high rating
$ logscli -E 'times [0-9]' -where="vine='Y' AND star_rating>4" | less

# Show all lines containing the word "panic", with 3 lines of context around each
$ logscli -F 'panic' -C 3 | less

# Continuously show new lines containing the word "5-star"
$ logscli -F '5-star' -tailf
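A hypothetical Go sketch of how grep-style flags like the ones above could be turned into a ClickHouse query. This is not logscli's actual implementation: the table and column names, the helper names quote and buildQuery, and the choice of position() for -F and match() (re2 regular expressions) for -E are assumptions. The -where value is passed through as raw SQL, as in the example. -C and -tailf are left out; they correspond to the context and real-time sections earlier.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// quote (hypothetical) renders s as a ClickHouse single-quoted string literal.
func quote(s string) string {
	return "'" + strings.NewReplacer(`\`, `\\`, `'`, `\'`).Replace(s) + "'"
}

// buildQuery (hypothetical) turns grep-style options into a newest-first SELECT.
func buildQuery(table, column, fixed, regex, where string, limit int) string {
	var conds []string
	if fixed != "" { // -F: fixed substring
		conds = append(conds, fmt.Sprintf("position(%s, %s) > 0", column, quote(fixed)))
	}
	if regex != "" { // -E: regular expression
		conds = append(conds, fmt.Sprintf("match(%s, %s)", column, quote(regex)))
	}
	if where != "" { // -where: raw SQL condition from the user
		conds = append(conds, "("+where+")")
	}
	q := fmt.Sprintf("SELECT time, millis, %s FROM %s", column, table)
	if len(conds) > 0 {
		q += " WHERE " + strings.Join(conds, " AND ")
	}
	q += " ORDER BY time DESC, millis DESC"
	if limit > 0 { // 0 means no limit, like plain grep
		q += fmt.Sprintf(" LIMIT %d", limit)
	}
	return q
}

func main() {
	fixed := flag.String("F", "", "fixed substring to search for")
	regex := flag.String("E", "", "regular expression (re2 syntax)")
	where := flag.String("where", "", "extra SQL condition")
	limit := flag.Int("limit", 0, "max lines (0 = unlimited)")
	flag.Parse()
	fmt.Println(buildQuery("amazon", "review_body", *fixed, *regex, *where, *limit))
}
```

For example, `go run . -E 'times [0-9]' -where="vine='Y' AND star_rating>4"` prints the SQL that would be sent for the fourth command above.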

Links

The utility's code (without any documentation) is available on GitHub at https://github.com/YuriyNasretdinov/logscli. I'd be glad to hear what you think of my idea of a console interface for viewing logs based on ClickHouse.

Source: www.habr.com
