Building the world's most convenient* interface for viewing logs

If you have ever used web interfaces for viewing logs, you have probably noticed that, as a rule, they are cumbersome and (often) not very convenient or responsive. Some you can get used to, some are downright awful, but it seems to me that the root of all the problems is that we approach the task of viewing logs the wrong way: we try to build a web interface where a CLI (command line interface) works better. Personally, I am very comfortable working with tail, grep, awk and the like, so for me the ideal interface for working with logs would be something similar to tail and grep, but one that can also read logs coming from many servers. That is, of course, read them from ClickHouse!

*in the opinion of one Habr user

Meet logscli

I did not come up with a proper name for my interface, and, to be honest, it exists more as a prototype, but if you want to look at the source code right away, you are welcome: https://github.com/YuriyNasretdinov/logscli (350 lines of select Go code).

Features

My goal was to make an interface that feels familiar to people who are used to tail/grep, that is, to support the following:

  1. View all logs, without filtering.
  2. Keep only lines that contain a fixed substring (the -F flag in grep).
  3. Keep only lines that match a regular expression (the -E flag in grep).
  4. By default, show logs in reverse chronological order, since the most recent logs are usually the most interesting.
  5. Show context around each line (the -A, -B and -C options in grep, which print N lines after, before, and around each matching line, respectively).
  6. View incoming logs in real time, with or without filtering (essentially tail -f | grep).
  7. The interface should be compatible with less, head, tail and others: by default, results are returned without any limit on their number; lines are printed as a stream for as long as the user is interested in receiving them; SIGPIPE should silently interrupt log streaming, just as it does for tail, grep and other UNIX utilities.

Implementation

I will assume that you already know, one way or another, how to deliver logs to ClickHouse. If not, I recommend trying lsd and kittenhouse, as well as this article about log delivery.

First you need to decide on the table schema. Since you usually want to receive logs sorted by time, it seems logical to store them that way. If there are many log categories and they all have the same structure, you can make the log category the first column of the primary key. This lets you have one table instead of several, which is a big plus when inserting into ClickHouse (on servers with hard drives, it is recommended to insert data no more often than about once per second for the entire server).

That is, we need roughly the following table schema:

CREATE TABLE logs(
    category LowCardinality(String), -- log category (optional)
    time DateTime, -- event time
    millis UInt16, -- milliseconds (could also be microseconds, etc.): worth storing if there are many events, to make them easier to tell apart
    ..., -- your own fields, for example server name, log level, and so on
    message String -- message text
) ENGINE=MergeTree()
ORDER BY (category, time, millis)

Unfortunately, I could not quickly find any open sources of realistic logs that I could simply grab and download, so I took something else as an example: reviews of products from Amazon up to 2015. Of course, their structure is not exactly the same as that of text logs, but for illustration purposes this does not matter.

Instructions for loading the Amazon reviews into ClickHouse

Let's create the table:

CREATE TABLE amazon(
   review_date Date,
   time DateTime DEFAULT toDateTime(toUInt32(review_date) * 86400 + rand() % 86400),
   millis UInt16 DEFAULT rand() % 1000,
   marketplace LowCardinality(String),
   customer_id Int64,
   review_id String,
   product_id LowCardinality(String),
   product_parent Int64,
   product_title String,
   product_category LowCardinality(String),
   star_rating UInt8,
   helpful_votes UInt32,
   total_votes UInt32,
   vine FixedString(1),
   verified_purchase FixedString(1),
   review_headline String,
   review_body String
)
ENGINE=MergeTree()
ORDER BY (time, millis)
SETTINGS index_granularity=8192

The Amazon dataset contains only the review date, not the exact time, so let's fill in the missing part with random values.

You do not need to download all of the tsv files; the first ~10-20 are enough to get a dataset that is large enough not to fit into 16 GB of RAM. To load the TSV files I used the following command:

for i in *.tsv; do
    echo "$i"
    # skip the header line, show throughput with pv, then insert
    tail -n +2 "$i" | pv |
    clickhouse-client --input_format_allow_errors_ratio 0.5 --query='INSERT INTO amazon(marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date) FORMAT TabSeparated'
done

On a Standard Persistent Disk (which is an HDD) in Google Cloud with a size of 1000 GB (I picked this size mainly to get slightly higher speed, although an SSD of the required size might have been cheaper), the loading speed was about ~75 MB/s on 4 cores.

  • I should disclose that I work at Google, but I used a personal account and this article has nothing to do with my work at the company.

I will produce all the illustrations with this dataset, since it is all I had at hand.

Showing data scan progress

Since in ClickHouse we will be doing a full scan of the logs table, and this operation can take a considerable amount of time and may produce no results for a long while if there are few matches, it is desirable to be able to show the progress of the query until the first result rows arrive. For this, the HTTP interface has a parameter that makes ClickHouse send progress in HTTP headers: send_progress_in_http_headers=1. Unfortunately, the standard Go library cannot read headers as they arrive, but ClickHouse supports HTTP 1.0 (not to be confused with 1.1!), so you can open a raw TCP connection to ClickHouse, send GET /?query=... HTTP/1.0\n\n there, and get back the response headers and body without any escaping or encryption, so in this case the standard library's HTTP client is not even needed.

Streaming logs from ClickHouse

ClickHouse has had an optimization for queries with ORDER BY for a relatively long time (since 2019?), so a query like

SELECT time, millis, message
FROM logs
WHERE message LIKE '%something%'
ORDER BY time DESC, millis DESC

will immediately start returning rows whose message contains the substring "something", without waiting for the scan to finish.

It would also be very convenient if ClickHouse itself cancelled the query when the connection to it is closed, but this is not the default behavior. Automatic query cancellation can be enabled with the option cancel_http_readonly_queries_on_client_close=1.
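As an illustration, the sketch below assembles a raw HTTP/1.0 request that carries the query together with the two settings mentioned in this article. The function name buildRequest is hypothetical and this is not necessarily how logscli does it; the sketch also uses \r\n line endings, as the HTTP specification prescribes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildRequest returns a raw HTTP/1.0 GET request for the given SQL query,
// asking ClickHouse to report progress in headers and to cancel the query
// when the client closes the connection.
func buildRequest(query string) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("send_progress_in_http_headers", "1")
	params.Set("cancel_http_readonly_queries_on_client_close", "1")

	// Encode() sorts keys and writes spaces as "+"; "%20" is an equivalent
	// encoding that does not rely on form-style decoding on the server.
	encoded := strings.ReplaceAll(params.Encode(), "+", "%20")
	return "GET /?" + encoded + " HTTP/1.0\r\n\r\n"
}

func main() {
	fmt.Print(buildRequest("SELECT time, millis, message FROM logs ORDER BY time DESC, millis DESC"))
}
```

The resulting string could then be written to a connection opened with net.Dial("tcp", ...) against the ClickHouse HTTP port (8123 by default), and the response read back from the same connection.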

Handling SIGPIPE correctly in Go

When you run, say, some_cmd | head -n 10, how exactly does some_cmd stop executing once head has printed 10 lines? The answer is simple: when head exits, the pipe is closed, and the stdout of some_cmd starts pointing, so to speak, "to nowhere". When some_cmd tries to write to the closed pipe, it receives SIGPIPE, which by default terminates the program silently.

In Go this also happens by default, but the SIGPIPE handler additionally prints "signal: SIGPIPE" or a similar message at the end. To get rid of this message, we just need to handle SIGPIPE ourselves the way we want, that is, exit silently:

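// Needs imports: "os", "os/signal", "syscall".
ch := make(chan os.Signal, 1) // buffered: signal.Notify does not block when sending
signal.Notify(ch, syscall.SIGPIPE)
go func() {
    <-ch       // a write to a closed pipe raised SIGPIPE
    os.Exit(0) // exit quietly, like tail or grep
}()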

Showing message context

You often want to see the context in which some error occurred (for example, which request caused a panic, or what related problems were visible before the crash), and in grep this is done with the -A, -B and -C options, which show the specified number of lines after, before, and around the message, respectively.

Unfortunately, I have not found a simple way to do the same in ClickHouse, so to display context, an additional query like this one is sent for each result row (the details depend on the sort order and on whether the context is shown before or after):

SELECT time,millis,review_body FROM amazon
WHERE (time = 'EVENT_TIME' AND millis < EVENT_MILLIS) OR (time < 'EVENT_TIME')
ORDER BY time DESC, millis DESC
LIMIT CONTEXT_ROWS
SETTINGS max_threads=1

Since the query is sent almost immediately after ClickHouse has returned the matching row, the data is still in the cache, so the query usually executes quickly and uses little CPU (typically a query takes about ~6 ms on my virtual machine).

Showing new messages in real time

To show incoming messages in (almost) real time, we simply run the query once every few seconds, remembering the last timestamp we have seen so far.

Command examples

What do typical logscli commands look like in practice?

If you have downloaded the Amazon dataset that I mentioned at the beginning of the article, you can run the following commands:

# Show lines that contain the word walmart
$ logscli -F 'walmart' | less

# Show the 10 most recent lines that contain "terrible"
$ logscli -F terrible -limit 10

# The same without -limit:
$ logscli -F terrible | head -n 10

# Show all lines matching /times [0-9]/ that were written for vine and have a high rating
$ logscli -E 'times [0-9]' -where="vine='Y' AND star_rating>4" | less

# Show all lines with the word "panic" and 3 lines of context around them
$ logscli -F 'panic' -C 3 | less

# Continuously show new lines that contain "5-star"
$ logscli -F '5-star' -tailf

Links

The implementation code (without documentation) is available on GitHub at https://github.com/YuriyNasretdinov/logscli. I would be glad to hear your thoughts on my idea of a console interface for viewing logs based on ClickHouse.

Source: www.habr.com
