ClickHouse Database for Humans, or Alien Technologies

Alexey Lizunov, Head of the Competence Center for Remote Service Channels, Information Technology Directorate, ICB


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are evaluating the ClickHouse database as a log data store.

In this article we would like to share our experience of using ClickHouse and the preliminary results of pilot operation. We should say right away that the results are impressive.



Below we describe in more detail how our system is configured and which components it consists of. First, though, a few words about this database in general and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services and was originally built as the main data store for Yandex.Metrica. It is open source and free. As a developer, I always wondered how they pulled this off, given the enormous volume of data. The Metrica user interface itself is also very flexible and fast. When you first get to know this database, the impression is: "Well, finally! Made 'for humans'! From the installation process to sending queries."

This database has a very low barrier to entry. Even a developer of average skill can install it in a few minutes and start using it. Everything just works. Even newcomers to Linux can quickly handle the installation and perform simple operations. Previously, on hearing the words Big Data, Hadoop, Google BigTable or HDFS, an ordinary developer pictured terabytes and petabytes, with superhumans setting up and developing these systems. With the arrival of ClickHouse we got a simple, understandable tool that solves a range of previously unattainable tasks. All it takes is one reasonably capable machine and five minutes to install. In other words, we got a database similar to, say, MySQL, but built for storing billions of records! A kind of super-archiver with an SQL interface. It is as if people had been handed alien technology.

About our log collection system

We collect data from standard-format IIS log files of our web applications. We are also currently parsing application logs, but the main goal at the pilot stage is collecting IIS logs.

For various reasons we could not abandon the ELK stack entirely. We continue to use the LogStash and Filebeat components, which have proven themselves and work reliably and predictably.

The general logging scheme is shown in the diagram below:

[Diagram: general logging scheme]

A peculiarity of writing to ClickHouse is that records should be inserted infrequently (about once per second) in large batches. This seems to be the most "problematic" part you run into when first working with ClickHouse, and it makes the scheme a bit more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking this is not recommended, but in practice it saves us from running separate servers while everything fits on one. We have not observed any failures or resource conflicts with the database. The plugin also retries requests when errors occur. If the errors persist, it writes the batch that could not be inserted to disk. The file format is convenient: after editing, you can easily insert the corrected batch with clickhouse-client.
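To make the batching requirement concrete, here is a minimal Python sketch of the buffering idea. It assumes a hypothetical `sink` callable that performs the actual INSERT. This is not the plugin's code. The parameter names only mirror the plugin settings used in the Logstash configuration later in the article (flush_size => 10000, idle_flush_time => 1).

```python
import time
from typing import Callable, List


class Batcher:
    """Hypothetical buffer: collect events, hand them to `sink` in batches.

    A batch is flushed when it holds `flush_size` events, or when an event
    arrives and `idle_flush_time` seconds have passed since the last flush.
    A production version would also flush from a timer and on shutdown.
    """

    def __init__(self, sink: Callable[[List[dict]], None],
                 flush_size: int = 10000, idle_flush_time: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.clock = clock
        self.buffer: List[dict] = []
        self.last_flush = clock()

    def add(self, event: dict) -> None:
        self.buffer.append(event)
        full = len(self.buffer) >= self.flush_size
        idle = self.clock() - self.last_flush >= self.idle_flush_time
        if full or idle:
            self.flush()

    def flush(self) -> None:
        # One sink call per batch, i.e. one large INSERT instead of many small ones.
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
        self.last_flush = self.clock()
```

With flush_size = 10000 and a steady 10,000 events per second, this yields roughly one INSERT per second, which is the write pattern ClickHouse prefers.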

The full list of software used in the scheme is given in the table:

List of software used

Name

Description

Distribution link

NGINX

Reverse proxy for restricting access by port and organizing authorization

Not currently used in the scheme

https://nginx.org/ru/download.html

https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat

Transfer of file logs.

https://www.elastic.co/downloads/beats/filebeat (Windows 64-bit distribution).

https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

LogStash

Log collector.

Used to collect logs from FileBeat and from the RabbitMQ queue (for servers located in the DMZ).

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse

Logstash plugin for transferring logs to the ClickHouse database in batches

https://github.com/mikechris/logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse

/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune

/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse

Log storage https://clickhouse.yandex/docs/ru/

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm

https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm

Note. Since August 2018, "regular" rpm builds for RHEL have been available in the Yandex repository, so you can try using them. At the time of installation we used packages built by Altinity.

Grafana

Log visualization. Setting up dashboards

https://grafana.com/

https://grafana.com/grafana/download

Redhat & Centos (64 Bit), latest version

ClickHouse datasource for Grafana 4.6+

Grafana plugin with a ClickHouse data source

https://grafana.com/plugins/vertamedia-clickhouse-datasource

https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

LogStash

Log router from FileBeat to the RabbitMQ queue.

Note. Unfortunately, FileBeat has no output that writes directly to RabbitMQ, so an intermediate Logstash hop is required.

https://www.elastic.co/products/logstash

https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ

Message queue. This is the buffer for log records in the DMZ

https://www.rabbitmq.com/download.html

https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)

Erlang runtime. Required for RabbitMQ to work

http://www.erlang.org/download.html

https://www.rabbitmq.com/install-rpm.html#install-erlang

http://www.erlang.org/downloads/21.3

The configuration of the server running ClickHouse is given in the following table:

Name

Value

Note

Configuration

HDD: 40GB
RAM: 8GB
CPU: Core 2 2GHz

Pay attention to the recommendations for operating ClickHouse (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software

OS: Red Hat Enterprise Linux Server (Maipo)

JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the log storage table is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default values for partitioning (by month) and index granularity. All fields correspond to IIS log entries recording HTTP requests. Note that there are separate fields for storing utm tags. They are parsed out of the query string field at the stage of inserting into the table.
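The utm fields are filled by the kv and mutate filters in the Logstash configuration shown later. Below is a minimal Python sketch of the same extraction, for experimenting with sample query strings. It keeps only whitelisted keys, de-duplicates repeated values, and joins them with a comma. Note that parse_qsl also URL-decodes values, which may differ from what the kv filter stores. The function name and the "-" handling are assumptions of this sketch; IIS writes "-" when the query string is empty.

```python
from urllib.parse import parse_qsl

# Keys mirrored from the kv filter's include_keys list.
UTM_KEYS = ("utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region")


def extract_utm(uri_query: str, prefix: str = "uriQuery__") -> dict:
    """Pick whitelisted keys out of an IIS cs-uri-query value (hypothetical helper)."""
    if uri_query in ("", "-"):
        return {}
    values: dict = {}
    for key, value in parse_qsl(uri_query, keep_blank_values=True):
        if key in UTM_KEYS:
            bucket = values.setdefault(key, [])
            if value not in bucket:  # like allow_duplicate_values => false
                bucket.append(value)
    # Multiple distinct values end up comma-joined, like the mutate/join step.
    return {prefix + k: ",".join(v) for k, v in values.items()}
```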

In addition, several system fields have been added to the table to store information about systems, components and servers. They are described in the table below. We store the logs of several systems in one table.

Name

Description

Example

fld_app_name

Application/system name
Valid values:

  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1

site1.domain.com

fld_app_module

System module
Valid values:

  • web - Website
  • svc - Website web service
  • intgr - Web integration service
  • bo - Admin (BackOffice)

web

fld_website_name

Site name in IIS

Several systems, or even several instances of one system module, can be deployed on one server

web-main

fld_server_name

Server name

web1.domain.com

fld_log_file_name

Path to the log file on the server

C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

These fields let you build graphs in Grafana efficiently, for example to view requests to the frontend of a particular system. This is similar to the site counter in Yandex.Metrica.

Here are some usage statistics for two months of data.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Data volume on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.

Compression ratio per column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.
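The figures above are easier to interpret per row. The short Python calculation below derives the overall compression ratio and the bytes per log record from the system.parts output. The sizes there are already rounded, so the results are approximate, and `readable_size` only imitates formatReadableSize.

```python
GIB = 1024 ** 3
UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def readable_size(n: float) -> str:
    """Binary units with two decimals, similar in spirit to formatReadableSize."""
    i = 0
    while n >= 1024 and i < len(UNITS) - 1:
        n /= 1024
        i += 1
    return f"{n:.2f} {UNITS[i]}"


def compress_ratio(uncompressed: float, compressed: float) -> float:
    return uncompressed / compressed


# Figures from the system.parts query above.
total_rows = 211_427_094
uncompressed = 54.50 * GIB
compressed = 4.86 * GIB

ratio = compress_ratio(uncompressed, compressed)   # about 11.2x overall
bytes_per_row_raw = uncompressed / total_rows      # about 277 bytes per record
bytes_per_row_disk = compressed / total_rows       # about 24.7 bytes per record
```

In other words, each IIS request ends up costing roughly 25 bytes of disk space.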

Description of the components used

FileBeat. Transferring file logs

This component tracks changes in log files on disk and passes the data on to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode, i.e. it transfers only the records appended to a file. It can also be configured to transfer entire files, which is handy when you need to load data for previous months: just put the log file into the folder and it will be read in full.

When the service stops, data stops flowing on to the storage.
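Tail mode boils down to remembering how far each file has already been read. Below is a minimal Python sketch of that idea, assuming a hypothetical `Tailer` class. Filebeat itself keeps these offsets in its registry and handles rotation more carefully. Here, start_at_end=True corresponds to `tail_files: true` in the configuration below, and False corresponds to reading an imported file from the beginning.

```python
import os
from typing import Dict, List


class Tailer:
    """Hypothetical tail reader: remembers a byte offset per file and returns
    only complete lines appended since the previous call."""

    def __init__(self, start_at_end: bool = True) -> None:
        self.start_at_end = start_at_end
        self.offsets: Dict[str, int] = {}

    def read_new_lines(self, path: str) -> List[str]:
        size = os.path.getsize(path)
        if path not in self.offsets:
            # First sight of the file: skip existing content or read it all.
            self.offsets[path] = size if self.start_at_end else 0
        offset = self.offsets[path]
        if size < offset:  # file was truncated or replaced: start over
            offset = 0
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
        end = data.rfind(b"\n") + 1  # leave a trailing partial line for next time
        self.offsets[path] = offset + end
        return [line.decode("utf-8", "replace") for line in data[:end].splitlines()]
```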

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log Collector

This component receives log records from FileBeat (or via the RabbitMQ queue), parses them and inserts them into ClickHouse in batches.

Inserts into ClickHouse go through the logstash-output-clickhouse plugin. The plugin retries failed requests, but for a scheduled shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so for a long stop it is better to stop the Filebeats on the servers as well. In the scheme without RabbitMQ, Filebeat on the local network sends logs directly to Logstash. There, the Filebeats handle an unavailable output acceptably and safely, so it has no consequences for them.
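The retry-then-save behaviour can be sketched in Python as follows. This is not the plugin's implementation. `send` is a hypothetical callable that submits the body of INSERT INTO log_web FORMAT JSONEachRow and raises an OSError subclass (such as ConnectionError) on failure. The attempt count and backoff values are illustrative, not the plugin's defaults. The saved file is in JSONEachRow format, which is what makes manual re-insertion with clickhouse-client straightforward.

```python
import json
import time
from typing import Callable, List


def insert_with_retry(send: Callable[[str], None], rows: List[dict],
                      failed_path: str, attempts: int = 3,
                      backoff: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Send one batch as JSONEachRow; after `attempts` failures, append it to a file."""
    # JSONEachRow: one JSON object per line.
    payload = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in rows)
    for attempt in range(attempts):
        try:
            send(payload)
            return True
        except OSError:
            if attempt < attempts - 1:
                sleep(backoff * 2 ** attempt)  # exponential backoff between tries
    # Keep the batch so it can be fixed and re-inserted by hand.
    with open(failed_path, "a", encoding="utf-8") as f:
        f.write(payload)
    return False
```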

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      # Logstash 5+ event API: fields are read and written via event.get/event.set
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            # grok above captures the field as "userAgent" (field names are case-sensitive)
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
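The filter section above essentially does four things. It drops W3C header lines, splits a line into the 14 standard IIS fields (plus two optional proxy fields), converts the timestamp, and picks the best client address. Below is a rough Python equivalent, useful for checking sample lines offline. Two assumptions apply. First, the regex approximates the grok patterns rather than reproducing them exactly. Second, timestamps are formatted explicitly as UTC, whereas the ruby filter above uses Time.at, whose strftime output follows the Logstash host's local time zone.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Approximation of the grok patterns above: 14 standard W3C fields,
# optionally followed by X-Real-IP and X-Forwarded-For.
IIS_LINE = re.compile(
    r"^(?P<log_timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<serverIP>\S+) (?P<method>\S+) (?P<uriStem>\S+) (?P<uriQuery>\S+) "
    r"(?P<port>\d+) (?P<username>\S+) (?P<clientIP>\S+) (?P<userAgent>\S+) "
    r"(?P<referer>\S+) (?P<response>\d+) (?P<subresponse>\d+) "
    r"(?P<win32response>\d+) (?P<timetaken>\d+)"
    r"(?: (?P<xrealIP>\S+) (?P<xforwarderfor>\S+))?$"
)


def parse_iis(line: str) -> Optional[dict]:
    if line.startswith("#"):
        return None  # W3C directives (#Fields: ...), like the drop {} filter
    m = IIS_LINE.match(line)
    if m is None:
        return None  # grok would tag the event with _grokparsefailure instead
    ev = {k: v for k, v in m.groupdict().items() if v is not None}
    ts = datetime.strptime(ev.pop("log_timestamp"), "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)  # IIS W3C logs use UTC timestamps
    ev["logdatetime"] = ts.strftime("%Y-%m-%d %H:%M:%S")
    ev["logdate"] = ts.strftime("%Y-%m-%d")
    ev["port"] = int(ev["port"])
    ev["timetaken"] = int(ev["timetaken"])
    # Same precedence as the ruby filters: xforwarderfor, then xrealIP, then clientIP.
    ev["clientRealIP"] = ev.get("xforwarderfor") or ev.get("xrealIP") or ev["clientIP"]
    return ev
```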

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

The logs of all systems are stored in a single table (see the beginning of the article). It is intended for storing information about requests, whose parameters are similar across formats such as IIS, apache and nginx logs. Application logs, which record errors, informational messages, warnings and so on, will get a separate table with a suitable structure (currently at the design stage).

When designing a table, it is very important to choose the primary key, i.e. the order in which data is sorted on disk. Both the compression ratio and query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by system name, system component name and event date. Initially the event date came first. After we moved it to the last position, queries ran roughly twice as fast. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts it on disk. This is a heavy operation, so it is worth thinking carefully in advance about what goes into the sort key.

Note also that the LowCardinality data type appeared in relatively recent versions. With it, the size of compressed data drops drastically for fields with low cardinality (few distinct values).
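Conceptually, LowCardinality is dictionary encoding: each distinct string is stored once, and the column keeps small integer positions into that dictionary. The Python sketch below shows only that idea with hypothetical helper names. ClickHouse's actual storage layout differs in detail.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string by an index into a dictionary of distinct values."""
    index: Dict[str, int] = {}
    dictionary: List[str] = []
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    return [dictionary[c] for c in codes]
```

For a column like `method` with a handful of distinct values, each row shrinks to a small integer, and a run of identical codes compresses very well.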

We are currently on version 19.6 and plan to try upgrading to the latest release. Newer versions have nice features such as adaptive granularity, data-skipping indices and the DoubleDelta codec.
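The schema already applies the Delta codec to logdatetime (CODEC(Delta, LZ4HC)), and DoubleDelta goes one step further. Both turn a near-monotonic sequence such as timestamps into small numbers before general-purpose compression. Below is a simplified Python illustration of the two transforms. The real codecs also pack the result into a compact binary form, which is omitted here.

```python
from itertools import accumulate
from typing import List


def delta_encode(xs: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    return xs[:1] + [b - a for a, b in zip(xs, xs[1:])]


def delta_decode(ds: List[int]) -> List[int]:
    return list(accumulate(ds))


def double_delta_encode(xs: List[int]) -> List[int]:
    """Keep the first value and first delta, then differences between deltas."""
    if not xs:
        return []
    return xs[:1] + delta_encode([b - a for a, b in zip(xs, xs[1:])])


def double_delta_decode(ds: List[int]) -> List[int]:
    if not ds:
        return []
    return delta_decode(ds[:1] + delta_decode(ds[1:]))
```

For the timestamps [1565061982, 1565061982, 1565061983, 1565061985], Delta stores [1565061982, 0, 1, 2] and DoubleDelta stores [1565061982, 0, 1, 1].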

By default, the installation sets the server's logging level to trace. Logs are rotated and archived, but they still grow to a gigabyte. If you don't need that level of detail, set the level to warning and the log size will drop drastically. The logging settings are in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, other Linux distributions need the packages built by Altinity.

Instructions with links to their repository are available here: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. Check status
sudo systemctl status clickhouse-server
 
2. Stop the server
sudo systemctl stop clickhouse-server
 
3. Start the server
sudo systemctl start clickhouse-server
 
Run the client in multiline mode (a query is executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If a single row in a batch fails, the ClickHouse output plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and then try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
Exit the client
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

LogStash. Log router from FileBeat to the RabbitMQ queue

This component routes logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin that writes directly to RabbitMQ, and judging by the issue on their GitHub, such functionality is not planned. There is a Kafka output plugin, but for our own reasons we cannot use it.
  2. There are requirements for collecting logs in the DMZ. According to them, logs must first be put into a queue, and LogStash then reads the records from the queue from outside the DMZ.

Therefore, specifically for servers located in the DMZ, we have to use this slightly more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component serves as a buffer for log records in the DMZ. Records are written through the Filebeat → LogStash chain and read from outside the DMZ by LogStash. When working through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, i.e. based on the FileBeat configuration data. All messages go to a single queue. If the queuing service stops for some reason, no messages are lost. The FileBeats get connection errors and temporarily suspend sending. LogStash, which reads from the queue, also gets network errors and waits for the connection to be restored. In the meantime, of course, no data is written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
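Routing by system name through a direct exchange has a practical consequence. A message is delivered only to queues bound with exactly the same routing key. RabbitMQ discards unroutable messages unless the publisher sets the mandatory flag or an alternate exchange is configured. So each new fld_app_name needs its own binding, as in the commands above. The toy Python model below illustrates that rule; the class is hypothetical and is not a RabbitMQ client.

```python
from collections import defaultdict
from typing import Dict, List, Set


class DirectExchange:
    """Toy model: exact routing-key match; unroutable messages are dropped."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Set[str]] = defaultdict(set)
        self.queues: Dict[str, List[dict]] = defaultdict(list)

    def bind(self, queue: str, routing_key: str) -> None:
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key: str, message: dict) -> int:
        """Return the number of queues that received the message (0 = dropped)."""
        targets = self.bindings.get(routing_key, set())
        for q in targets:
            self.queues[q].append(message)
        return len(targets)


ex = DirectExchange()
ex.bind("web_log", "site1.domain.ru")
ex.bind("web_log", "site2.domain.ru")
```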

Grafana. Dashboards

This component is used to visualize monitoring data. It requires the ClickHouse datasource plugin for Grafana 4.6+. We had to tweak the plugin a bit to make SQL filters on the dashboard more efficient.

For example, we use dashboard variables. If a variable is not set in the filter field, we don't want the plugin to generate a WHERE condition of the form ( uriStem = '' AND uriStem != '' ), because ClickHouse would then still read the uriStem column. We tried different options and eventually patched the plugin (the $valueIfEmpty macro) to return 1 for an empty value, without mentioning the column at all.
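The idea behind the patched $valueIfEmpty macro can be sketched in Python. When the dashboard variable is empty, the macro expands to the fallback (1) and the column is never mentioned. Otherwise the condition is emitted with the variable substituted. This is a hypothetical re-implementation for illustration only; the real change lives in the Grafana plugin. The regex handles only conditions without nested parentheses, like the ones in the query below.

```python
import re

# $valueIfEmpty($var, fallback, condition)
MACRO = re.compile(r"\$valueIfEmpty\(\$(\w+),\s*([^,]+?),\s*([^)]*)\)")


def expand_value_if_empty(query: str, variables: dict) -> str:
    def repl(m: re.Match) -> str:
        name, if_empty, condition = m.group(1), m.group(2), m.group(3)
        value = variables.get(name, "")
        if value == "":
            return if_empty  # column not referenced, so ClickHouse need not read it
        # Escape for a single-quoted ClickHouse string literal.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return condition.replace("$" + name, escaped)

    return MACRO.sub(repl, query)
```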

Now you can use the following query for a graph:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is translated into SQL like this (note that the empty uriStem filter became just 1):

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
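The time axis in the generated SQL comes from (intDiv(toUInt32(logdatetime), 60) * 60) * 1000: Unix seconds truncated to the minute and converted to milliseconds for Grafana. Below is a small Python equivalent of that bucketing plus the outer groupArray step, useful for checking the arithmetic offline. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bucket_ms(unix_ts: int, step: int = 60) -> int:
    """Same arithmetic as (intDiv(toUInt32(logdatetime), 60) * 60) * 1000."""
    return (unix_ts // step) * step * 1000


def series_by_response(rows: Iterable[Tuple[int, str]]) -> Dict[int, List[Tuple[str, int]]]:
    """rows: (unix_ts, response). Returns {t: [(response, count), ...]} ordered by t, response."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, resp in rows:
        counts[(bucket_ms(ts), resp)] += 1
    out: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for t, resp in sorted(counts):
        out[t].append((resp, counts[(t, resp)]))
    return dict(sorted(out.items()))
```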

Conclusion

The appearance of ClickHouse has been a notable event on the market. It was hard to imagine that, at a stroke and free of charge, we would be equipped with a powerful and practical tool for working with big data. Of course, as requirements grow (for example, sharding and replication across several servers), the scheme will become more complex. But first impressions of working with this database are very pleasant. It is clear that the product was made "for humans".

Compared with ElasticSearch, the cost of storing and processing logs is, by preliminary estimates, five to ten times lower. In other words, where the current data volume would require a cluster of several machines, with ClickHouse a single low-power machine is enough. ElasticSearch also has on-disk compression and other features that can significantly reduce resource consumption, but compared with ClickHouse that would still cost more.

Without any special optimizations on our part and with default settings, both loading data and querying it are amazingly fast. We don't have much data yet (about 200 million records), but the server itself is weak. In the future we may use this tool for purposes unrelated to log storage, for example end-to-end analytics, security or machine learning.

Finally, a few words about the pros and cons.

Cons

  1. Records must be loaded in large batches. On the one hand this is a feature, but you still need additional components to buffer records. This task is not always simple, though it is solvable, and I would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new releases. This raises concerns and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that reads events directly from Kafka without implementing consumers. But judging by the number of issues on GitHub, we are still wary of using this engine in production. However, if you avoid sudden moves and stick to the core functionality, it works stably.

Pros

  1. Does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scalable (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex is available.

Source: www.habr.com
