ClickHouse Database for Humans, or Alien Technologies

Alexey Lizunov, head of the competence center for remote service channels at the Information Technology Directorate of ICB


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we would like to talk about our experience with the ClickHouse database and the preliminary results of the pilot operation. It is worth saying right away that the results were impressive.



Below we describe in more detail how our system is set up and which components it consists of. But first I would like to say a few words about this database as a whole, and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services and was originally the main data store for Yandex.Metrica. It is an open-source system and it is free. As a developer, I always wondered how they managed this, because the data volumes are fantastically large. And the Metrica user interface itself is very flexible and fast. When you first get to know this database, the impression is: "Finally! It's made for people! From installation to running queries."

This database has a very low barrier to entry. Even an average developer can install it in a few minutes and start using it. Everything just works. Even people new to Linux can quickly handle the installation and perform simple operations. Previously, on hearing the words Big Data, Hadoop, Google BigTable or HDFS, an ordinary developer had the impression that this was about terabytes and petabytes, and that only superhumans configured and developed such systems. With the arrival of ClickHouse we got a simple, understandable tool that can solve a range of previously unattainable tasks. All it takes is one fairly average machine and five minutes to install. In other words, we got a database like MySQL, but for storing billions of records! A kind of super-archiver with SQL. It is as if people had been handed alien weapons.

About our log collection system

The information is collected from IIS log files of web applications in the standard format (we are also currently working on parsing application logs, but our main goal at the pilot stage is collecting IIS logs).

For various reasons we could not abandon the ELK stack completely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general log collection scheme]

A distinctive feature of writing data to ClickHouse is that records are inserted infrequently (once per second) in large batches. This is apparently the most "problematic" part you run into when working with ClickHouse for the first time: the scheme becomes somewhat more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but from a practical point of view it saves us from creating separate servers, so for now we deploy it on the same server. We have not seen any failures or resource conflicts with the database. It should also be noted that the plugin has a retry mechanism in case of errors. And if errors do occur, the plugin writes the batch of data that could not be inserted to disk (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
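To make the batching behaviour concrete, here is a minimal Python sketch of a buffer that flushes by size or idle time, retries, and dumps a failed batch to disk as JSON lines. The parameter names flush_size, idle_flush_time, request_tolerance and save_dir are borrowed from the plugin settings in the Logstash configuration later in the article; the logic, the `send` callable and the failed_batch.json file name are assumptions for illustration, not the plugin's actual implementation.

```python
import json
import os
import tempfile
import time


class BatchWriter:
    """Buffers rows and sends them as one JSONEachRow body per batch.

    Setting names mirror the Logstash plugin config; the behaviour is an
    assumed simplification, not the plugin's source code.
    """

    def __init__(self, send, save_dir, flush_size=10000, idle_flush_time=1.0,
                 request_tolerance=1, clock=time.monotonic):
        self.send = send                    # callable(body: str); raises on failure
        self.save_dir = save_dir
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.request_tolerance = request_tolerance  # retries after the first attempt (assumed meaning)
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, row):
        self.buffer.append(row)
        # Flush on batch size, or when the previous flush was long enough ago.
        # This sketch only checks idle time when a new row arrives.
        if (len(self.buffer) >= self.flush_size
                or self.clock() - self.last_flush >= self.idle_flush_time):
            return self.flush()
        return None

    def flush(self):
        if not self.buffer:
            return True
        batch, self.buffer = self.buffer, []
        self.last_flush = self.clock()
        body = "\n".join(json.dumps(row, ensure_ascii=False) for row in batch) + "\n"
        for _ in range(1 + self.request_tolerance):
            try:
                self.send(body)
                return True
            except Exception:
                continue
        # Every attempt failed: keep the batch on disk so it can be fixed and
        # re-inserted later with clickhouse-client ... FORMAT JSONEachRow.
        path = os.path.join(self.save_dir, "failed_batch.json")  # hypothetical file name
        with open(path, "a", encoding="utf-8") as f:
            f.write(body)
        return False


if __name__ == "__main__":
    sent = []
    with tempfile.TemporaryDirectory() as d:
        writer = BatchWriter(sent.append, save_dir=d, flush_size=2)
        writer.add({"logdate": "2019-07-11"})
        writer.add({"logdate": "2019-07-12"})
        print(len(sent))  # 1: a single batch containing both rows
```

Batching by size and time is the point: ClickHouse prefers a few large inserts over many single-row ones, and the on-disk dump keeps a rejected batch recoverable instead of losing it.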

The complete list of software used in the system is given in the table:

List of software used (name, purpose, distribution links)

NGINX
  Reverse proxy for restricting access by port and organizing authorization. Not currently used in the system.
  https://nginx.org/ru/download.html
  https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat
  Transfer of file logs.
  https://www.elastic.co/downloads/beats/filebeat (Windows 64-bit distribution)
  https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

LogStash
  Log collector. Used to collect logs from FileBeat, as well as from the RabbitMQ queue (for servers located in the DMZ).
  https://www.elastic.co/products/logstash
  https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

logstash-output-clickhouse
  Logstash plugin for transferring logs to the ClickHouse database in batches.
  https://github.com/mikechris/logstash-output-clickhouse
  /usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
  /usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
  /usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse
  Log storage. https://clickhouse.yandex/docs/ru/
  https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
  https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
  Note: since August 2018, "normal" rpm builds for RHEL have been available in the Yandex repository, so you can try using them. At the time of installation we used the packages built by Altinity.

Grafana
  Log visualization. Setting up dashboards.
  https://grafana.com/
  https://grafana.com/grafana/download
  Redhat & Centos (64 Bit), latest version

ClickHouse datasource for Grafana 4.6+
  Grafana plugin with a ClickHouse data source.
  https://grafana.com/plugins/vertamedia-clickhouse-datasource
  https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

LogStash
  Log router from FileBeat to the RabbitMQ queue.
  Note: unfortunately, FileBeat has no output that writes directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
  https://www.elastic.co/products/logstash
  https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ
  Message queue. This is the buffer for log entries in the DMZ.
  https://www.rabbitmq.com/download.html
  https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)
  Erlang runtime. Required for RabbitMQ to work.
  http://www.erlang.org/download.html
  https://www.rabbitmq.com/install-rpm.html#install-erlang
  http://www.erlang.org/downloads/21.3

The configuration of the server running ClickHouse is shown in the following table:

Server configuration (name, value, note)

Configuration
  HDD: 40GB
  RAM: 8GB
  Processor: Core 2, 2GHz
  Note: pay attention to the tips on operating ClickHouse (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software
  OS: Red Hat Enterprise Linux Server (Maipo)
  JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the log table is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and the default index granularity. All fields correspond to IIS log entries recording HTTP requests. Note separately that there are dedicated fields for storing utm tags (they are parsed out of the query string field at the stage of inserting into the table).
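The utm extraction itself is done by the kv and mutate/join filters in the Logstash configuration further down. A rough Python equivalent, assuming the same list of keys, looks like this. It is an approximation: parse_qs also URL-decodes values, which the kv filter does not do.

```python
from urllib.parse import parse_qs

# Keys taken from the include_keys list of the kv filter in the Logstash config.
UTM_KEYS = ("utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region")


def extract_utm(uri_query):
    """Split an IIS cs-uri-query value into uriQuery__* columns.

    Repeated values are de-duplicated and joined with ',' (mirroring
    allow_duplicate_values => false plus the mutate/join step).
    """
    if uri_query in ("", "-"):       # IIS logs '-' when there is no query string
        return {}
    params = parse_qs(uri_query)     # also URL-decodes, unlike the kv filter
    return {
        "uriQuery__" + key: ",".join(dict.fromkeys(params[key]))
        for key in UTM_KEYS
        if key in params
    }


if __name__ == "__main__":
    print(extract_utm("utm_source=yandex&utm_medium=cpc&page=2&utm_source=google"))
```

Keeping each tag in its own column means Grafana can group by, say, uriQuery__utm_source without ClickHouse reading and parsing the much larger uriQuery column.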

In addition, several system fields have been added to the table to store information about systems, components and servers. These fields are described in the table below. We store the logs of several systems in a single table.

System fields (name, description, example)

fld_app_name
  Application/system name. Valid values:
  • site1.domain.com: external site 1
  • site2.domain.com: external site 2
  • internal-site1.domain.local: internal site 1
  Example: site1.domain.com

fld_app_module
  System module. Valid values:
  • web: website
  • svc: website web service
  • intgr: integration web service
  • bo: administrator (BackOffice)
  Example: web

fld_website_name
  Site name in IIS. Several systems, or even several instances of one system module, can be deployed on one server.
  Example: web-main

fld_server_name
  Server name.
  Example: web1.domain.com

fld_log_file_name
  Path to the log file on the server.
  Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This makes it possible to build graphs efficiently in Grafana, for example to view requests from the frontend of a particular system. It is similar to a site counter in Yandex.Metrica.

Here are some statistics from two months of using the database.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Data volume on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
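A quick back-of-the-envelope check on these numbers (which formatReadableSize has already rounded) gives the overall compression ratio and the average on-disk size of one log record:

```python
GIB = 1024 ** 3

# Figures copied from the system.parts query above (rounded by formatReadableSize).
uncompressed_gib = 54.50
compressed_gib = 4.86
total_rows = 211_427_094

ratio = uncompressed_gib / compressed_gib          # overall compression ratio
bytes_per_row = compressed_gib * GIB / total_rows  # average on-disk size of one request

print(f"compression ratio ~ {ratio:.1f}x, ~ {bytes_per_row:.1f} bytes per row on disk")
```

That works out to roughly 11x compression and about 25 bytes of disk per logged HTTP request.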

Compression ratio per column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring file logs

This component watches log files on disk for changes and forwards the information to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode, i.e. it forwards only the records appended to a file. It can also be configured separately to transfer whole files, which is convenient when you need to load data for previous months: just put the log file into the folder and it will be read in full.

When the service is stopped, data stops flowing to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
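The tail_files behaviour above can be pictured with a small Python model of a harvester that remembers a byte offset and only returns complete lines. This is a simplification built on assumptions: real Filebeat keeps offsets in its registry, identifies files across rotation and handles encodings, and the function names here are hypothetical.

```python
import os
import tempfile


def start_offset(path, tail_files):
    """tail_files: true -> start at the current end of the file; false -> read it from the start."""
    return os.path.getsize(path) if tail_files else 0


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset).

    A partially written last line is left for the next call.
    """
    if os.path.getsize(path) < offset:   # file was truncated or replaced: start over
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1          # 0 if there is no complete line yet
    lines = [line.decode("utf-8") for line in data[:end].splitlines()]
    return lines, offset + end


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "u_ex190711.log")
        with open(log, "wb") as f:
            f.write(b"old line\n")
        pos = start_offset(log, tail_files=True)
        with open(log, "ab") as f:
            f.write(b"new line\npartial")
        print(read_new_lines(log, pos))  # only "new line" is returned
```

With tail_files: false the same model starts at offset 0, which is why dropping an old log file into a watched folder gets it read in full.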

LogStash. Log collector

This component receives log entries from FileBeat (or via the RabbitMQ queue), parses them and inserts them into the ClickHouse database in batches.

Insertion into ClickHouse is done with the logstash-output-clickhouse plugin. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is going to be long, it is better to stop the Filebeats on the servers as well. In the scheme without RabbitMQ (on the local network Filebeat sends logs directly to Logstash), Filebeat copes quite acceptably and safely, so an unavailable output has no consequences for it.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                # Filebeat 7.x reports the file path in log.file.path (the 6.x "source" field is gone)
                "fld_log_file_name" => "%{[log][file][path]}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            # Filebeat 7.x reports the file path in log.file.path (the 6.x "source" field is gone)
            "fld_log_file_name" => "%{[log][file][path]}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
        timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => "log_timestamp2"
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      # Logstash 5+ event API: fields are read and written with event.get/event.set
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }


      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => {
          "bytesSent" => "integer"
          "bytesReceived" => "integer"
          "timetaken" => "integer"
          "port" => "integer"
        }

        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            # field names are case-sensitive: grok captures the header as userAgent
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => {
                "uriQuery__utm_source" => ","
                "uriQuery__utm_medium" => ","
                "uriQuery__utm_campaign" => ","
                "uriQuery__utm_term" => ","
                "uriQuery__utm_content" => ","
                "uriQuery__yclid" => ","
                "uriQuery__region" => ","
            }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
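For trying patterns outside Logstash, the grok, date and clientRealIP steps of the filter section above can be approximated in Python. This is a sketch under stated assumptions: the regex mirrors the two grok patterns (with the X-Real-IP/X-Forwarded-For pair made optional), timestamps are copied without time-zone conversion, and '-' is treated as an empty header value; the function name is hypothetical.

```python
import re
from datetime import datetime

# The 14 default IIS W3C fields, optionally followed by X-Real-IP and
# X-Forwarded-For (the "logformat__iis_with_xrealip" variant).
IIS_LINE = re.compile(
    r"(?P<log_timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<serverIP>\S+) (?P<method>\S+) (?P<uriStem>\S+) (?P<uriQuery>\S+) "
    r"(?P<port>\d+) (?P<username>\S+) (?P<clientIP>\S+) (?P<userAgent>\S+) "
    r"(?P<referer>\S+) (?P<response>\d+) (?P<subresponse>\d+) "
    r"(?P<win32response>\d+) (?P<timetaken>\d+)"
    r"(?: (?P<xrealIP>\S+) (?P<xforwarderfor>\S+))?$"
)


def parse_iis_line(line):
    """Turn one IIS log line into a dict shaped like a log_web row, or None."""
    if line.startswith("#"):             # W3C directives (#Fields:, #Date:) are dropped
        return None
    match = IIS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    row = match.groupdict()
    ts = datetime.strptime(row.pop("log_timestamp"), "%Y-%m-%d %H:%M:%S")
    # Kept as written in the log; the ruby filter instead formats the time
    # in the Logstash server's local time zone.
    row["logdatetime"] = ts.strftime("%Y-%m-%d %H:%M:%S")
    row["logdate"] = ts.strftime("%Y-%m-%d")
    row["port"] = int(row["port"])
    row["timetaken"] = int(row["timetaken"])
    # clientRealIP = clientIP, overridden by xrealIP, then by xforwarderfor.
    real_ip = row["clientIP"]
    for header in ("xrealIP", "xforwarderfor"):
        value = row.pop(header)
        if value not in (None, "-"):     # assumption: '-' means "header not sent"
            real_ip = value
    row["clientRealIP"] = real_ip
    return row
```

Such a function is handy for unit-testing new log formats before changing the pipeline, since a grok mismatch in Logstash only shows up as a _grokparsefailure tag at runtime.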

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs from all systems are stored in a single table (see the beginning of the article). The table is designed to store information about requests: all parameters are similar across formats, for example IIS, apache and nginx logs. For application logs, which record errors, informational messages, warnings and so on, a separate table with an appropriate structure will be provided (currently at the design stage).

When designing a table, it is very important to decide on the primary key (by which the data is sorted in storage). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, the system name, the name of the system component, and the date of the event. Initially the event date came first. After we moved it to the last position, queries started to run about twice as fast. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts it on disk. This is a heavy operation, so it is worth thinking carefully in advance about what to include in the sort key.
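Why the column order in ORDER BY matters can be shown with a toy model of ClickHouse's sparse primary index: one mark per index_granularity rows holding the key of the granule's first row, and a binary search over the marks. The data and function names are hypothetical, and the model ignores partitions, min/max indices and the real mark format; it only illustrates that a filter on a leading key column narrows the set of granules that have to be read.

```python
from bisect import bisect_left, bisect_right


def build_marks(sorted_keys, index_granularity):
    """Sparse primary index: the sort key of the first row of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), index_granularity)]


def granules_for_app(marks, app_name):
    """Granules that may hold rows with key[0] == app_name.

    Works only because app_name is the FIRST column of the sort key.
    """
    firsts = [mark[0] for mark in marks]
    # The granule just before the first matching mark can end with app_name rows.
    start = max(bisect_left(firsts, app_name) - 1, 0)
    end = bisect_right(firsts, app_name)
    return list(range(start, end))


if __name__ == "__main__":
    # Hypothetical data: 3 apps x 2 modules x 100 seconds, sorted like
    # ORDER BY (fld_app_name, fld_app_module, logdatetime).
    keys = sorted((app, module, ts)
                  for app in ("site1", "site2", "site3")
                  for module in ("web", "svc")
                  for ts in range(100))
    marks = build_marks(keys, index_granularity=50)
    print(len(marks), granules_for_app(marks, "site2"))
```

Here the filter on site2 touches 5 of 12 granules. If the rows were sorted by time first, the marks would hold timestamps, the same filter could not use them, and all 12 granules would be read.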

It should also be noted that the LowCardinality data type appeared only in relatively recent versions. Using it drastically reduces the compressed size of low-cardinality fields (fields with few distinct values).
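LowCardinality is based on dictionary encoding: each distinct value is stored once and the column keeps small integer positions into that dictionary. The Python sketch below shows only that principle, with hypothetical function names; ClickHouse's actual storage format is more involved.

```python
def dictionary_encode(values):
    """Store each distinct string once; replace every value with its position."""
    dictionary, positions, codes = [], {}, []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    modules = ["web", "web", "svc", "web", "intgr", "bo", "web"]
    dictionary, codes = dictionary_encode(modules)
    print(dictionary, codes)
```

With only four possible fld_app_module values, each row shrinks to a tiny integer, and long runs of identical codes compress even further, which is consistent with the ratios above 200 for fld_app_name and fld_app_module in the per-column statistics.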

We currently use version 19.6 and plan to try upgrading to a newer version. Newer versions have great features such as Adaptive Granularity, skipping indices and the DoubleDelta codec, for example.

By default, the logging level in the configuration is set to trace on installation. The logs are rotated and archived, but they still grow to a gigabyte. If you do not need that much detail, you can set the level to warning, and the log size drops dramatically. The logging settings are in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<logger>
    <level>warning</level>
</logger>

Some useful commands

Since the original installation packages are built for Debian, for other Linux distributions you need to use the packages built by Altinity.

This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch

1. check the status
sudo systemctl status clickhouse-server

2. stop the server
sudo systemctl stop clickhouse-server

3. start the server
sudo systemctl start clickhouse-server

Start the client in multiline mode (a query is executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd

If a single row in a batch fails, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file manually and try to load it into the database by hand:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json

sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json

exit the command-line client
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2

openssl s_client -connect log.domain.com:9440 < /dev/null
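Before re-inserting a hand-edited failed batch with FORMAT JSONEachRow, it can help to check that every line is a single JSON object. Here is a small Python helper; the script name and the optional list of required columns are hypothetical.

```python
import json
import sys


def find_bad_rows(lines, required=()):
    """Check JSONEachRow input (one JSON object per line) before re-inserting it.

    Returns (line_number, problem) pairs; an empty list means every line parsed.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [column for column in required if column not in row]
        if missing:
            problems.append((number, "missing " + ", ".join(missing)))
    return problems


if __name__ == "__main__":
    # Usage: python check_batch.py /etc/logstash/tmp/log_web_failed__fixed.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for number, problem in find_bad_rows(f, required=("logdate", "logdatetime")):
            print(f"line {number}: {problem}")
```

Line numbers in the report make it easy to jump to the broken row in an editor, fix it, and rerun the clickhouse-client INSERT.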

LogStash. Log router from FileBeat to the RabbitMQ queue

This component routes logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ, and judging by the issue on their GitHub, such functionality is not planned. There is an output for Kafka, but for certain reasons we cannot use it.
  2. There are requirements for collecting logs in the DMZ. Under these requirements, logs must first be put into a queue, and then LogStash reads the entries from the queue from outside the DMZ.

Therefore, specifically for servers located in the DMZ, we have to use this somewhat more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Entries are written through the Filebeat → LogStash chain and read from outside the DMZ by LogStash. When working through RabbitMQ, about 4,000 messages per second are processed.

Message routing is configured by system name, that is, based on the FileBeat configuration data. All messages go to a single queue. If the queue service is stopped for some reason, no messages are lost: FileBeat instances get connection errors and temporarily suspend sending, and the LogStash that reads from the queue also gets network errors and waits for the connection to be restored. In that case, of course, data is no longer written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
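These bindings rely on direct-exchange semantics: a message is delivered to every queue bound with exactly its routing key. A toy Python model (not a RabbitMQ client; class and method names are hypothetical) makes one consequence visible. A message whose routing key, here the fld_app_name value, has no binding is dropped unless an alternate exchange or the mandatory flag is configured, so each new site needs its own binding command.

```python
from collections import defaultdict


class DirectExchange:
    """Toy model of an AMQP direct exchange (not a RabbitMQ client)."""

    def __init__(self):
        self.bindings = defaultdict(set)   # routing key -> bound queue names
        self.queues = defaultdict(list)    # queue name -> delivered messages

    def bind(self, queue, routing_key):
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key, message):
        # Exact match on the routing key; no wildcards in a direct exchange.
        targets = self.bindings.get(routing_key, set())
        for queue in sorted(targets):
            self.queues[queue].append(message)
        return len(targets)                # 0 means the message was dropped


if __name__ == "__main__":
    exchange = DirectExchange()            # stands in for monitor.direct
    for site in ("site1.domain.ru", "site2.domain.ru"):
        exchange.bind("web_log", site)
    exchange.publish("site1.domain.ru", "log line from site1")
    exchange.publish("site3.domain.ru", "log line from an unbound site")
    print(exchange.queues["web_log"])      # only the site1 message is queued
```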

Grafana. Dashboards

This component is used to visualize the monitoring data. For this you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it a little to make SQL filters on the dashboard more efficient.

For example, we use variables, and when a variable is not set in the filter field, we want it not to generate a condition in the WHERE clause of the form ( uriStem = '' AND uriStem != '' ). Such a condition makes ClickHouse read the uriStem column. So we tried different options and eventually patched the plugin (the $valueIfEmpty macro) so that it returns 1 for an empty value, without mentioning the column itself.

Now you can use this query for a graph:

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module') and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name') and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is translated into SQL as follows (note that empty filters such as uriStem are converted to just 1):

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
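The effect of the patched macro can be modelled in a few lines of Python: an empty variable turns into the constant 1, so the generated WHERE clause never references the column. The function and variable names are hypothetical and this is not the plugin's code; minute_bucket_ms just reproduces the time-bucketing arithmetic from the generated query.

```python
def value_if_empty(value, if_empty, condition):
    """Model of the patched $valueIfEmpty macro: empty variable -> constant."""
    return if_empty if value in (None, "") else condition


def quote(value):
    """ClickHouse string literal with backslashes and single quotes escaped."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


# Hypothetical dashboard variables and the condition each produces when set.
FILTERS = [
    ("fld_app_name", lambda v: f"fld_app_name = {quote(v)}"),
    ("fld_app_module", lambda v: f"fld_app_module = {quote(v)}"),
    ("fld_server_name", lambda v: f"fld_server_name = {quote(v)}"),
    ("uriStem", lambda v: f"uriStem LIKE {quote('%' + v + '%')}"),
    ("clientRealIP", lambda v: f"clientRealIP = {quote(v)}"),
]


def build_where(variables):
    """Join all filters; unset variables contribute a bare 1."""
    parts = []
    for name, make_condition in FILTERS:
        value = variables.get(name, "")
        parts.append(value_if_empty(value, "1", f"({make_condition(value)})"))
    return " AND ".join(parts)


def minute_bucket_ms(unix_seconds):
    """Same arithmetic as (intDiv(toUInt32(logdatetime), 60) * 60) * 1000."""
    return (unix_seconds // 60) * 60 * 1000


if __name__ == "__main__":
    print(build_where({"fld_app_name": "site1.domain.ru", "fld_app_module": "web"}))
    print(minute_bucket_ms(1565061982))
```

For the variables in the example, the output has the same shape as the WHERE clause above: two real conditions followed by "AND 1 AND 1 AND 1", and since a constant 1 mentions no column, ClickHouse has nothing extra to read.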

Conclusion

The arrival of ClickHouse has been a landmark event on the market. It was hard to imagine that, completely free of charge, we would suddenly be armed with a powerful and practical tool for working with big data. Of course, as requirements grow (for example, sharding and replication across multiple servers), the scheme will become more complicated. But our first impression is that working with this database is very pleasant. It is clear that the product is made "for people".

Compared to ElasticSearch, the cost of storing and processing logs is, by preliminary estimates, five to ten times lower. In other words, where the current volume of data would require us to set up a cluster of several machines, with ClickHouse one low-powered machine is enough. Of course, ElasticSearch also has on-disk data compression and other features that can significantly reduce resource consumption, but compared to ClickHouse this would require higher costs.

Without any special optimization on our part, with default settings, loading data into the database and querying it work at an amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we may use this tool for other purposes not related to storing logs, for example end-to-end analytics, security or machine learning.

Finally, a few words about the pros and cons.

Cons

  1. Records are loaded in large batches. On the one hand, this is a feature, but you still have to use additional components to buffer the records. This task is not always simple, but it is solvable. And we would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new releases. This raises concerns and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of Issues on GitHub, we are still wary of using this engine in production. However, if you avoid sudden moves and stick to the basic functionality, it works stably.

Pros

  1. Doesn't slow down.
  2. Low barrier to entry.
  3. Open source.
  4. Free.
  5. Scalable (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex is available.

Source: www.habr.com
