ClickHouse Database for Humans, or Alien Technologies

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels, Directorate of Information Technologies, MKB

As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we would like to talk about our experience with the ClickHouse database and the preliminary results of the pilot operation. It should be noted right away that the results were impressive.


Next, we will describe in detail how our system is configured and what components it consists of. But first I would like to say a little about this database as a whole and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; initially it was the main data store for Yandex.Metrica. The system is open source and free. From a developer's point of view, I have always wondered how they implemented it, because the volume of data there is fantastically large. And the Metrica user interface itself is very flexible and fast. At first acquaintance with this database, the impression is: "Well, finally! Made for people! From the installation process to sending queries."

This database has a very low entry threshold. Even a developer of average skill can install it in a few minutes and start using it. Everything works clearly. Even people who are new to Linux can quickly handle the installation and perform the simplest operations. Previously, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer imagined that this was about terabytes and petabytes, and that some superhumans were busy configuring and developing such systems. With the arrival of the ClickHouse database, we got a simple, understandable tool that solves a range of tasks that were previously out of reach. All it takes is one fairly average machine and five minutes to install. That is, we got a database like, for example, MySQL, but for storing billions of records! A kind of special archiver with an SQL language. It is as if people had been handed alien weapons.

About our logging system

To collect information, IIS log files of web applications in the standard format are used (we are also currently working on parsing application logs, but the main goal at the pilot stage is collecting IIS logs).

For various reasons, we did not abandon the ELK stack completely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A distinctive feature of writing data to the ClickHouse database is infrequent (once per second) insertion of records in large batches. This, apparently, is the most "problematic" part you run into when you first start working with ClickHouse: the scheme becomes a little more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but from a practical point of view it avoids setting up separate servers while everything is deployed on one machine. We did not observe any failures or resource conflicts with the database. In addition, it should be noted that the plugin has a retry mechanism in case of errors. And if errors occur, the plugin writes to disk the batch of data that could not be inserted (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
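The batching idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the plugin's actual code: `BatchBuffer`, `sink` and `tick` are hypothetical names, and although `flush_size` and `idle_flush_time` borrow their names from the plugin configuration shown later in this article, the semantics here are an approximation.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchBuffer:
    """Collects rows and hands them to `sink` in large batches.

    Hypothetical helper: `sink` stands in for whatever performs the real
    INSERT (an HTTP request, clickhouse-client, and so on).
    """

    def __init__(self, sink: Callable[[List[Row]], None],
                 flush_size: int = 10000, idle_flush_time: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.clock = clock
        self.rows: List[Row] = []
        self.last_flush = clock()

    def add(self, row: Row) -> None:
        """Buffer one row; flush immediately once the batch is full."""
        self.rows.append(row)
        if len(self.rows) >= self.flush_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically: flushes a partial batch once it is old enough."""
        if self.rows and self.clock() - self.last_flush >= self.idle_flush_time:
            self.flush()

    def flush(self) -> None:
        """Send the buffered rows (if any) to the sink as one batch."""
        if self.rows:
            batch, self.rows = self.rows, []
            self.sink(batch)
        self.last_flush = self.clock()
```

With this shape, the database sees at most one insert per full batch or per idle interval, regardless of how many individual log lines arrive.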

The full list of software used in the system is presented in the table:

List of software used

Name: NGINX
Description: Reverse proxy to restrict access by ports and to organize authorization. Currently not used in the system.
Distribution link:
https://nginx.org/ru/download.html
https://nginx.org/download/nginx-1.16.0.tar.gz

Name: FileBeat
Description: Transfers file logs.
Distribution link:
https://www.elastic.co/downloads/beats/filebeat (distribution package for Windows 64bit).
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

Name: logstash
Description: Log collector. Used to collect logs from FileBeat, as well as to collect logs from the RabbitMQ queue (for servers located in the DMZ).
Distribution link:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: Logstash-output-clickhouse
Description: Logstash plugin for transferring logs to the ClickHouse database in batches.
Distribution link:
https://github.com/mikechris/logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

Name: ClickHouse
Description: Log storage. https://clickhouse.yandex/docs/ru/
Distribution link:
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
Note. Starting from August 2018, "normal" rpm builds for RHEL appeared in the Yandex repository, so you can try using them. At the time of installation, we used packages built by Altinity.

Name: Grafana
Description: Log visualization. Setting up dashboards.
Distribution link:
https://grafana.com/
https://grafana.com/grafana/download
Redhat & Centos(64 Bit) - latest version

Name: ClickHouse datasource for Grafana 4.6+
Description: Plugin for Grafana with a ClickHouse data source.
Distribution link:
https://grafana.com/plugins/vertamedia-clickhouse-datasource
https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

Name: logstash
Description: Log router from FileBeat to the RabbitMQ queue.
Note. Unfortunately, FileBeat cannot output directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
Distribution link:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: RabbitMQ
Description: Message queue. This is the log buffer in the DMZ.
Distribution link:
https://www.rabbitmq.com/download.html
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Name: Erlang Runtime (required for RabbitMQ)
Description: Erlang runtime. Required for RabbitMQ to work.
Distribution link:
http://www.erlang.org/download.html
https://www.rabbitmq.com/install-rpm.html#install-erlang http://www.erlang.org/downloads/21.3

The configuration of the server with the ClickHouse database is presented in the following table:

Name: Configuration
Value:
HDD: 40GB
RAM: 8GB
Processor: Core 2 2Ghz
Note: Pay attention to the recommendations for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

Name: General system software
Value:
OS: Red Hat Enterprise Linux Server (Maipo)
JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use default values for partitioning (by month) and index granularity. All fields practically correspond to IIS log entries for recording HTTP requests. Separately, note that there are dedicated fields for storing utm tags (they are parsed from the query string field at the stage of insertion into the table).
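As a rough illustration of how the utm fields are derived from the query string, here is a Python sketch. It is not the pipeline's code: the function name `extract_query_fields` is hypothetical, the key list and the `uriQuery__` prefix mirror the table columns above, and `parse_qsl` also URL-decodes values, which the real pipeline may not do.

```python
from urllib.parse import parse_qsl

# Keys that have dedicated columns in log_web (uriQuery__<key>).
WANTED_KEYS = ("utm_medium", "utm_source", "utm_campaign", "utm_term",
               "utm_content", "yclid", "region")


def extract_query_fields(uri_query: str, prefix: str = "uriQuery__") -> dict:
    """Return {prefix + key: value} for the wanted keys found in uri_query.

    Repeated keys are de-duplicated and their values joined with a comma.
    """
    values = {}
    for key, value in parse_qsl(uri_query, keep_blank_values=True):
        if key in WANTED_KEYS:
            seen = values.setdefault(key, [])
            if value not in seen:
                seen.append(value)
    return {prefix + key: ",".join(vals) for key, vals in values.items()}
```

Keys without a dedicated column are simply ignored; they remain available in the raw uriQuery column.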

Also, several system fields have been added to the table to store information about systems, components, and servers. See the table below for a description of these fields. In one table we store logs for several systems.

Name: fld_app_name
Description: Application/system name. Valid values:
  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1
Example: site1.domain.com

Name: fld_app_module
Description: System module. Valid values:
  • web - Website
  • svc - Website web service
  • intgr - Integration web service
  • bo - Admin panel (BackOffice)
Example: web

Name: fld_website_name
Description: Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module.
Example: web-main

Name: fld_server_name
Description: Server name
Example: web1.domain.com

Name: fld_log_file_name
Description: Path to the log file on the server
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This allows you to build graphs in Grafana efficiently. For example, view requests from the frontend of a specific system. This is similar to the site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records broken down by systems and their components

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
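The overall compression figures follow directly from the numbers above. A small Python sketch of the arithmetic (the helper names `parse_readable_size` and `compression_ratio` are hypothetical, and the inputs are the rounded values printed by the query, so the results are approximate):

```python
# Binary units as printed by formatReadableSize.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}


def parse_readable_size(text: str) -> float:
    """Convert a string such as '54.50 GiB' to a number of bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


def compression_ratio(uncompressed: str, compressed: str) -> float:
    """Ratio of uncompressed to compressed size, both given as readable strings."""
    return parse_readable_size(uncompressed) / parse_readable_size(compressed)


# Values taken from the query result above.
ratio = compression_ratio("54.50 GiB", "4.86 GiB")
bytes_per_row = parse_readable_size("4.86 GiB") / 211427094
```

In other words, the table compresses roughly elevenfold overall, and each stored log row occupies on the order of 25 bytes on disk.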

Degree of data compression by columns

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring file logs

This component tracks changes to log files on disk and passes the information to LogStash. It is installed on all servers where log files are written (usually IIS). It works in tail mode (i.e. it transfers only the records added to the file). But it can separately be configured to transfer whole files. This is useful when you need to load data for previous months. Just put the log file into a folder and it will be read in its entirety.

When the service is stopped, the data is no longer transferred to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

logstash. Log collector

This component is designed to receive log entries from FileBeat (or via the RabbitMQ queue), parse them, and insert them in batches into the ClickHouse database.

For insertion into ClickHouse, the Logstash-output-clickhouse plugin is used. The Logstash plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. When it is stopped, messages will accumulate in the RabbitMQ queue, so if the stop lasts a long time, it is better to stop the Filebeats on the servers. In a scheme where RabbitMQ is not used (in the local network, Filebeat sends logs directly to Logstash), Filebeat works quite acceptably and safely, so for it the unavailability of the output passes without consequences.
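To make the parsing step concrete, here is a minimal Python sketch of turning one IIS W3C log line into a row for the log_web table. It is an illustration only, not what Logstash executes: the function name `parse_iis_line` is hypothetical, and it assumes the 15 space-separated fields in the order used by this article's log format.

```python
from typing import Dict, Optional

# Field order assumed from the IIS log format described in this article.
FIELDS = ("date", "time", "serverIP", "method", "uriStem", "uriQuery", "port",
          "username", "clientIP", "userAgent", "referer", "response",
          "subresponse", "win32response", "timetaken")


def parse_iis_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one space-separated IIS W3C log line.

    Returns None for comment lines (starting with '#') and for lines with an
    unexpected number of fields. Non-numeric port/timetaken raise ValueError.
    """
    if line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    row: Dict[str, object] = dict(zip(FIELDS, parts))
    # Derive the two date columns used by the table.
    row["logdate"] = row.pop("date")
    row["logdatetime"] = f"{row['logdate']} {row.pop('time')}"
    row["port"] = int(parts[6])
    row["timetaken"] = int(parts[14])
    return row
```

The header lines that IIS writes at the top of each file start with "#" and are skipped, which matches the intent of dropping such lines before parsing.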

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          # Logstash 5+ event API: use event.get/event.set instead of event['...']
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }


      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            # grok stores the field as "userAgent" (field names are case-sensitive)
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

clickhouse. Log storage

Logs for all systems are stored in one table (see the beginning of the article). It is intended for storing information about requests: all parameters are similar across different formats, such as IIS logs, apache and nginx logs. For application logs, in which, for example, errors, informational messages and warnings are recorded, a separate table with an appropriate structure will be provided (currently at the design stage).

When designing a table, it is very important to decide on the primary key (by which the data will be sorted during storage). The degree of data compression and the query speed depend on this. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, the name of the system, the name of the system component, and the date of the event. Initially, the event date came first. After moving it to the last place, queries began to run about twice as fast. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is a good idea to think carefully in advance about what should be included in the sort key.
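Why the key order matters can be shown with a toy model in Python. This is not how ClickHouse is implemented; it only imitates the idea that sorted data is indexed in fixed-size blocks (granules) and that a query filtering on one application has to read every block containing that application. The data and the function name `granules_touched` are made up for illustration.

```python
from typing import List, Tuple

Row = Tuple[str, str, int]  # (fld_app_name, fld_app_module, logdatetime)


def granules_touched(rows: List[Row], app: str, granularity: int) -> int:
    """Count fixed-size blocks of the sorted data that contain rows of `app`."""
    touched = 0
    for start in range(0, len(rows), granularity):
        block = rows[start:start + granularity]
        if any(r[0] == app for r in block):
            touched += 1
    return touched


# Synthetic data: 4 applications, each logging one row per second.
apps = ["site1", "site2", "site3", "site4"]
rows = [(app, "web", t) for t in range(1000) for app in apps]

# Event time first (the initial layout) versus application first (the final one).
time_first = sorted(rows, key=lambda r: (r[2], r[0], r[1]))
app_first = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

time_first_blocks = granules_touched(time_first, "site1", granularity=100)
app_first_blocks = granules_touched(app_first, "site1", granularity=100)
```

In this model, ordering by time first scatters every application across all 40 blocks, while ordering by application first keeps "site1" in 10 contiguous blocks, so a per-application query reads a quarter of the data.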

It should also be noted that the LowCardinality data type appeared in relatively recent versions. When it is used, the size of compressed data is drastically reduced for fields that have low cardinality (few distinct values).

Version 19.6 is currently in use and we plan to try updating to the latest version. It has such wonderful features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.

By default, during installation, the logging level is set to trace. Logs are rotated and archived, but at the same time they grow up to a gigabyte. If there is no need for this, you can set the warning level, and then the size of the log decreases sharply. The logging setting is defined in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux distributions you need to use the packages built by Altinity.

Instructions with links to their repository are available here: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch

1. check the status
sudo systemctl status clickhouse-server

2. stop the server
sudo systemctl stop clickhouse-server

3. start the server
sudo systemctl start clickhouse-server

Start the client for executing queries in multi-line mode (a query runs after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd

If there is an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json

sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json

exit the command line
quit;
## Configuring TLS
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2

openssl s_client -connect log.domain.com:9440 < /dev/null
logstash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat does not have an output plugin for writing directly to RabbitMQ. And such functionality, judging by the issue on their GitHub, is not planned for implementation. There is a plugin for Kafka, but for certain reasons we cannot use it in our environment.
  2. There are requirements for collecting logs in the DMZ. Based on them, the logs must first be added to a queue, and then LogStash reads the entries from the queue from outside.

Therefore, it is for the case where servers are located in the DMZ that one has to use such a slightly complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing is done through the Filebeat → LogStash chain. Reading is done from outside the DMZ via LogStash. When operating through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, i.e. based on the FileBeat configuration data. All messages go to one queue. If for some reason the queue service is stopped, this will not lead to the loss of messages: FileBeats will receive connection errors and temporarily suspend sending. And LogStash, which reads from the queue, will also receive network errors and wait for the connection to be restored. In this case, of course, the data will no longer be written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"

Grafana. Dashboards

This component is used to visualize monitoring data. For this, you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it a little to improve the efficiency of processing SQL filters on the dashboard.

For example, we use variables, and if they are not set in the filter field, we would like it not to generate a condition in the WHERE of the form (uriStem = '' AND uriStem != ''). In that case, ClickHouse will read the uriStem column. In general, we tried different options and eventually fixed the plugin (the $valueIfEmpty macro) so that in the case of an empty value it returns 1, without mentioning the column itself.

And now you can use this query for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

Which is translated into this SQL (note that the empty uriStem fields have been converted to just 1)

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
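The behaviour of the macro can be mimicked in a few lines of Python. This is a sketch of the idea only, not the plugin's source: the function names `value_if_empty` and `build_where` are hypothetical, and values are interpolated without escaping purely for illustration.

```python
def value_if_empty(value: str, default: str, expression: str) -> str:
    """Return `default` when the dashboard variable is empty, else `expression`."""
    return default if value == "" else expression


def build_where(filters: dict) -> str:
    """Build a WHERE fragment from {column name: selected value}.

    An empty string means nothing is selected for that column, so the column
    is not mentioned at all and the condition collapses to the constant 1.
    """
    parts = []
    for column, value in filters.items():
        # NOTE: no escaping here; real code must not interpolate raw user input.
        parts.append(value_if_empty(value, "1", f"({column} = '{value}')"))
    return " AND ".join(parts)
```

Because an unset filter contributes only the constant 1, the database has no reason to read that column from disk.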

Conclusion

The appearance of the ClickHouse database has become a landmark event in the market. It was hard to imagine that, completely free of charge, we would in an instant be armed with a powerful and practical tool for working with big data. Of course, with growing needs (for example, sharding and replication across multiple servers), the scheme will become more complicated. But on first impressions, working with this database is very pleasant. You can see that the product is made "for people".

Compared to ElasticSearch, the cost of storing and processing logs is estimated to be reduced by five to ten times. In other words, if for the current amount of data we would have had to set up a cluster of several machines, then with ClickHouse one low-power machine is enough for us. Yes, of course, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared to ClickHouse this would be more expensive.

Without any special optimizations on our part, on default settings, loading data and selecting from the database work at amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we can use this tool for other purposes not related to storing logs. For example, for end-to-end analytics, in the field of security, machine learning.

Finally, a little about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand, this is a feature, but you still have to use additional components to buffer records. This task is not always easy, but it is still solvable. And we would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This causes concern and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of issues on GitHub, we are still wary of using this engine in production. However, if you do not make sudden moves to the side and use the core functionality, it works stably.

Pros

  1. Not slow.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.
Source: www.habr.com
