ClickHouse Database for Humans, or Alien Technologies

Alexey Lizunov, head of the competence center for remote service channels, Information Technology Directorate, ICB

As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we would like to share our experience with the ClickHouse database and the preliminary results of the pilot operation. It is worth saying right away that the results were impressive.


Below we describe in more detail how our system is set up and what components it consists of. But first I would like to say a little about this database in general and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; originally it was the main data store for Yandex.Metrica. The system is open source and free. As a developer, I had always wondered how they implemented this, because the volume of data there is enormous. And the Metrica user interface itself is very flexible and fast. When you first get to know this database, the impression is: "Well, finally! Made for people! From the installation process right through to sending queries."

This database has a very low entry barrier. Even an average developer can install it in a few minutes and start using it. Everything works smoothly. Even people who are new to Linux can cope with the installation and perform simple operations. Previously, on hearing the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer got the impression that these were about terabytes and petabytes, and that some superhumans were building and tuning such systems. With the arrival of ClickHouse we got a simple, understandable tool that solves a range of problems that used to be out of reach. All it takes is one fairly average machine and five minutes for installation. In other words, we got a database like, say, MySQL, but for storing billions of records. A kind of super-archiver with an SQL interface. It is as if people had been handed alien weapons.

About our log collection system

To collect information, we use the IIS log files of web applications in the standard format (we are also working on parsing application logs, but the main goal at the pilot stage is collecting IIS logs).

For various reasons we could not abandon the ELK stack completely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general log collection scheme]

A peculiarity of writing data to ClickHouse is that records should be inserted infrequently (about once per second) and in large batches. This is apparently the most "problematic" part you run into when you first work with ClickHouse: the scheme becomes a bit more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but from a practical standpoint it saves us from setting up separate servers while everything still fits on one machine. We have not observed any failures or resource conflicts with the database. It is also worth noting that the plugin has a retry mechanism in case of errors. If errors do occur, the plugin writes the batch that could not be inserted to disk (the file format is convenient: after editing, the corrected batch can easily be inserted with clickhouse-client).
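
The batching idea itself is simple and can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: the class name `BatchBuffer` and its parameters are hypothetical, and the default thresholds merely mirror the `flush_size` and `idle_flush_time` values from the Logstash output configuration shown later in the article.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Collects rows and hands them over in one batch, by size or by age."""

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_rows: int = 10000, max_age_sec: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush              # hypothetical sink, e.g. one INSERT per batch
        self._max_rows = max_rows
        self._max_age_sec = max_age_sec
        self._clock = clock
        self._rows: List[dict] = []
        self._first_row_at = 0.0

    def add(self, row: dict) -> None:
        if not self._rows:
            self._first_row_at = self._clock()
        self._rows.append(row)
        # The age check only runs when a row arrives; a real buffer
        # would also need a timer to flush an idle, non-empty batch.
        full = len(self._rows) >= self._max_rows
        old = self._clock() - self._first_row_at >= self._max_age_sec
        if full or old:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


batches: List[List[dict]] = []
buffer = BatchBuffer(batches.append, max_rows=3, max_age_sec=60.0)
for i in range(7):
    buffer.add({"n": i})
buffer.flush()  # push the incomplete tail batch
print([len(b) for b in batches])
```

With seven rows and a limit of three rows per batch, the sink is called three times instead of seven, which is the whole point of buffering in front of the database.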

The full list of software used in the scheme is given in the table:

List of software used

NGINX
  Description: Reverse proxy for restricting access by port and for setting up authorization. Currently not used in the scheme.
  Distribution: https://nginx.org/ru/download.html
                https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat
  Description: Transfers log files.
  Distribution: https://www.elastic.co/downloads/beats/filebeat (Windows 64-bit distribution)
                https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

LogStash
  Description: Log collector. Used to collect logs from FileBeat and from the RabbitMQ queue (for servers located in the DMZ).
  Distribution: https://www.elastic.co/products/logstash
                https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse
  Description: Logstash plugin for sending logs to the ClickHouse database in batches.
  Distribution: https://github.com/mikechris/logstash-output-clickhouse
  Installation: /usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
                /usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
                /usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse
  Description: Log storage. https://clickhouse.yandex/docs/ru/
  Distribution: https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
                https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
  Note: Since August 2018, "normal" rpm builds for RHEL have been available in the Yandex repository, so you can try those. At the time of installation we used the packages built by Altinity.

Grafana
  Description: Log visualization. Building dashboards. https://grafana.com/
  Distribution: https://grafana.com/grafana/download (Redhat & Centos (64 Bit), latest version)

ClickHouse datasource for Grafana 4.6+
  Description: Grafana plugin with a ClickHouse data source.
  Distribution: https://grafana.com/plugins/vertamedia-clickhouse-datasource
                https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

LogStash
  Description: Log router from FileBeat to the RabbitMQ queue.
  Note: Unfortunately, FileBeat has no direct output to RabbitMQ, so an intermediate link in the form of Logstash is required.
  Distribution: https://www.elastic.co/products/logstash
                https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ
  Description: Message queue. Serves as the buffer for log entries in the DMZ.
  Distribution: https://www.rabbitmq.com/download.html
                https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)
  Description: Erlang runtime. Required for RabbitMQ to run.
  Distribution: http://www.erlang.org/download.html
                https://www.rabbitmq.com/install-rpm.html#install-erlang
                http://www.erlang.org/downloads/21.3

The configuration of the server running the ClickHouse database is given in the following table:

Configuration
  Value: HDD: 40GB
         RAM: 8GB
         Processor: Core 2 2Ghz
  Notes: Pay attention to the tips on operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

System-wide software
  Value: OS: Red Hat Enterprise Linux Server (Maipo)
         JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default values for partitioning (by month) and for index granularity. All fields correspond almost exactly to the IIS log entries for HTTP requests. Note separately that there are dedicated fields for storing utm tags (they are parsed from the query string field at the stage of insertion into the table).
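
To make the utm parsing step concrete, here is a minimal Python sketch of the same idea. In our setup the work is done by the Logstash `kv` filter shown later; the function name `extract_query_fields` is hypothetical, and unlike `kv` the standard-library `parse_qs` also URL-decodes the values.

```python
from urllib.parse import parse_qs

# Keys that get their own uriQuery__* column in log_web.
UTM_KEYS = ["utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region"]


def extract_query_fields(uri_query: str) -> dict:
    """Return uriQuery__* values for the whitelisted keys found in uri_query."""
    params = parse_qs(uri_query, keep_blank_values=False)
    fields = {}
    for key in UTM_KEYS:
        values = params.get(key)
        if values:
            # Drop duplicate values but keep their order, then join with commas.
            unique = list(dict.fromkeys(values))
            fields["uriQuery__" + key] = ",".join(unique)
    return fields


row = extract_query_fields("utm_source=news&utm_medium=email&utm_source=ads&page=2")
print(row)
```

Keys outside the whitelist (`page` in the example) are ignored, and the `-` placeholder that IIS writes for an empty query string yields no fields at all.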

In addition, several system fields have been added to the table to store information about systems, components and servers. These fields are described in the table below. We store the logs of several systems in a single table.

fld_app_name
  Description: Application/system name. Valid values:
    • site1.domain.com External site 1
    • site2.domain.com External site 2
    • internal-site1.domain.local Internal site 1
  Example: site1.domain.com

fld_app_module
  Description: System module. Valid values:
    • web - Website
    • svc - Website web service
    • intgr - Integration web service
    • bo - Admin panel (BackOffice)
  Example: web

fld_website_name
  Description: Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module.
  Example: web-main

fld_server_name
  Description: Server name
  Example: web1.domain.com

fld_log_file_name
  Description: Path to the log file on the server
  Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This lets us build graphs in Grafana efficiently, for example to view the requests coming from the frontend of a specific system. It is similar to a site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Data volume on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
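
The overall compression ratio follows directly from the figures above. A small Python sketch of the arithmetic (the helper `parse_size` is hypothetical and only understands the binary units printed by `formatReadableSize`):

```python
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}


def parse_size(text: str) -> float:
    """Convert a string such as '4.86 GiB' to a number of bytes."""
    number, unit = text.split()
    return float(number) * UNITS[unit]


uncompressed = parse_size("54.50 GiB")
compressed = parse_size("4.86 GiB")
total_rows = 211427094

ratio = uncompressed / compressed
bytes_per_row = compressed / total_rows
print(f"compression ratio: {ratio:.1f}x, {bytes_per_row:.1f} bytes per row on disk")
```

In other words, the table as a whole compresses roughly elevenfold, and one log record costs about 25 bytes of disk space.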

Compression ratio by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring log files

This component tracks changes to log files on disk and passes the information on to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode, that is, it transfers only the records appended to a file. It can also be configured separately to transfer whole files, which is handy when you need to load data for previous months: just put the log file into a folder and it will be read in full.

When the service is stopped, data is no longer transferred onward to the storage.
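
The difference between tail mode and reading a whole file comes down to the byte offset from which reading starts. The following Python sketch illustrates only that idea; it is not how Filebeat is implemented, and the function names are hypothetical. The `tail` flag plays the role of the `tail_files` option in the configuration below.

```python
import os
import tempfile
from typing import List, Tuple


def start_offset(path: str, tail: bool) -> int:
    """tail=True skips existing content; tail=False starts from byte 0."""
    return os.path.getsize(path) if tail else 0


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines written after `offset` and the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Keep only whole lines; a partial last line waits for the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"old line 1\nold line 2\n")
    log_path = tmp.name

offset = start_offset(log_path, tail=True)      # remember where the file ended
with open(log_path, "ab") as handle:
    handle.write(b"new line\npartial")          # the application keeps logging

tailed, offset = read_new_lines(log_path, offset)
everything, _ = read_new_lines(log_path, start_offset(log_path, tail=False))
os.remove(log_path)
print(tailed, everything)
```

In tail mode only the newly appended line is shipped, while the import mode picks up the historical lines as well.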

A sample configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component receives log records from FileBeat (or from the RabbitMQ queue), parses them and inserts them in batches into the ClickHouse database.

To insert into ClickHouse, the Logstash-output-clickhouse plugin is used. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the Logstash service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the outage is long it is better to stop the Filebeats on the servers as well. In a scheme where RabbitMQ is not used (on the local network Filebeat sends logs directly to Logstash), the Filebeats behave quite acceptably and safely, so for them the unavailability of the output has no consequences.

A sample configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source=> "userAgent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
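
For readers who do not use Logstash, the parsing logic of the filter section can be approximated in plain Python. This is a sketch under assumptions, not a replacement for the configuration above: the regular expression is a simplified stand-in for the grok patterns, the sample log line is invented, and skipping the `-` placeholder for the proxy headers is an addition of this sketch.

```python
import re
from datetime import datetime
from typing import Optional

FIELDS = ["serverIP", "method", "uriStem", "uriQuery", "port", "username",
          "clientIP", "userAgent", "referer", "response", "subresponse",
          "win32response", "timetaken"]

# Timestamp, 13 space-separated fields, then two optional proxy header fields.
LINE_RE = re.compile(
    r"^(?P<log_timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    + " ".join(rf"(?P<{name}>\S+)" for name in FIELDS)
    + r"(?: (?P<xrealIP>\S+) (?P<xforwarderfor>\S+))?$"
)


def parse_iis_line(line: str) -> Optional[dict]:
    """Parse one W3C log line into a row for log_web; None if it is skipped."""
    if line.startswith("#"):          # header lines such as "#Fields: ..."
        return None
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    row = {k: v for k, v in match.groupdict().items() if v is not None}
    stamp = datetime.strptime(row.pop("log_timestamp"), "%Y-%m-%d %H:%M:%S")
    row["logdatetime"] = stamp.strftime("%Y-%m-%d %H:%M:%S")
    row["logdate"] = stamp.strftime("%Y-%m-%d")
    for name in ("port", "timetaken"):
        row[name] = int(row[name])
    # Same precedence as the config: clientIP, then xrealIP, then xforwarderfor.
    row["clientRealIP"] = row["clientIP"]
    for header in ("xrealIP", "xforwarderfor"):
        value = row.pop(header, "-")
        if value != "-":
            row["clientRealIP"] = value
    return row


sample = ("2019-07-11 10:15:30 10.0.0.5 GET /index.aspx utm_source=news 443 - "
          "192.168.1.10 Mozilla/5.0+(Windows+NT+10.0) https://example.com/ 200 0 0 125")
row = parse_iis_line(sample)
print(row["logdate"], row["method"], row["uriStem"], row["clientRealIP"])
```

The resulting dictionary has the same keys as the columns listed in the `mutations` block, apart from the `fld_*` and `uriQuery__*` fields, which come from Filebeat and from the query string parsing.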

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

The logs of all systems are stored in a single table (see the beginning of the article). It is designed to store information about requests: all parameters are the same across different formats, for example IIS, Apache and nginx logs. For application logs, which record, for example, errors, informational messages and warnings, a separate table with an appropriate structure will be provided (it is currently at the design stage).

When designing a table, it is very important to decide on the primary key (the key by which the data is sorted when stored). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by system name, system component name and event date. Initially the event date came first. After we moved it to the last position, queries started running about twice as fast. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is worth thinking carefully in advance about what should go into the sort key.
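
Why the order of key columns matters can be shown with a toy model in Python. It is only an illustration on invented data: ClickHouse actually works with a sparse index over granules and with monthly partitions, which this sketch ignores. It simply measures how far apart the rows of one system end up under each sort order.

```python
from itertools import product

APPS = ["site1", "site2", "site3"]
MODULES = ["svc", "web"]

# Toy data: every app/module pair logs one row per second for 1000 seconds.
rows = [{"app": a, "module": m, "ts": t}
        for t, a, m in product(range(1000), APPS, MODULES)]


def scanned_span(sorted_rows, app):
    """Number of rows between the first and last row of `app`, inclusive."""
    positions = [i for i, row in enumerate(sorted_rows) if row["app"] == app]
    return positions[-1] - positions[0] + 1


time_first = sorted(rows, key=lambda r: (r["ts"], r["app"], r["module"]))
app_first = sorted(rows, key=lambda r: (r["app"], r["module"], r["ts"]))

print("time first:", scanned_span(time_first, "site2"))
print("app first: ", scanned_span(app_first, "site2"))
```

With the time column first, the rows of `site2` are scattered across almost the entire table; with the system name first they form one contiguous block, so a query filtered by system touches far less data.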

It should also be noted that the LowCardinality data type appeared in relatively recent versions. Using it drastically reduces the size of compressed data for fields with low cardinality (few distinct values).
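
The idea behind LowCardinality is dictionary encoding: each distinct string is stored once and the column keeps only small integer codes. The Python sketch below shows the principle on invented data; the byte counts rest on simplifying assumptions (one length byte per string, one byte per code) and do not describe the real on-disk format of ClickHouse.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with the index of its first occurrence."""
    dictionary: List[str] = []
    index_of = {}
    codes: List[int] = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


column = ["GET", "GET", "POST", "GET", "HEAD", "POST"] * 1000
dictionary, codes = dictionary_encode(column)

# Assumed sizes: string bytes plus 1 length byte; 1 byte per code.
raw_bytes = sum(len(v.encode("utf-8")) + 1 for v in column)
encoded_bytes = sum(len(v.encode("utf-8")) + 1 for v in dictionary) + len(codes)
print(dictionary, raw_bytes, encoded_bytes)
```

The fewer distinct values a column has, the larger the gain, which matches what we see for fields such as `fld_app_name` and `fld_app_module`.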

We currently use version 19.6 and plan to try upgrading to the latest version. It offers such great features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.

By default, the logging level in the configuration is set to trace at installation. Logs are rotated and archived, but they still grow to a gigabyte. If there is no need for that, you can set the level to warning, and the log size drops sharply. The logging settings are defined in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, other Linux distributions need the packages built by Altinity.

Instructions with links to their repository are available here: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the status
sudo systemctl status clickhouse-server

2. stop the server
sudo systemctl stop clickhouse-server

3. start the server
sudo systemctl start clickhouse-server

Start the client for running queries in multiline mode (a query is executed after the ";" sign)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If there is an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null
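
Before re-inserting a failed batch by hand, it helps to find out which lines are actually broken. Below is a minimal Python sketch, assuming the saved file holds one JSON object per line (as the `FORMAT JSONEachRow` insert above implies). The function name `split_batch` is hypothetical, and the check covers JSON syntax only, not whether the values fit the column types.

```python
import json
from typing import Iterable, List, Tuple


def split_batch(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Separate parseable JSON object lines from broken ones (with line numbers)."""
    good: List[str] = []
    bad: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            bad.append((number, text))
            continue
        if isinstance(parsed, dict):
            good.append(text)
        else:
            bad.append((number, text))
    return good, bad


batch = [
    '{"logdate": "2019-07-11", "method": "GET", "port": 443}',
    '{"logdate": "2019-07-11", "method": "GET", "port": }',
    '',
    '{"logdate": "2019-07-11", "method": "POST", "port": 80}',
]
good, bad = split_batch(batch)
print(len(good), "valid rows; broken line numbers:", [n for n, _ in bad])
```

The valid lines can be written to the `__fixed` file as they are, and only the reported lines need manual attention.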

LogStash. Log router from FileBeat to the RabbitMQ queue

This component routes the logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. Judging by the issue on their GitHub, such functionality is not planned either. There is a plugin for Kafka, but for certain reasons we cannot use it in our environment.
  2. There are requirements for collecting logs in the DMZ. According to them, logs must first be placed in a queue, and then LogStash reads the records from the queue from outside.

Therefore, specifically for servers located in the DMZ, we have to use this slightly more complicated scheme. A sample configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log entries in the DMZ. Writing goes through the Filebeat → LogStash chain. Reading is done from outside the DMZ via LogStash. When working through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, that is, based on the data from the FileBeat configuration. All messages go into one queue. If for some reason the queue service is stopped, this does not lead to message loss: the FileBeats will receive connection errors and temporarily suspend sending. The LogStash that reads from the queue will also receive network errors and wait for the connection to be restored. In that case, of course, data will no longer be written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
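
What these bindings do can be modelled in a few lines of Python. The class `DirectExchange` is a toy written for this article, not part of any RabbitMQ client library; it only reproduces the rule of a direct exchange, which is an exact match between the routing key of a message and the key of a binding.

```python
from collections import defaultdict
from typing import Dict, List


class DirectExchange:
    """Toy model of a direct exchange: exact match on the routing key."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[str]] = defaultdict(list)
        self.queues: Dict[str, List[dict]] = defaultdict(list)

    def bind(self, queue: str, routing_key: str) -> None:
        self._bindings[routing_key].append(queue)

    def publish(self, routing_key: str, message: dict) -> bool:
        """Deliver to every bound queue; return False if nothing matched."""
        targets = self._bindings.get(routing_key, [])
        for queue in targets:
            self.queues[queue].append(message)
        return bool(targets)


exchange = DirectExchange()
exchange.bind("web_log", "site1.domain.ru")
exchange.bind("web_log", "site2.domain.ru")

# The routing key is the fld_app_name field set in the FileBeat configuration.
delivered = [exchange.publish(name, {"fld_app_name": name})
             for name in ("site1.domain.ru", "site2.domain.ru", "site3.domain.ru")]
print(delivered, len(exchange.queues["web_log"]))
```

Both bound systems end up in the same `web_log` queue, while a message from a system without a binding (`site3.domain.ru` here) is not routed anywhere, so every new system needs its own binding.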

Grafana. Dashboards

This component is used to visualize the monitoring data. It requires the ClickHouse datasource plugin for Grafana 4.6+. We had to tweak the plugin a little to make the processing of SQL filters on the dashboard more efficient.

For example, we use variables, and if a variable is not set in the filter field, we would like it not to generate a WHERE condition of the form ( uriStem = '' AND uriStem != '' ), because in that case ClickHouse would still read the uriStem column. We tried different options and in the end patched the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column itself.

Now this query can be used for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is converted into SQL like this (note that the empty uriStem field has been replaced with just 1)

SELECT
t,
groupArray((response, c)) AS groupArr
FROM (
SELECT
(intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t, response,
count(*) AS c FROM default.log_web
WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
GROUP BY
t, response
ORDER BY
t ASC,
response ASC
)
GROUP BY t ORDER BY t ASC
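
The behaviour of the macro is easy to reproduce outside Grafana. The Python sketch below is a re-implementation for illustration only (the real plugin is not written in Python, and the names `value_if_empty` and `build_where` are hypothetical). It also does no escaping of the values, which production code would have to add.

```python
# (variable name, SQL condition used when the variable is set)
FILTERS = [
    ("fld_app_name", "(fld_app_name = '{v}')"),
    ("fld_app_module", "(fld_app_module = '{v}')"),
    ("fld_server_name", "(fld_server_name = '{v}')"),
    ("uriStem", "(uriStem LIKE '%{v}%')"),
    ("clientRealIP", "(clientRealIP = '{v}')"),
]


def value_if_empty(value: str, if_empty: str, otherwise: str) -> str:
    """Return `if_empty` when the dashboard variable is blank, else `otherwise`."""
    return if_empty if value.strip() == "" else otherwise


def build_where(variables: dict) -> str:
    """Join the filter conditions; unset variables collapse to a constant 1."""
    parts = []
    for name, template in FILTERS:
        value = variables.get(name, "")
        parts.append(value_if_empty(value, "1", template.format(v=value)))
    return " AND ".join(parts)


where = build_where({"fld_app_name": "site1.domain.ru", "fld_app_module": "web"})
print(where)
```

The unset filters turn into the constant `1`, so the generated condition never mentions the `uriStem` or `clientRealIP` columns, and ClickHouse has no reason to read them.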

Conclusion

The appearance of the ClickHouse database has become a landmark event on the market. It was hard to imagine that, completely free of charge, we would instantly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across several servers), the scheme will become more complicated. But first impressions are that working with this database is very pleasant. It is clear that the product is made "for people".

Compared with ElasticSearch, the cost of storing and processing logs is, by preliminary estimates, five to ten times lower. In other words, where the current volume of data would have required a cluster of several machines, with ClickHouse a single low-powered machine is enough. Yes, of course, ElasticSearch also has mechanisms for compressing data on disk and other features that can significantly reduce resource consumption, but compared with ClickHouse this would still be more expensive.

Without any special optimization on our part, with default settings, loading data and querying the database run at amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we may use this tool for other purposes not related to storing logs, for example for end-to-end analytics, in the security domain, or in machine learning.

Finally, a little about the pros and cons.

Cons

  1. Records have to be loaded in large batches. On the one hand this is a feature, but you still need additional components to buffer the records. This task is not always easy, but it is solvable. And we would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This raises concerns and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of Issues on GitHub, we are still wary of using this engine in production. However, if you avoid sudden moves and stick to the core functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry barrier.
  3. Open source.
  4. Free.
  5. Scales well (sharding and replication out of the box).
  6. Included in the registry of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex is available.

Source: www.habr.com
