The ClickHouse Database for Humans, or Alien Technology

Aleksey Lizunov, Head of the Competence Center for Remote Service Channels, Information Technology Directorate, MKB.


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article, we would like to talk about our experience of using the ClickHouse database and the preliminary results of the pilot operation. It should be noted right away that the results were impressive.



Below we will describe in more detail how our system is configured and what components it consists of. But first, I would like to say a little about this database as a whole, and why it is worth paying attention to. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; initially it was the main data store for Yandex.Metrica. It is an open-source system, free of charge. From a developer's point of view, I had always wondered how they implemented it, because the data volumes are fantastically large, and the Metrica user interface itself is very flexible and fast. On first acquaintance with this database, the impression is: "Well, finally! Made for the people!" Starting with the installation process and ending with sending queries.

This database has a very low entry threshold. Even an averagely skilled developer can install it in a few minutes and start using it. Everything just works. Even people who are new to Linux can quickly get through the installation and perform the simplest operations. If earlier, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer had the idea that it was about terabytes and petabytes, and that some superhumans were involved in the setup and development of such systems, then with the advent of the ClickHouse database we got a simple, understandable tool with which you can solve a range of tasks that were previously out of reach. It takes just one fairly average machine and five minutes to install. That is, we got a database like, for example, MySQL, but for storing billions of records! A sort of super-archiver with an SQL language. It is as if people had been handed alien weapons.

About our logging system

To collect information, we use IIS log files of web applications in the standard format (we are currently also parsing application logs, but the main goal at the pilot stage is to collect IIS logs).

For various reasons, we could not completely abandon the ELK stack, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work quite reliably and predictably.

The general scheme is shown in the figure below:


A peculiarity of writing data to the ClickHouse database is the infrequent insertion of records in large batches (once per second). This, apparently, is the most "problematic" part that you run into when you first get hands-on experience with ClickHouse: the scheme becomes a bit more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. So, generally speaking, this is not recommended, but from a practical point of view it avoids provisioning separate servers. We did not observe any failures or resource conflicts with the database. In addition, it should be noted that the plugin has a retry mechanism in case of errors. And in case of errors, the plugin writes to disk the batch of data that could not be inserted (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).

A complete list of the software used in the scheme is presented in the table:

List of software used

NGINX
Description: Reverse proxy for restricting access by ports and organizing authorization. Currently not used in the scheme.
Links:
https://nginx.org/ru/download.html
https://nginx.org/download/nginx-1.16.0.tar.gz

FileBeat
Description: Log file transfer agent.
Links:
https://www.elastic.co/downloads/beats/filebeat (distribution kit for Windows 64-bit)
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

LogStash
Description: Log collector. Used to collect logs from FileBeat, as well as from the RabbitMQ queue (for servers located in the DMZ).
Links:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Logstash-output-clickhouse
Description: LogStash plugin for transferring logs to the ClickHouse database in batches.
Links:
https://github.com/mikechris/logstash-output-clickhouse
Installation:
/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

ClickHouse
Description: Log storage. https://clickhouse.yandex/docs/ru/
Links:
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
Note. Starting from August 2018, "normal" rpm builds for RHEL appeared in the Yandex repository, so you can try using them. At the time of installation, we were using the packages built by Altinity.

Grafana
Description: Log visualization. Setting up dashboards.
Links:
https://grafana.com/
https://grafana.com/grafana/download
Red Hat & CentOS (64-bit): latest version

ClickHouse datasource for Grafana 4.6+
Description: Plugin for Grafana with a ClickHouse data source.
Links:
https://grafana.com/plugins/vertamedia-clickhouse-datasource
https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

LogStash
Description: Log router from FileBeat to the RabbitMQ queue.
Note. Unfortunately, FileBeat has no output directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
Links:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

RabbitMQ
Description: Message queue. This is the buffer for log entries in the DMZ.
Links:
https://www.rabbitmq.com/download.html
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Erlang Runtime (required for RabbitMQ)
Description: Erlang runtime, required for RabbitMQ to work.
Links:
http://www.erlang.org/download.html
https://www.rabbitmq.com/install-rpm.html#install-erlang
http://www.erlang.org/downloads/21.3

The configuration of the server with the ClickHouse database is presented in the following table:

Configuration
Value:
HDD: 40 GB
RAM: 8 GB
Processor: 2 cores @ 2 GHz
Note: It is worth paying attention to the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

General system software
Value:
OS: Red Hat Enterprise Linux Server (Maipo)
JRE (Java 8)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default partitioning (by month) and index granularity. Almost all fields correspond to IIS log entries for HTTP requests. Separately, note that there are dedicated fields for storing utm tags (they are parsed from the query string field at the stage of insertion into the table).
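Monthly partitioning by toYYYYMM(logdate) means a query that filters by date lets ClickHouse skip whole partitions on disk. A minimal sketch against the table defined above (the date range is illustrative):

```sql
-- Restricting by logdate allows ClickHouse to prune partitions:
-- only the parts belonging to July 2019 are read from disk.
SELECT
    fld_app_name,
    count() AS hits
FROM log_web
WHERE logdate BETWEEN '2019-07-01' AND '2019-07-31'
GROUP BY fld_app_name
ORDER BY hits DESC
```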

Also, several system fields were added to the table to store information about systems, components, and servers. See the table below for a description of these fields. In one table, we store logs for several systems.

fld_app_name
Description: Application/system name.
Valid values:
  • site1.domain.com (external site 1)
  • site2.domain.com (external site 2)
  • internal-site1.domain.local (internal site 1)
Example: site1.domain.com

fld_app_module
Description: System module.
Valid values:
  • web (website)
  • svc (web site service)
  • intgr (integration web service)
  • bo (admin, BackOffice)
Example: web

fld_website_name
Description: Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module.
Example: web-main

fld_server_name
Description: Server name.
Example: web1.domain.com

fld_log_file_name
Description: Path to the log file on the server.
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This makes it possible to build graphs efficiently in Grafana. For example, to view requests from the frontend of a particular system. This is similar to the site counter in Yandex.Metrica.

Here are some statistics on the use of the database over two months.

Number of records broken down by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Amount of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.

Degree of data compression by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring log files

This component tracks changes to log files on disk and passes the information on to LogStash. It is installed on all servers where log files are written (usually IIS). It works in tail mode (i.e., it transfers only the records appended to the file). But it can also be configured to transfer entire files. This is useful when you need to load data from previous months: just put the log file in a folder and it will read it in full.

When the service is stopped, data stops being transferred further to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log collector

This component is designed to receive log entries from FileBeat (or through the RabbitMQ queue), parse them, and insert them in batches into the ClickHouse database.

For insertion into ClickHouse, the Logstash-output-clickhouse plugin is used. The plugin has a request retry mechanism, but for a scheduled shutdown it is better to stop the service itself. When it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is a long one, it is better to stop the Filebeats on the servers as well. In a scheme where RabbitMQ is not used (on the local network, Filebeat sends logs directly to Logstash), the Filebeats behave quite acceptably and safely, so for them the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

Logs for all systems are stored in one table (see the beginning of the article). It is intended to store information about requests: all the parameters are similar across different formats, such as IIS logs and apache and nginx logs. For application logs, in which, for example, errors, informational messages and warnings are recorded, a separate table with the appropriate structure will be provided (currently at the design stage).
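Since that application-log table is still being designed, the following is only a hypothetical sketch of what such a structure could look like, reusing the conventions of log_web (all names and fields here are illustrative, not the final design):

```sql
-- Hypothetical structure for application logs (not the final design).
CREATE TABLE log_app (
    logdate Date,
    logdatetime DateTime,
    fld_app_name LowCardinality(String),
    fld_app_module LowCardinality(String),
    fld_server_name LowCardinality(String),
    level LowCardinality(String),  -- error / warning / info
    message String
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime);
```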

When designing a table, it is very important to decide on the primary key (by which the data will be sorted during storage). The degree of data compression and the query speed depend on this. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, by the name of the system, the name of the system component, and the date of the event. Initially, the date of the event came first. After moving it to the last place, queries began to work about twice as fast. Changing the primary key will require recreating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is a good idea to think carefully in advance about what should be included in the sort key.
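Because the sort key cannot simply be altered in place, changing it amounts to creating a new table and copying the data over. A sketch of such a rebuild (the name log_web_new is illustrative):

```sql
-- Recreate the table with the desired ORDER BY, then reload the data.
CREATE TABLE log_web_new AS log_web
ENGINE = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime);

-- Heavy operation: re-sorts all data on disk.
INSERT INTO log_web_new SELECT * FROM log_web;

RENAME TABLE log_web TO log_web_old, log_web_new TO log_web;
```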

It should also be noted that the LowCardinality data type appeared in relatively recent versions. When it is used, the size of the compressed data is drastically reduced for fields that have low cardinality (few distinct values).
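On versions that support LowCardinality, an existing String column can also be converted in place rather than recreating the table; a sketch (the column name is taken from the schema above):

```sql
-- Converting an existing String column to LowCardinality(String).
-- This rewrites the column's data, so it can take a while on large tables.
ALTER TABLE log_web MODIFY COLUMN method LowCardinality(String);
```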

Version 19.6 is currently in use, and we plan to try updating to the latest version. It has such wonderful features as Adaptive Granularity, Skipping indices and the DoubleDelta codec, for example.
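These features live in newer releases than the 19.6 we run, so the following is only a hedged sketch of what using them could look like, not our production schema (table and index names are illustrative):

```sql
-- DoubleDelta suits monotonically growing values such as timestamps;
-- the data-skipping index lets queries filtering on response
-- skip granule ranges that cannot contain a match.
CREATE TABLE log_web_v2 (
    logdate Date,
    logdatetime DateTime CODEC(DoubleDelta, LZ4),
    fld_app_name LowCardinality(String),
    response String,
    INDEX idx_response response TYPE set(100) GRANULARITY 4
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, logdatetime);
```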

By default, during installation, the logging level is set to trace. The logs are rotated and archived, but at the same time they grow up to a gigabyte. If there is no need for this, you can set the level to warning, and the log size is then reduced drastically. The logging level is set in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux versions you need to use the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. checking the status
sudo systemctl status clickhouse-server
 
2. stopping the server
sudo systemctl stop clickhouse-server
 
3. starting the server
sudo systemctl start clickhouse-server
 
Starting the client for executing queries in multiline mode (execution after the ";" sign)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
In case of an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file manually and try to load it into the database by hand:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exiting the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

LogStash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. And such functionality, judging by the issue on their github, is not planned for implementation. There is a plugin for Kafka, but for certain reasons we cannot use it in-house.
  2. There are requirements for collecting logs in the DMZ. Based on them, the logs must first be put into a queue, and then LogStash reads the entries from the queue from the outside.

Therefore, it is for the case where the servers are located in the DMZ that this slightly more complicated scheme has to be used. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. keu mesaj

This component is used to buffer log entries in the DMZ. Writing is done through the Filebeat → LogStash chain. Reading is done from outside the DMZ via LogStash. When operating through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, i.e. based on the FileBeat configuration data. All messages go into a single queue. If for some reason the queuing service is stopped, this will not lead to the loss of messages: FileBeat will receive connection errors and temporarily suspend sending, and the LogStash that reads from the queue will also receive network errors and wait for the connection to be restored. In this case, of course, the data will no longer be written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"

Grafana. Dashboards

This component is used to visualize the monitoring data. In this case, you need to install the ClickHouse datasource plugin for Grafana 4.6+. We had to tweak it a little to improve the efficiency of processing SQL filters on the dashboard.

For example, we use variables, and if they are not set in the filter field, then we would like the plugin not to generate a condition in the WHERE clause of the form ( uriStem = '' AND uriStem != '' ). In that case, ClickHouse would read the uriStem column. In general, we tried different options and eventually patched the plugin (the $valueIfEmpty macro) so that in the case of an empty value it returns 1, without mentioning the column itself.

And now you can use this query for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which translates into the following SQL (note that the empty uriStem fields have been converted to just 1)

SELECT
    t,
    groupArray((response, c)) AS groupArr
FROM (
    SELECT
        (intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t,
        response,
        count(*) AS c
    FROM default.log_web
    WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
    GROUP BY
        t,
        response
    ORDER BY
        t ASC,
        response ASC
)
GROUP BY t
ORDER BY t ASC

Conclusion

The appearance of the ClickHouse database has become a landmark event in the market. It was hard to imagine that, completely free of charge, we would suddenly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across multiple servers), the scheme will become more complicated. But on first impressions, working with this database is very pleasant. It is clear that the product was made "for people."

Compared to ElasticSearch, the cost of storing and processing logs is estimated to be reduced five to ten times. In other words, if for the current amount of data we would have to set up a cluster of several machines, then with ClickHouse one low-power machine is enough for us. Yes, of course, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared to ClickHouse this would be more expensive.

Without any special optimization on our part, on default settings, loading data into and selecting from the database work at an amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we may use this tool for other purposes not related to storing logs. For example, for end-to-end analytics, in the field of security, or machine learning.

At the end, a few words about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand, this is a feature, but you still have to use additional components to buffer records. This task is not always easy, but it is solvable. And I would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This causes concern and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that allows you to read events directly from Kafka without implementing consumers. But judging by the number of Issues on github, we are still careful not to use this engine in production. However, if you avoid sudden gestures to the side and use the core functionality, then it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scales well (sharding/replication out of the box)
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex.

Source: www.habr.com
