ClickHouse Database for Humans, or Alien Technology

Alexey Lizunov, head of the competence center for remote service channels, Information Technology Directorate, ICB


As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a log store.

In this article we would like to share our experience of using the ClickHouse database and the first results of the pilot operation. It is worth noting right away that the results were impressive.



Below we describe in detail how our system is set up and which components it consists of. But first I would like to say a little about this database as a whole and why it deserves attention. ClickHouse is a high-performance analytical columnar database from Yandex. It is used in Yandex services; originally it was the main data store for Yandex.Metrica. The system is open source and free. As a developer, I always wondered how they had built Metrica, because the data volumes there are enormous, yet the Metrica user interface itself is very flexible and fast. On first acquaintance with this database the impression is: "Well, finally! It's made for humans! From installation all the way to sending queries."

This database has a very low entry threshold. Even an average developer can install it in a few minutes and start using it. Everything works smoothly. Even people new to Linux can quickly cope with the installation and perform simple operations. Previously, at the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer had the impression that these were about terabytes and petabytes, and that some kind of superhumans were busy configuring and developing such systems. With the arrival of ClickHouse we got a simple, understandable tool that solves a range of problems that used to be out of reach. All it takes is one fairly average machine and five minutes to install. In other words, we got a database like, say, MySQL, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if humans had been handed alien weapons.

About our log collection system

To collect information, we use the IIS log files of web applications in the standard format (we are also working on parsing application logs, but the main goal at the pilot stage is collecting IIS logs).

For various reasons we could not abandon the ELK stack entirely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A peculiarity of writing data to ClickHouse is that records should be inserted infrequently (about once per second) and in large batches. This is apparently the most "problematic" part you run into when working with ClickHouse for the first time: the scheme becomes a little more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but from a practical point of view it saves us from setting up separate servers for as long as everything fits on one machine. We did not observe any failures or resource conflicts with the database. In addition, the plugin has a retry mechanism in case of errors. When an error occurs, the plugin writes the batch of data that could not be inserted to disk (the file format is convenient: after fixing the file, you can easily insert the corrected batch with clickhouse-client).
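The batching behaviour described above can be sketched as follows. This is only an illustration of the idea, not the plugin's code: the `BatchBuffer` class and the `sink` callback are hypothetical, and the parameter names `flush_size` and `idle_flush_time` merely mirror the option names that appear in the Logstash output configuration later in the article.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Collects rows and hands them to `sink` in large batches.

    Hypothetical helper for illustration only.
    """

    def __init__(self, sink: Callable[[List[dict]], None],
                 flush_size: int = 10000, idle_flush_time: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.flush_size = flush_size
        self.idle_flush_time = idle_flush_time
        self.clock = clock
        self.rows: List[dict] = []
        self.last_flush = clock()

    def add(self, row: dict) -> None:
        """Buffer one row; flush as soon as the batch is full."""
        self.rows.append(row)
        if len(self.rows) >= self.flush_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically: flushes a partial batch once enough time has
        passed since the previous flush."""
        if self.rows and self.clock() - self.last_flush >= self.idle_flush_time:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as one batch."""
        if self.rows:
            batch, self.rows = self.rows, []
            self.sink(batch)
        self.last_flush = self.clock()
```

In a real pipeline `sink` would perform one INSERT per batch; here it can be any callable, which keeps the sketch easy to try out.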

The full list of software used in the scheme is given in the table:

List of software used

Name: NGINX
Description: Reverse proxy for restricting access by port and organizing authorization. Not currently used in the scheme.
Distribution links:
https://nginx.org/ru/download.html
https://nginx.org/download/nginx-1.16.0.tar.gz

Name: FileBeat
Description: Transfer of file logs.
Distribution links:
https://www.elastic.co/downloads/beats/filebeat (distribution for Windows 64bit)
https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip

Name: LogStash
Description: Log collector. Used to collect logs from FileBeat, as well as to collect logs from the RabbitMQ queue (for servers located in the DMZ).
Distribution links:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: Logstash-output-clickhouse
Description: Logstash plugin for sending logs to the ClickHouse database in batches.
Distribution links:
https://github.com/mikechris/logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse
/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
/usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline

Name: ClickHouse
Description: Log storage. https://clickhouse.yandex/docs/ru/
Distribution links:
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm
Note. Since August 2018, "normal" rpm builds for RHEL have appeared in the Yandex repository, so you can try using them. At the time of installation we used the packages built by Altinity.

Name: Grafana
Description: Log visualization. Building dashboards.
Distribution links:
https://grafana.com/
https://grafana.com/grafana/download
Redhat & Centos (64 Bit) - latest version

Name: ClickHouse datasource for Grafana 4.6+
Description: Grafana plugin with a ClickHouse data source.
Distribution links:
https://grafana.com/plugins/vertamedia-clickhouse-datasource
https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download

Name: LogStash
Description: Log router from FileBeat to the RabbitMQ queue.
Note. Unfortunately, FileBeat cannot output directly to RabbitMQ, so an intermediate link in the form of Logstash is required.
Distribution links:
https://www.elastic.co/products/logstash
https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm

Name: RabbitMQ
Description: Message queue. This is the log buffer in the DMZ.
Distribution links:
https://www.rabbitmq.com/download.html
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm

Name: Erlang Runtime (Required for RabbitMQ)
Description: Erlang runtime. Required for RabbitMQ to work.
Distribution links:
http://www.erlang.org/download.html
https://www.rabbitmq.com/install-rpm.html#install-erlang
http://www.erlang.org/downloads/21.3

The configuration of the server running the ClickHouse database is given in the following table:

Name: Configuration
Value: HDD: 40GB; RAM: 8GB; Processor: Core 2 2Ghz
Note: Pay attention to the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/)

Name: System-wide software
Value: OS: Red Hat Enterprise Linux Server (Maipo); JRE (Java 8)
Note: (none)

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default values for partitioning (by month) and index granularity. Practically all fields correspond to IIS log entries for HTTP requests. Note separately that there are dedicated fields for storing utm tags (they are parsed out of the query string field at the stage of insertion into the table).
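As a rough illustration of how the utm fields can be split out of the query string, here is a minimal Python sketch. It is not the pipeline's code: the function `split_query` is hypothetical, and only the `uriQuery__` prefix and the list of tracked keys are taken from the table definition above. Note also that `parse_qs` percent-decodes values, which may differ from what the actual pipeline stores.

```python
from urllib.parse import parse_qs

# Keys that have a dedicated uriQuery__<key> column in log_web.
TRACKED_KEYS = ("utm_medium", "utm_source", "utm_campaign",
                "utm_term", "utm_content", "yclid", "region")


def split_query(uri_query: str) -> dict:
    """Return uriQuery__<key> columns for the tracked keys found in the query."""
    params = parse_qs(uri_query, keep_blank_values=False)
    columns = {}
    for key in TRACKED_KEYS:
        values = params.get(key)
        if values:
            # Drop duplicates (keeping order) and join what is left with ",".
            unique = list(dict.fromkeys(values))
            columns["uriQuery__" + key] = ",".join(unique)
    return columns
```

For example, `split_query("utm_source=yandex&utm_medium=cpc&page=2")` yields only the two utm columns and ignores `page`, because it is not a tracked key.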

In addition, several system fields have been added to the table to store information about systems, components, and servers. These fields are described in the table below. We store the logs of several systems in a single table.

Name: fld_app_name
Description: Application/system name. Valid values:
  • site1.domain.com External site 1
  • site2.domain.com External site 2
  • internal-site1.domain.local Internal site 1
Example: site1.domain.com

Name: fld_app_module
Description: System module. Valid values:
  • web - Website
  • svc - Website web service
  • intgr - Integration web service
  • bo - Administrator (BackOffice)
Example: web

Name: fld_website_name
Description: Site name in IIS. Several systems, or even several instances of a single system module, can be deployed on one server.
Example: web-main

Name: fld_server_name
Description: Server name
Example: web1.domain.com

Name: fld_log_file_name
Description: Path to the log file on the server
Example: C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log

This makes it possible to build graphs in Grafana efficiently. For example, you can view the requests coming from the frontend of a particular system. This is similar to the site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Disk data volume

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
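A quick piece of arithmetic on the figures above puts them in perspective: overall compression is roughly 11x, and a log record takes about 25 bytes on disk. The sketch below only re-derives these numbers from the query output; nothing in it is measured independently.

```python
GIB = 1024 ** 3

uncompressed_bytes = 54.50 * GIB   # "uncompressed" from the query output above
compressed_bytes = 4.86 * GIB      # "compressed" from the query output above
total_rows = 211_427_094           # "total_rows" from the query output above

ratio = uncompressed_bytes / compressed_bytes
bytes_per_row_raw = uncompressed_bytes / total_rows
bytes_per_row_disk = compressed_bytes / total_rows

print(f"compression ratio: {ratio:.1f}x")
print(f"per row: {bytes_per_row_raw:.0f} B raw, {bytes_per_row_disk:.1f} B on disk")
```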

Column data compression ratio

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transferring file logs

This component watches for changes to log files on disk and passes the information to LogStash. It is installed on every server where log files are written (usually IIS). It works in tail mode (that is, it transfers only the records appended to a file). It can also be configured to transfer entire files, which is convenient when you need to load data for previous months: just put the log file into a folder and it will be read in full.
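The difference between tail mode and reading whole files can be sketched in a few lines of Python. This is only a conceptual model, not how FileBeat is implemented: the functions `start_offset` and `read_new_lines` are hypothetical, and the `tail_files` argument simply borrows the name of the FileBeat option used in the configuration below.

```python
import os


def start_offset(path: str, tail_files: bool) -> int:
    """tail_files=True starts at the end of the file, False at the beginning."""
    return os.path.getsize(path) if tail_files else 0


def read_new_lines(path: str, offset: int) -> tuple:
    """Return (complete lines appended after `offset`, new offset)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume complete lines; a partial last line is left for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Persisting the returned offset between calls is what lets a shipper resume after a restart without re-sending old records.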

If the service is stopped, data is no longer transferred to the storage.

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log Collector

This component receives log records from FileBeat (or via the RabbitMQ queue), parses them, and inserts them in batches into the ClickHouse database.

For insertion into ClickHouse, the Logstash-output-clickhouse plugin is used. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the stop is a long one, it is better to stop the Filebeats on the servers as well. In a scheme where RabbitMQ is not used (on the local network Filebeat sends logs directly to Logstash), the Filebeats behave quite acceptably and safely, so an unavailable output has no consequences for them.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source => "userAgent"
            prefix => "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}
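To make the filter section above easier to follow, here is a small Python sketch of what it does to one log line in the default (non-X-Real-IP) format: drop comment lines, split the line into named fields, and derive `logdate` and `logdatetime`. It is an approximation for illustration only: the regular expression is much looser than the grok patterns, `parse_iis_line` is a hypothetical name, and no time zone handling is attempted.

```python
import re
from datetime import datetime

FIELDS = ("serverIP", "method", "uriStem", "uriQuery", "port", "username",
          "clientIP", "userAgent", "referer", "response", "subresponse",
          "win32response", "timetaken")
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    + " ".join(rf"(?P<{name}>\S+)" for name in FIELDS) + r"$"
)


def parse_iis_line(line: str):
    """Return a row dict for one W3C log line, or None for comments/mismatches."""
    if line.startswith("#"):
        return None  # header lines such as "#Fields: ..." are dropped
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    stamp = datetime.strptime(f"{row.pop('date')} {row.pop('time')}",
                              "%Y-%m-%d %H:%M:%S")
    row["logdate"] = stamp.strftime("%Y-%m-%d")
    row["logdatetime"] = stamp.strftime("%Y-%m-%d %H:%M:%S")
    row["port"] = int(row["port"])
    row["timetaken"] = int(row["timetaken"])
    # Same default as the pipeline when no X-Real-IP column is present.
    row["clientRealIP"] = row["clientIP"]
    return row
```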

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

The logs of all systems are stored in a single table (see the beginning of the article). It is designed to store information about requests: all parameters are similar across formats, for example IIS, apache, and nginx logs. For application logs, which record, for example, errors, informational messages, and warnings, a separate table with an appropriate structure will be provided (currently at the design stage).

When designing a table, it is very important to decide on the primary key (by which the data will be sorted when stored). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, the system name, the system component name, and the event date. Initially the event date came first. After it was moved to the last position, queries began to run about twice as fast. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is worth thinking carefully in advance about what should go into the sorting key.
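Why the sort order matters for compression can be illustrated with a toy experiment. The sketch below uses synthetic, made-up rows and zlib as a stand-in for ClickHouse's codecs, so the absolute numbers mean nothing; it only shows that a column whose equal values sit next to each other compresses far better than the same column with values interleaved. It looks at a single column only; the overall effect on a real table depends on the data.

```python
import random
import zlib

random.seed(42)  # fixed seed so the illustration is repeatable
APPS = [f"site{i}.domain.ru" for i in range(1, 8)]

# Synthetic rows: (logdatetime as seconds, fld_app_name). Purely made-up data.
rows = [(second, random.choice(APPS)) for second in range(50_000)]


def compressed_size(values) -> int:
    """Size in bytes of one column after zlib compression."""
    return len(zlib.compress("\n".join(values).encode("utf-8")))


by_time = sorted(rows, key=lambda r: r[0])
by_app_then_time = sorted(rows, key=lambda r: (r[1], r[0]))

size_time_first = compressed_size(app for _, app in by_time)
size_app_first = compressed_size(app for _, app in by_app_then_time)

print("fld_app_name column, time-first order:", size_time_first, "bytes")
print("fld_app_name column, app-first order: ", size_app_first, "bytes")
```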

Note also that the LowCardinality data type appeared in relatively recent versions. With it, the size of the compressed data shrinks dramatically for fields with low cardinality (few distinct values).
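The general idea behind a low-cardinality column is dictionary encoding: each distinct string is stored once, and the rows keep only small integer codes. The sketch below shows that idea in plain Python; it is a conceptual illustration with hypothetical function names, and ClickHouse's actual storage format is more involved.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): each distinct string is stored only once."""
    dictionary = []
    positions = {}
    codes = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


method_column = ["GET", "GET", "POST", "GET", "HEAD", "POST"]
dictionary, codes = dictionary_encode(method_column)
print(dictionary, codes)
```

For a column such as `method` or `fld_app_name`, the dictionary stays tiny no matter how many rows are stored, which is why such columns benefit the most.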

We are currently using version 19.6 and plan to try upgrading to the latest version. It has such remarkable features as Adaptive Granularity, Skipping indices, and the DoubleDelta codec, for example.

By default, the logging level in the configuration is set to trace during installation. The logs are rotated and archived, but they still grow to as much as a gigabyte. If there is no need for that, you can set the level to warning, and the log size drops sharply. The logging settings are defined in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux distributions you need to use the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client for running queries in multi-line mode (a query is executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If there is an error in a single row, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null
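
Before re-inserting a failed batch with the commands above, it can help to find out which lines of the saved file are actually broken. The following Python sketch is a hypothetical helper, not part of the plugin or of ClickHouse; it assumes the file holds one JSON object per line, as implied by the JSONEachRow format in the INSERT command, and it checks JSON syntax only, not column types.

```python
import json


def find_bad_lines(path: str):
    """Return [(line_number, error_message)] for lines that are not JSON objects."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                value = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append((number, error.msg))
                continue
            if not isinstance(value, dict):
                problems.append((number, "not a JSON object"))
    return problems
```

An empty result means every line parses as a JSON object; the insert may still fail if a value does not fit the column type.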

LogStash. Log router from FileBeat to the RabbitMQ queue

This component routes the logs coming from FileBeat into the RabbitMQ queue. There are two points to note here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. Judging by the issue on their GitHub, such functionality is not planned either. There is a plugin for Kafka, but for certain reasons we cannot use it.
  2. There are requirements for collecting logs in the DMZ. According to them, logs must first be put into a queue, and LogStash then reads the records from the queue from outside.

Therefore, specifically for servers located in the DMZ, we have to use this slightly more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log records in the DMZ. Records are written through the Filebeat → LogStash chain and read from outside the DMZ via LogStash. When working through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, that is, based on the FileBeat configuration data. All messages go into a single queue. If the queue service is stopped for some reason, no messages are lost: the FileBeats receive connection errors and temporarily suspend sending. LogStash, which reads from the queue, also receives network errors and waits for the connection to be restored. During that time, of course, no data is written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
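
The routing set up by these commands can be modelled in a few lines of Python. This is a toy model for illustration, not RabbitMQ client code: the `DirectExchange` class is hypothetical, and only the exchange type, the queue name `web_log`, and the routing keys come from the commands above. It shows why each system name needs its own binding: a direct exchange delivers a message only when its routing key exactly matches a binding.

```python
from collections import defaultdict


class DirectExchange:
    """Toy model of a direct exchange: exact match on the routing key."""

    def __init__(self) -> None:
        self.bindings = defaultdict(set)   # routing key -> queue names
        self.queues = defaultdict(list)    # queue name -> messages

    def bind(self, queue: str, routing_key: str) -> None:
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key: str, message: str) -> int:
        """Deliver to every bound queue; return how many queues received it."""
        targets = self.bindings.get(routing_key, ())
        for queue in targets:
            self.queues[queue].append(message)
        return len(targets)


exchange = DirectExchange()
exchange.bind("web_log", "site1.domain.ru")
exchange.bind("web_log", "site2.domain.ru")
exchange.publish("site1.domain.ru", "line from site1")
exchange.publish("site3.domain.ru", "no binding, so this message is not queued")
```

In this model a message from a system without a binding simply goes nowhere, so a new `fld_app_name` value needs a matching binding before its logs reach the queue.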

Grafana. Dashboards

This component is used to visualize monitoring data. For this you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it a little to make the handling of SQL filters on the dashboard more efficient.

For example, we use variables, and when they are not set in the filter field, we would like the plugin not to generate a WHERE condition of the form ( uriStem = '' AND uriStem != '' ). With such a condition ClickHouse would still read the uriStem column. So we tried different options and eventually patched the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column itself.

And now you can use this query for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is translated into SQL like this (note that the empty uriStem fields were turned into just 1)

SELECT
    t,
    groupArray((response, c)) AS groupArr
FROM (
    SELECT
        (intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t,
        response,
        count(*) AS c
    FROM default.log_web
    WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
    GROUP BY
        t, response
    ORDER BY
        t ASC,
        response ASC
)
GROUP BY t ORDER BY t ASC
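
The behaviour of the patched $valueIfEmpty macro described above can be sketched in Python. This is only a model of the logic, not the plugin's code: `value_if_empty` and `build_where` are hypothetical names, and the string building does no escaping, so it must not be used to assemble real queries from untrusted input.

```python
def value_if_empty(variable: str, fallback: str, condition: str) -> str:
    """Return `fallback` when the dashboard variable is empty, else `condition`."""
    return fallback if variable == "" else condition


def build_where(filters: dict) -> str:
    """filters maps column -> selected value ('' means nothing selected)."""
    parts = [
        value_if_empty(value, "1", f"({column} = '{value}')")
        for column, value in filters.items()
    ]
    return " AND ".join(parts)


print(build_where({
    "fld_app_name": "site1.domain.ru",
    "fld_app_module": "web",
    "fld_server_name": "",
    "clientRealIP": "",
}))
```

Because an unset filter collapses to the constant 1, the corresponding column never appears in the query, and a columnar store has no reason to read it.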

Conclusion

The appearance of the ClickHouse database has become a landmark event on the market. It was hard to imagine that, completely free of charge, we would suddenly be armed with a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across several servers), the scheme will become more complicated. But on first impressions, working with this database is very pleasant. It is clear that the product is made "for humans".

Compared with ElasticSearch, the cost of storing and processing logs is, by preliminary estimates, reduced five to ten times. In other words, where the current data volume would have required a cluster of several machines, with ClickHouse a single low-powered machine is enough. Yes, of course, ElasticSearch also has on-disk data compression mechanisms and other features that can significantly reduce resource consumption, but compared with ClickHouse this would be more expensive.

Without any special optimization on our part, on default settings, loading data and querying the database run at amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we may use this tool for other purposes not related to log storage, for example end-to-end analytics, security, or machine learning.

Finally, a little about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand this is a feature, but you still have to use additional components to buffer the records. This task is not always simple, but it is solvable. And we would like to simplify the scheme.
  2. Some exotic functionality or new features often break in new versions. This causes concern and reduces the desire to upgrade. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of Issues on GitHub, we are still wary of using this engine in production. However, if you avoid sudden moves to the side and stick to the core functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open-source.
  4. Free.
  5. Scales well (sharding/replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex is available.

Source: www.habr.com
