ClickHouse Database for Humans, or Alien Technologies

Alexey Lizunov, head of the competence center for remote service channels, Information Technology Directorate, ICB

As an alternative to the ELK stack (ElasticSearch, Logstash, Kibana), we are researching the use of the ClickHouse database as a data store for logs.

In this article we would like to share our experience of using the ClickHouse database and the preliminary results of the pilot operation. It is worth noting right away that the results were impressive.


Below we describe in more detail how our system is set up and what components it consists of. But first, a few words about this database as a whole and why it deserves attention. ClickHouse is a high-performance analytical database from Yandex. It is used in Yandex services; originally it was the main data store for Yandex.Metrica. The system is open source and free. As a developer, I always wondered how they implemented this, because the data volumes there are fantastically large. And the Metrica user interface itself is very flexible and fast. On first acquaintance with this database the impression is: "Well, finally! Made for people! From the installation process to sending queries."

This database has a very low entry threshold. Even an average developer can install it in a few minutes and start using it. Everything works smoothly. Even people who are new to Linux can quickly handle the installation and perform simple operations. Earlier, on hearing the words Big Data, Hadoop, Google BigTable, HDFS, an ordinary developer imagined terabytes and petabytes, and some superhumans setting up and developing these systems. With the advent of ClickHouse we got a simple, understandable tool that solves a range of problems that were previously out of reach. All it takes is one fairly average machine and five minutes to install. That is, we got a database like, for example, MySQL, but for storing billions of records! A kind of super-archiver with an SQL language. It is as if people were handed alien weapons.

About our log collection system

To collect information, we use the IIS log files of web applications in the standard format (we are also currently working on parsing application logs, but the main goal at the pilot stage is collecting IIS logs).

For various reasons we could not abandon the ELK stack completely, and we continue to use the LogStash and Filebeat components, which have proven themselves well and work reliably and predictably.

The general logging scheme is shown in the figure below:

[Figure: general logging scheme]

A distinctive feature of writing data to ClickHouse is the infrequent (about once per second) insertion of records in large batches. This is apparently the most "problematic" part you run into when working with ClickHouse for the first time: the scheme becomes somewhat more complicated.
The LogStash plugin that inserts data directly into ClickHouse helped a lot here. This component is deployed on the same server as the database itself. Generally speaking, this is not recommended, but from a practical point of view, so as not to multiply servers, it lives on the same server for now. We have not observed any failures or resource conflicts with the database. In addition, the plugin has a retry mechanism in case of errors. When an error occurs, the plugin writes the batch that could not be inserted to disk (the file format is convenient: after editing, you can easily insert the corrected batch using clickhouse-client).
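
The batching behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the code of the logstash-output-clickhouse plugin: the `BatchBuffer` class and its `sink` callback are hypothetical names, while `flush_size` and `idle_flush_time` are borrowed from the Logstash output configuration used in this article.

```python
import time


class BatchBuffer:
    """Collects rows and hands them to `sink` in large batches (hypothetical helper)."""

    def __init__(self, sink, flush_size=10000, idle_flush_time=1.0, clock=time.monotonic):
        self.sink = sink                        # callable that receives a list of rows
        self.flush_size = flush_size            # flush once this many rows are buffered
        self.idle_flush_time = idle_flush_time  # or once this many seconds have passed
        self.clock = clock
        self.rows = []
        self.last_flush = clock()

    def add(self, row):
        self.rows.append(row)
        self.maybe_flush()

    def maybe_flush(self):
        full = len(self.rows) >= self.flush_size
        idle = self.clock() - self.last_flush >= self.idle_flush_time
        if self.rows and (full or idle):
            self.flush()

    def flush(self):
        batch, self.rows = self.rows, []
        self.last_flush = self.clock()
        if batch:
            self.sink(batch)  # one INSERT per batch instead of one per row


batches = []
buffer = BatchBuffer(batches.append, flush_size=3, idle_flush_time=60)
for i in range(7):
    buffer.add({"n": i})
buffer.flush()  # push the remainder out on shutdown
```

In a real pipeline `sink` would issue a single INSERT for the whole batch; here it just appends the batch to a list so the grouping is easy to inspect.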

The complete list of software used in the scheme is given in the table:

List of software used

| Name | Description | Distribution link |
|---|---|---|
| NGINX | Reverse proxy for restricting access by port and setting up authorization. Currently not used in the scheme | https://nginx.org/ru/download.html ; https://nginx.org/download/nginx-1.16.0.tar.gz |
| FileBeat | Transfer of file logs | https://www.elastic.co/downloads/beats/filebeat (Windows 64-bit distribution) ; https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.0-windows-x86_64.zip |
| LogStash | Log collector. Used to collect logs from FileBeat, and also to collect logs from the RabbitMQ queue (for servers located in the DMZ) | https://www.elastic.co/products/logstash ; https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm |
| Logstash-output-clickhouse | Logstash plugin for transferring logs to the ClickHouse database in batches. https://github.com/mikechris/logstash-output-clickhouse | /usr/share/logstash/bin/logstash-plugin install logstash-output-clickhouse ; /usr/share/logstash/bin/logstash-plugin install logstash-filter-prune ; /usr/share/logstash/bin/logstash-plugin install logstash-filter-multiline |
| ClickHouse | Log storage. https://clickhouse.yandex/docs/ru/ | https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-19.5.3.8-1.el7.x86_64.rpm ; https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-19.5.3.8-1.el7.x86_64.rpm . Note: since August 2018, "normal" rpm builds for RHEL have appeared in the Yandex repository, so you can try using them. At the time of installation we were using packages built by Altinity |
| Grafana | Log visualization. Building dashboards. https://grafana.com/ | https://grafana.com/grafana/download , Redhat & Centos (64 Bit), latest version |
| ClickHouse datasource for Grafana 4.6+ | Grafana plugin with a ClickHouse data source. https://grafana.com/plugins/vertamedia-clickhouse-datasource | https://grafana.com/api/plugins/vertamedia-clickhouse-datasource/versions/1.8.1/download |
| LogStash | Log router from FileBeat to the RabbitMQ queue. Note: unfortunately, FileBeat has no output directly to RabbitMQ, so an intermediate link in the form of Logstash is required | https://www.elastic.co/products/logstash ; https://artifacts.elastic.co/downloads/logstash/logstash-7.0.1.rpm |
| RabbitMQ | Message queue. This is the log buffer in the DMZ. https://www.rabbitmq.com/download.html | https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.14/rabbitmq-server-3.7.14-1.el7.noarch.rpm |
| Erlang Runtime (required for RabbitMQ) | Erlang runtime. Required for RabbitMQ to work. http://www.erlang.org/download.html | https://www.rabbitmq.com/install-rpm.html#install-erlang ; http://www.erlang.org/downloads/21.3 |

The configuration of the server with the ClickHouse database is given in the following table:

| Name | Value | Note |
|---|---|---|
| Configuration | HDD: 40GB; RAM: 8GB; Processor: Core 2 2Ghz | Pay attention to the tips for operating the ClickHouse database (https://clickhouse.yandex/docs/ru/operations/tips/) |
| System-wide software | OS: Red Hat Enterprise Linux Server (Maipo); JRE (Java 8) | |

As you can see, this is an ordinary workstation.

The structure of the table for storing logs is as follows:

log_web.sql

CREATE TABLE log_web (
  logdate Date,
  logdatetime DateTime CODEC(Delta, LZ4HC),
   
  fld_log_file_name LowCardinality( String ),
  fld_server_name LowCardinality( String ),
  fld_app_name LowCardinality( String ),
  fld_app_module LowCardinality( String ),
  fld_website_name LowCardinality( String ),
 
  serverIP LowCardinality( String ),
  method LowCardinality( String ),
  uriStem String,
  uriQuery String,
  port UInt32,
  username LowCardinality( String ),
  clientIP String,
  clientRealIP String,
  userAgent String,
  referer String,
  response String,
  subresponse String,
  win32response String,
  timetaken UInt64
   
  , uriQuery__utm_medium String
  , uriQuery__utm_source String
  , uriQuery__utm_campaign String
  , uriQuery__utm_term String
  , uriQuery__utm_content String
  , uriQuery__yclid String
  , uriQuery__region String
 
) Engine = MergeTree()
PARTITION BY toYYYYMM(logdate)
ORDER BY (fld_app_name, fld_app_module, logdatetime)
SETTINGS index_granularity = 8192;

We use the default values for partitioning (by month) and index granularity. Practically all fields correspond to the IIS log entries that record HTTP requests. Note that there are separate fields for storing utm tags (they are parsed out of the query string field at the stage of insertion into the table).
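
As a rough illustration of how utm tags can be split out of the query string into their own columns, here is a small Python sketch. It is not the mechanism used in our pipeline (there the splitting is done by Logstash); the function name `split_query` is hypothetical, and unlike a plain key/value split, `parse_qs` also URL-decodes the values.

```python
from urllib.parse import parse_qs

# Keys that get their own column in log_web (uriQuery__<key>).
UTM_KEYS = ["utm_medium", "utm_source", "utm_campaign", "utm_term",
            "utm_content", "yclid", "region"]


def split_query(uri_query, prefix="uriQuery__"):
    """Return one value per whitelisted key; repeated keys are joined with commas."""
    params = parse_qs(uri_query, keep_blank_values=True)
    return {prefix + key: ",".join(params.get(key, [])) for key in UTM_KEYS}


row = split_query("utm_source=yandex&utm_medium=cpc&page=2&utm_source=mail")
print(row["uriQuery__utm_source"])
```

Keys outside the whitelist (`page` in this example) are ignored, and missing keys become empty strings, which matches the non-nullable String columns of the table.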

Several system fields have also been added to the table to store information about systems, components, and servers. These fields are described in the table below. We store the logs of several systems in one table.

| Name | Description | Example |
|---|---|---|
| fld_app_name | Application/system name. Valid values: site1.domain.com (external site 1); site2.domain.com (external site 2); internal-site1.domain.local (internal site 1) | site1.domain.com |
| fld_app_module | System module. Valid values: web (website); svc (website web service); intgr (integration web service); bo (administrator, BackOffice) | web |
| fld_website_name | Site name in IIS. Several systems can be deployed on one server, or even several instances of one system module | web-main |
| fld_server_name | Server name | web1.domain.com |
| fld_log_file_name | Path to the log file on the server | C:\inetpub\logs\LogFiles\W3SVC1\u_ex190711.log |

This makes it possible to build graphs in Grafana efficiently. For example, you can view requests from the frontend of a particular system. This is similar to the site counter in Yandex.Metrica.

Here are some statistics on database usage over two months.

Number of records by system and component

SELECT
    fld_app_name,
    fld_app_module,
    count(fld_app_name) AS rows_count
FROM log_web
GROUP BY
    fld_app_name,
    fld_app_module
    WITH TOTALS
ORDER BY
    fld_app_name ASC,
    rows_count DESC
 
┌─fld_app_name─────┬─fld_app_module─┬─rows_count─┐
│ site1.domain.ru  │ web            │     131441 │
│ site2.domain.ru  │ web            │    1751081 │
│ site3.domain.ru  │ web            │  106887543 │
│ site3.domain.ru  │ svc            │   44908603 │
│ site3.domain.ru  │ intgr          │    9813911 │
│ site4.domain.ru  │ web            │     772095 │
│ site5.domain.ru  │ web            │   17037221 │
│ site5.domain.ru  │ intgr          │     838559 │
│ site5.domain.ru  │ bo             │       7404 │
│ site6.domain.ru  │ web            │     595877 │
│ site7.domain.ru  │ web            │   27778858 │
└──────────────────┴────────────────┴────────────┘
 
Totals:
┌─fld_app_name─┬─fld_app_module─┬─rows_count─┐
│              │                │  210522593 │
└──────────────┴────────────────┴────────────┘
 
11 rows in set. Elapsed: 4.874 sec. Processed 210.52 million rows, 421.67 MB (43.19 million rows/s., 86.51 MB/s.)

Volume of data on disk

SELECT
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    sum(rows) AS total_rows
FROM system.parts
WHERE table = 'log_web'
 
┌─uncompressed─┬─compressed─┬─total_rows─┐
│ 54.50 GiB    │ 4.86 GiB   │  211427094 │
└──────────────┴────────────┴────────────┘
 
1 rows in set. Elapsed: 0.035 sec.
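
To put the numbers above in perspective, the overall compression ratio and the average row size can be worked out with simple arithmetic. The Python snippet below only restates the figures from the query output; it works out to roughly an 11-fold reduction.

```python
GIB = 2 ** 30

# Figures copied from the system.parts query output above.
uncompressed_bytes = 54.50 * GIB
compressed_bytes = 4.86 * GIB
total_rows = 211_427_094

ratio = uncompressed_bytes / compressed_bytes
bytes_per_row_raw = uncompressed_bytes / total_rows
bytes_per_row_disk = compressed_bytes / total_rows

print(f"compression ratio: {ratio:.1f}x")
print(f"bytes per row: {bytes_per_row_raw:.0f} uncompressed, {bytes_per_row_disk:.0f} on disk")
```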

Degree of data compression by column

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes) AS compressed,
    data_uncompressed_bytes / data_compressed_bytes AS compress_ratio
FROM system.columns
WHERE table = 'log_web'
 
┌─name───────────────────┬─uncompressed─┬─compressed─┬─────compress_ratio─┐
│ logdate                │ 401.53 MiB   │ 1.80 MiB   │ 223.16665968777315 │
│ logdatetime            │ 803.06 MiB   │ 35.91 MiB  │ 22.363966401202305 │
│ fld_log_file_name      │ 220.66 MiB   │ 2.60 MiB   │  84.99905736932571 │
│ fld_server_name        │ 201.54 MiB   │ 50.63 MiB  │  3.980924816977078 │
│ fld_app_name           │ 201.17 MiB   │ 969.17 KiB │ 212.55518183686877 │
│ fld_app_module         │ 201.17 MiB   │ 968.60 KiB │ 212.67805817411906 │
│ fld_website_name       │ 201.54 MiB   │ 1.24 MiB   │  162.7204926761546 │
│ serverIP               │ 201.54 MiB   │ 50.25 MiB  │  4.010824061219731 │
│ method                 │ 201.53 MiB   │ 43.64 MiB  │  4.617721053304486 │
│ uriStem                │ 5.13 GiB     │ 832.51 MiB │  6.311522291936919 │
│ uriQuery               │ 2.58 GiB     │ 501.06 MiB │  5.269731450124478 │
│ port                   │ 803.06 MiB   │ 3.98 MiB   │ 201.91673864241824 │
│ username               │ 318.08 MiB   │ 26.93 MiB  │ 11.812513794583598 │
│ clientIP               │ 2.35 GiB     │ 82.59 MiB  │ 29.132328640073343 │
│ clientRealIP           │ 2.49 GiB     │ 465.05 MiB │  5.478382297052563 │
│ userAgent              │ 18.34 GiB    │ 764.08 MiB │  24.57905114484208 │
│ referer                │ 14.71 GiB    │ 1.37 GiB   │ 10.736792723669906 │
│ response               │ 803.06 MiB   │ 83.81 MiB  │  9.582334090987247 │
│ subresponse            │ 399.87 MiB   │ 1.83 MiB   │  218.4831068635027 │
│ win32response          │ 407.86 MiB   │ 7.41 MiB   │ 55.050315514606815 │
│ timetaken              │ 1.57 GiB     │ 402.06 MiB │ 3.9947395692010637 │
│ uriQuery__utm_medium   │ 208.17 MiB   │ 12.29 MiB  │ 16.936148912472955 │
│ uriQuery__utm_source   │ 215.18 MiB   │ 13.00 MiB  │ 16.548367623199912 │
│ uriQuery__utm_campaign │ 381.46 MiB   │ 37.94 MiB  │ 10.055156353418509 │
│ uriQuery__utm_term     │ 231.82 MiB   │ 10.78 MiB  │ 21.502540454070672 │
│ uriQuery__utm_content  │ 441.34 MiB   │ 87.60 MiB  │  5.038260760449327 │
│ uriQuery__yclid        │ 216.88 MiB   │ 16.58 MiB  │  13.07721335008116 │
│ uriQuery__region       │ 204.35 MiB   │ 9.49 MiB   │  21.52661903446796 │
└────────────────────────┴──────────────┴────────────┴────────────────────┘
 
28 rows in set. Elapsed: 0.005 sec.

Description of the components used

FileBeat. Transfer of file logs

This component tracks changes to log files on disk and passes the information to LogStash. It is installed on all servers where log files are written (usually IIS servers). It works in tail mode (that is, it transfers only the records appended to a file). It can also be configured separately to transfer whole files, which is convenient when you need to load data for previous months: just put the log file into the folder and it will be read in its entirety.

When the service is stopped, data stops being transferred to the storage.
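
The difference between tail mode and whole-file mode comes down to the offset from which reading starts. The Python sketch below illustrates the idea only; it is not how FileBeat is implemented, and `read_new_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return the complete lines appended after `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Keep only complete lines; a partially written last line waits for the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "u_ex190711.log")
    with open(log_path, "w", newline="\n") as f:
        f.write("old line 1\nold line 2\n")

    # Tail mode: start at the current end of the file, so old lines are skipped.
    tail_offset = os.path.getsize(log_path)
    with open(log_path, "a", newline="\n") as f:
        f.write("new line\n")
    tailed, tail_offset = read_new_lines(log_path, tail_offset)

    # Whole-file mode: start at offset 0, so the history is loaded as well.
    everything, _ = read_new_lines(log_path, 0)
```

Remembering the offset between calls is also what lets a shipper resume after a stop without resending lines it has already delivered.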

An example configuration looks like this:

filebeat.yml

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/W3SVC1/*.log
  exclude_files: ['.gz$','.zip$']
  tail_files: true
  ignore_older: 24h
  fields:
    fld_server_name: "site1.domain.ru"
    fld_app_name: "site1.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
 
- type: log
  enabled: true
  paths:
    - C:/inetpub/logs/LogFiles/__Import/access_log-*
  exclude_files: ['.gz$','.zip$']
  tail_files: false
  fields:
    fld_server_name: "site2.domain.ru"
    fld_app_name: "site2.domain.ru"
    fld_app_module: "web"
    fld_website_name: "web-main"
    fld_logformat: "logformat__apache"
 
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  reload.period: 2s
 
output.logstash:
  hosts: ["log.domain.com:5044"]
 
  ssl.enabled: true
  ssl.certificate_authorities: ["C:/filebeat/certs/ca.pem", "C:/filebeat/certs/ca-issuing.pem"]
  ssl.certificate: "C:/filebeat/certs/site1.domain.ru.cer"
  ssl.key: "C:/filebeat/certs/site1.domain.ru.key"
 
#================================ Processors =====================================
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

LogStash. Log Collector

This component is designed to receive log records from FileBeat (or via the RabbitMQ queue), parse them, and insert them in batches into the ClickHouse database.

To insert into ClickHouse, the Logstash-output-clickhouse plugin is used. The plugin has a request retry mechanism, but for a planned shutdown it is better to stop the service itself. While it is stopped, messages accumulate in the RabbitMQ queue, so if the shutdown is long, it is better to stop the Filebeats on the servers as well. In a scheme where RabbitMQ is not used (on the local network Filebeat sends logs directly to Logstash), the Filebeats behave quite acceptably and safely, so for them the unavailability of the output passes without consequences.

An example configuration looks like this:

log_web__filebeat_clickhouse.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/logstash/certs/ca.cer", "/etc/logstash/certs/ca-issuing.cer"]
        ssl_certificate => "/etc/logstash/certs/server.cer"
        ssl_key => "/etc/logstash/certs/server-pkcs8.key"
        ssl_verify_mode => "peer"
 
            add_field => {
                "fld_server_name" => "%{[fields][fld_server_name]}"
                "fld_app_name" => "%{[fields][fld_app_name]}"
                "fld_app_module" => "%{[fields][fld_app_module]}"
                "fld_website_name" => "%{[fields][fld_website_name]}"
                "fld_log_file_name" => "%{source}"
                "fld_logformat" => "%{[fields][fld_logformat]}"
            }
    }
 
    rabbitmq {
        host => "queue.domain.com"
        port => 5671
        user => "q-reader"
        password => "password"
        queue => "web_log"
        heartbeat => 30
        durable => true
        ssl => true
        #ssl_certificate_path => "/etc/logstash/certs/server.p12"
        #ssl_certificate_password => "password"
 
        add_field => {
            "fld_server_name" => "%{[fields][fld_server_name]}"
            "fld_app_name" => "%{[fields][fld_app_name]}"
            "fld_app_module" => "%{[fields][fld_app_module]}"
            "fld_website_name" => "%{[fields][fld_website_name]}"
            "fld_log_file_name" => "%{source}"
            "fld_logformat" => "%{[fields][fld_logformat]}"
        }
    }
 
}
 
filter { 
 
      if [message] =~ "^#" {
        drop {}
      }
 
      if [fld_logformat] == "logformat__iis_with_xrealip" {
     
          grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken} %{NOTSPACE:xrealIP} %{NOTSPACE:xforwarderfor}"]
          }
      } else {
   
          grok {
             match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{IP:serverIP} %{WORD:method} %{NOTSPACE:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:userAgent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:timetaken}"]
          }
 
      }
 
      date {
        match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
          timezone => "Etc/UTC"
        remove_field => [ "log_timestamp", "@timestamp" ]
        target => [ "log_timestamp2" ]
      }
 
        ruby {
            code => "tstamp = event.get('log_timestamp2').to_i
                        event.set('logdatetime', Time.at(tstamp).strftime('%Y-%m-%d %H:%M:%S'))
                        event.set('logdate', Time.at(tstamp).strftime('%Y-%m-%d'))"
        }
 
      if [bytesSent] {
        ruby {
          code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
        }
      }
 
 
      if [bytesReceived] {
        ruby {
          code => "event.set('kilobytesReceived', event.get('bytesReceived').to_i / 1024.0)"
        }
      }
 
   
        ruby {
            code => "event.set('clientRealIP', event.get('clientIP'))"
        }
        if [xrealIP] {
            ruby {
                code => "event.set('clientRealIP', event.get('xrealIP'))"
            }
        }
        if [xforwarderfor] {
            ruby {
                code => "event.set('clientRealIP', event.get('xforwarderfor'))"
            }
        }
 
      mutate {
        convert => ["bytesSent", "integer"]
        convert => ["bytesReceived", "integer"]
        convert => ["timetaken", "integer"] 
        convert => ["port", "integer"]
 
        add_field => {
            "clientHostname" => "%{clientIP}"
        }
      }
 
        useragent {
            source => "userAgent"
            prefix=> "browser"
        }
 
        kv {
            source => "uriQuery"
            prefix => "uriQuery__"
            allow_duplicate_values => false
            field_split => "&"
            include_keys => [ "utm_medium", "utm_source", "utm_campaign", "utm_term", "utm_content", "yclid", "region" ]
        }
 
        mutate {
            join => { "uriQuery__utm_source" => "," }
            join => { "uriQuery__utm_medium" => "," }
            join => { "uriQuery__utm_campaign" => "," }
            join => { "uriQuery__utm_term" => "," }
            join => { "uriQuery__utm_content" => "," }
            join => { "uriQuery__yclid" => "," }
            join => { "uriQuery__region" => "," }
        }
 
}
 
output { 
  #stdout {codec => rubydebug}
    clickhouse {
      headers => ["Authorization", "Basic abcdsfks..."]
      http_hosts => ["http://127.0.0.1:8123"]
      save_dir => "/etc/logstash/tmp"
      table => "log_web"
      request_tolerance => 1
      flush_size => 10000
      idle_flush_time => 1
        mutations => {
            "fld_log_file_name" => "fld_log_file_name"
            "fld_server_name" => "fld_server_name"
            "fld_app_name" => "fld_app_name"
            "fld_app_module" => "fld_app_module"
            "fld_website_name" => "fld_website_name"
 
            "logdatetime" => "logdatetime"
            "logdate" => "logdate"
            "serverIP" => "serverIP"
            "method" => "method"
            "uriStem" => "uriStem"
            "uriQuery" => "uriQuery"
            "port" => "port"
            "username" => "username"
            "clientIP" => "clientIP"
            "clientRealIP" => "clientRealIP"
            "userAgent" => "userAgent"
            "referer" => "referer"
            "response" => "response"
            "subresponse" => "subresponse"
            "win32response" => "win32response"
            "timetaken" => "timetaken"
             
            "uriQuery__utm_medium" => "uriQuery__utm_medium"
            "uriQuery__utm_source" => "uriQuery__utm_source"
            "uriQuery__utm_campaign" => "uriQuery__utm_campaign"
            "uriQuery__utm_term" => "uriQuery__utm_term"
            "uriQuery__utm_content" => "uriQuery__utm_content"
            "uriQuery__yclid" => "uriQuery__yclid"
            "uriQuery__region" => "uriQuery__region"
        }
    }
 
}

pipelines.yml

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
 
- pipeline.id: log_web__filebeat_clickhouse
  path.config: "/etc/logstash/log_web__filebeat_clickhouse.conf"

ClickHouse. Log storage

The logs of all systems are stored in one table (see the beginning of the article). It is intended for storing information about requests: all parameters are similar across formats, for example IIS, apache, and nginx logs. For application logs, which record, for example, errors, informational messages, and warnings, a separate table with an appropriate structure will be provided (it is currently at the design stage).

When designing the table, it is very important to decide on the primary key (by which the data will be sorted on storage). The degree of data compression and the query speed depend on it. In our example, the key is
ORDER BY (fld_app_name, fld_app_module, logdatetime)
That is, the system name, the name of the system component, and the event date. Initially, the event date came first. After it was moved to the last position, queries began to run about twice as fast. Changing the primary key requires recreating the table and reloading the data so that ClickHouse re-sorts the data on disk. This is a heavy operation, so it is worth thinking carefully in advance about what should go into the sort key.
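
A toy model helps to see why the position of the date in the sort key matters. The Python sketch below sorts the same rows by two different keys, cuts them into fixed-size granules, and counts how many granules contain at least one row of a given system. It is a deliberate simplification, not ClickHouse's actual algorithm: the data is made up, the granule holds 4 rows instead of the 8192 set by `index_granularity`, and real granule selection works through the sparse primary index.

```python
GRANULE = 4  # rows per granule; tiny on purpose (the table above uses 8192)


def granules_touched(rows, key, predicate):
    """Sort rows by `key`, cut into granules, count granules with a matching row."""
    ordered = sorted(rows, key=key)
    granules = [ordered[i:i + GRANULE] for i in range(0, len(ordered), GRANULE)]
    return sum(1 for g in granules if any(predicate(r) for r in g))


# Made-up rows: (fld_app_name, fld_app_module, minute of the event).
rows = [(app, "web", minute)
        for minute in range(8)
        for app in ("site1", "site2", "site3", "site4")]


def is_site1(row):
    return row[0] == "site1"


# Key with the event time first: every granule mixes all four systems.
time_first = granules_touched(rows, key=lambda r: (r[2], r[0], r[1]), predicate=is_site1)
# Key with the system name first: rows of one system sit next to each other.
app_first = granules_touched(rows, key=lambda r: (r[0], r[1], r[2]), predicate=is_site1)

print(time_first, app_first)
```

With the system name first, a filter on one system touches far fewer granules than with the time first. The same locality also tends to help compression, since similar values end up adjacent.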

It should also be noted that the LowCardinality data type has appeared in recent versions. With it, the size of compressed data shrinks dramatically for fields with low cardinality (few distinct values).
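
The general idea behind such a type is dictionary encoding: each distinct string is stored once, and the column itself holds small integer codes. The Python sketch below shows only this idea; ClickHouse's actual on-disk format is more involved, and `dictionary_encode` is a hypothetical helper.

```python
def dictionary_encode(values):
    """Replace each string with the index of its first occurrence in a dictionary."""
    dictionary = {}
    codes = []
    for value in values:
        # A new value gets the next free code; a known value reuses its code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return list(dictionary), codes


def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]


column = ["web", "svc", "web", "web", "intgr", "svc"]
dictionary, codes = dictionary_encode(column)
print(dictionary, codes)
```

For a column such as fld_app_module, which has only a handful of distinct values across hundreds of millions of rows, the dictionary stays tiny and the codes compress very well.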

We currently use version 19.6 and plan to try upgrading to the latest version. It has such wonderful features as Adaptive Granularity, skipping indices, and the DoubleDelta codec, for example.

By default, the logging level in the configuration is set to trace during installation. Logs are rotated and archived, but they still grow up to a gigabyte. If there is no need for that, you can set the warning level, and the log size drops sharply. The logging setting is specified in the config.xml file:

<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>

Some useful commands

Since the original installation packages are built for Debian, for other Linux distributions you need to use the packages built by Altinity.
 
This link has instructions with links to their repository: https://www.altinity.com/blog/2017/12/18/logstash-with-clickhouse
sudo yum search clickhouse-server
sudo yum install clickhouse-server.noarch
  
1. check the status
sudo systemctl status clickhouse-server
 
2. stop the server
sudo systemctl stop clickhouse-server
 
3. start the server
sudo systemctl start clickhouse-server
 
Start the client for running queries in multiline mode (a query is executed after the ";" character)
clickhouse-client --multiline
clickhouse-client --multiline --host 127.0.0.1 --password pa55w0rd
clickhouse-client --multiline --host 127.0.0.1 --port 9440 --secure --user default --password pa55w0rd
 
If a single row causes an error, the ClickHouse plugin for Logstash saves the whole batch to the file /tmp/log_web_failed.json
You can fix this file by hand and try to load it into the database manually:
clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /tmp/log_web_failed__fixed.json
 
sudo mv /etc/logstash/tmp/log_web_failed.json /etc/logstash/tmp/log_web_failed__fixed.json
sudo chown user_dev /etc/logstash/tmp/log_web_failed__fixed.json
sudo clickhouse-client --host 127.0.0.1 --password password --query="INSERT INTO log_web FORMAT JSONEachRow" < /etc/logstash/tmp/log_web_failed__fixed.json
sudo mv /etc/logstash/tmp/log_web_failed__fixed.json /etc/logstash/tmp/log_web_failed__fixed_.json
 
exit the command line
quit;
## TLS setup
https://www.altinity.com/blog/2019/3/5/clickhouse-networking-part-2
 
openssl s_client -connect log.domain.com:9440 < /dev/null

LogStash. Log router from FileBeat to the RabbitMQ queue

This component is used to route logs coming from FileBeat to the RabbitMQ queue. There are two points here:

  1. Unfortunately, FileBeat has no output plugin for writing directly to RabbitMQ. Judging by the issue on their GitHub, such functionality is not planned. There is a plugin for Kafka, but for certain reasons we cannot use it ourselves.
  2. There are requirements for collecting logs in the DMZ. According to them, the logs must first be placed in a queue, and then LogStash reads the records from the queue from outside.

Therefore, specifically for servers located in the DMZ, we have to use this slightly more complicated scheme. An example configuration looks like this:

iis_w3c_logs__filebeat_rabbitmq.conf

input {
 
    beats {
        port => 5044
        type => 'iis'
        ssl => true
        ssl_certificate_authorities => ["/etc/pki/tls/certs/app/ca.pem", "/etc/pki/tls/certs/app/ca-issuing.pem"]
        ssl_certificate => "/etc/pki/tls/certs/app/queue.domain.com.cer"
        ssl_key => "/etc/pki/tls/certs/app/queue.domain.com-pkcs8.key"
        ssl_verify_mode => "peer"
    }
 
}
 
output { 
  #stdout {codec => rubydebug}
 
    rabbitmq {
        host => "127.0.0.1"
        port => 5672
        exchange => "monitor.direct"
        exchange_type => "direct"
        key => "%{[fields][fld_app_name]}"
        user => "q-writer"
        password => "password"
        ssl => false
    }
}

RabbitMQ. Message queue

This component is used to buffer log records in the DMZ. Writing is done through the Filebeat → LogStash chain. Reading is done from outside the DMZ through LogStash. When working through RabbitMQ, about 4 thousand messages per second are processed.

Message routing is configured by system name, that is, based on the FileBeat configuration data. All messages go to one queue. If the queue service is stopped for some reason, this does not lead to message loss: the FileBeats get connection errors and temporarily suspend sending. LogStash, which reads from the queue, also gets network errors and waits for the connection to be restored. In this case, of course, data is no longer written to the database.

The following commands are used to create and configure the queues:

sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare exchange --vhost=/ name=monitor.direct type=direct
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin declare queue --vhost=/ name=web_log durable=true
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site1.domain.ru"
sudo /usr/local/bin/rabbitmqadmin/rabbitmqadmin --vhost="/" declare binding source="monitor.direct" destination_type="queue" destination="web_log" routing_key="site2.domain.ru"
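
The effect of these bindings can be illustrated with a toy model of a direct exchange, where a message is delivered only to queues whose binding key equals the routing key. The Python sketch below is an illustration and not RabbitMQ code; the `DirectExchange` class is hypothetical, and the routing key is assumed to be the fld_app_name value, as in the Logstash output configuration above.

```python
from collections import defaultdict


class DirectExchange:
    """Toy model of a direct exchange: the routing key must equal the binding key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> queue names
        self.queues = defaultdict(list)    # queue name -> messages

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        targets = self.bindings.get(routing_key, [])
        for queue in targets:
            self.queues[queue].append(message)
        return len(targets)  # 0 means the message was not routed anywhere


exchange = DirectExchange()
exchange.bind("web_log", "site1.domain.ru")
exchange.bind("web_log", "site2.domain.ru")

routed = exchange.publish("site1.domain.ru", "GET /index.html 200")
unrouted = exchange.publish("site3.domain.ru", "GET /about.html 200")
print(routed, unrouted)
```

The practical consequence is that every new system name needs its own binding; without one, its messages would not reach the web_log queue.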

Grafana. Dashboards

This component is used to visualize monitoring data. For this you need to install the ClickHouse datasource for Grafana 4.6+ plugin. We had to tweak it slightly to make the handling of SQL filters on the dashboard more efficient.

For example, we use variables, and if they are not set in the filter field, we would like the plugin not to generate a WHERE condition of the form ( uriStem = '' AND uriStem != '' ). With such a condition ClickHouse would still read the uriStem column. So we tried different options and in the end patched the plugin (the $valueIfEmpty macro) so that for an empty value it returns 1, without mentioning the column itself.

Now the following query can be used for the graph

$columns(response, count(*) c) from $table where $adhoc
and $valueIfEmpty($fld_app_name, 1, fld_app_name = '$fld_app_name')
and $valueIfEmpty($fld_app_module, 1, fld_app_module = '$fld_app_module')
and $valueIfEmpty($fld_server_name, 1, fld_server_name = '$fld_server_name')
and $valueIfEmpty($uriStem, 1, uriStem like '%$uriStem%')
and $valueIfEmpty($clientRealIP, 1, clientRealIP = '$clientRealIP')

which is converted into SQL like this (note that the empty uriStem field has been replaced by just 1)

SELECT
    t,
    groupArray((response, c)) AS groupArr
FROM (
    SELECT
        (intDiv(toUInt32(logdatetime), 60) * 60) * 1000 AS t,
        response,
        count(*) AS c
    FROM default.log_web
    WHERE (logdate >= toDate(1565061982)) AND (logdatetime >= toDateTime(1565061982)) AND 1 AND (fld_app_name = 'site1.domain.ru') AND (fld_app_module = 'web') AND 1 AND 1 AND 1
    GROUP BY
        t, response
    ORDER BY
        t ASC,
        response ASC
)
GROUP BY t ORDER BY t ASC
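
The logic of the macro can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's code (the plugin is written for Grafana, not in Python); `value_if_empty` and `build_where` are hypothetical names.

```python
def value_if_empty(value, default, expression):
    """Return `default` when the dashboard variable is empty, else the filter expression."""
    return default if value in (None, "") else expression


def build_where(fld_app_name="", uri_stem=""):
    # Plain string interpolation, for illustration only: it offers no
    # protection against SQL injection.
    conditions = [
        value_if_empty(fld_app_name, "1", f"fld_app_name = '{fld_app_name}'"),
        value_if_empty(uri_stem, "1", f"uriStem like '%{uri_stem}%'"),
    ]
    return " AND ".join(conditions)


where = build_where(fld_app_name="site1.domain.ru", uri_stem="")
print(where)
```

Because the empty filter collapses to the constant 1, the uriStem column is never referenced, so a columnar store has no reason to read it.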

Conclusion

The appearance of the ClickHouse database has become a landmark event on the market. It was hard to imagine that we would get, all at once and for free, such a powerful and practical tool for working with big data. Of course, as needs grow (for example, sharding and replication across several servers), the scheme will become more complicated. But by first impressions, working with this database is very pleasant. It is clear that the product is made "for people".

Compared to ElasticSearch, the cost of storing and processing logs is, by preliminary estimates, reduced by a factor of five to ten. In other words, where the current volume of data would require a cluster of several machines, with ClickHouse a single low-power machine is enough. Yes, ElasticSearch also has on-disk compression mechanisms and other features that can significantly reduce resource consumption, but compared to ClickHouse this will cost more.

Without any special optimization on our part, with default settings, loading data and retrieving it from the database works at amazing speed. We do not have much data yet (about 200 million records), but the server itself is weak. In the future we may use this tool for other purposes unrelated to log storage, for example end-to-end analytics, security, or machine learning.

Finally, a little about the pros and cons.

Cons

  1. Loading records in large batches. On the one hand this is a feature, but you still have to use additional components to buffer records. This task is not always simple, but it is solvable. And we would like to simplify the scheme.
  2. Some exotic or new functionality often breaks in new versions. This causes concern and reduces the desire to upgrade to a new version. For example, the Kafka table engine is a very useful feature that lets you read events directly from Kafka without implementing consumers. But judging by the number of issues on GitHub, we are still wary of using this engine in production. However, if you avoid sudden moves to the side and stick to the core functionality, it works stably.

Pros

  1. It does not slow down.
  2. Low entry threshold.
  3. Open source.
  4. Free.
  5. Scalable (sharding and replication out of the box).
  6. Included in the register of Russian software recommended by the Ministry of Communications.
  7. Official support from Yandex is available.

Source: www.habr.com